# Performance Benchmarks

This page presents performance benchmarks comparing QuaTorch against other quaternion libraries across different computational backends.

## System Information

The benchmarks were performed on the following system:

- **CPU**: 13th Gen Intel(R) Core(TM) i7-13700H (20 cores)
- **RAM**: 16 GB
- **GPU**: NVIDIA GeForce RTX 4050 Laptop GPU (6141 MiB)
- **System**: Linux 6.14.0-37-generic
- **Python**: 3.13.3 (CPython)

## Benchmark Methodology

The benchmarks compare the following implementations:

- **numpy-quaternion**: Using the [`numpy-quaternion`](https://quaternion.readthedocs.io/en/latest/) package
- **quaternionic**: Using the [`quaternionic`](https://quaternionic.readthedocs.io/en/latest/) library
- **cpu_eager**: QuaTorch on CPU in eager mode
- **cpu_compiled**: QuaTorch on CPU with `torch.compile`
- **cuda_eager**: QuaTorch on CUDA in eager mode
- **cuda_compiled**: QuaTorch on CUDA with `torch.compile`

Each benchmark measures execution time over a range of input sizes to show how each implementation scales. The bars show the mean execution time, and the whiskers span the minimum to the maximum measured time.

```{admonition} ⚠️ **Note on Fair Comparisons**
These benchmarks should be interpreted with caution for several reasons:

1. **Different Compute Devices**: The CPU-based methods (numpy-quaternion, quaternionic, cpu_eager, cpu_compiled) run on the Intel i7-13700H processor, while the CUDA methods run on the NVIDIA RTX 4050 GPU. These are fundamentally different architectures optimized for different workloads.
2. **Compilation Benefits**: The `torch.compile` results show the benefits of PyTorch's JIT compilation, but this benefit is realized only after a warmup period. The first call includes compilation time, so first-run performance is slower than the steady-state numbers shown here.
3. **Dependencies**: `numpy-quaternion` is a compiled NumPy extension and requires no JIT compiler. `quaternionic` depends on `numba` for JIT compilation. QuaTorch's performance depends on PyTorch's optimizations and GPU acceleration.
```

## Reproducing the Benchmarks

To reproduce these benchmarks on your own system:

### 1. Run the Benchmark Tests

```bash
# Run all benchmarks and save results to benchmark_output.json
uv run pytest test/benchmark/test_performance.py --benchmark-only
```

The benchmark results will be saved to `benchmark_output.json` in the project root.

### 2. Generate Plots

After running the benchmarks, generate the visualization plots:

```bash
# Generate plots from the saved benchmark results
uv run python test/benchmark/test_performance.py benchmark_output.json
```

The plots will be saved as PNG files in the project root directory.

## Benchmark Results

### Quaternion Multiplication

Multiplication of two vectors of `n` quaternions. QuaTorch (`cpu_compiled`) ran 1.33x faster than `numpy-quaternion` and 5.11x faster than `quaternionic` (geometric mean across all input sizes).

![Multiplication Benchmark](_static/test_performance_multiplication_benchmark.png)

### Vector Rotation

Rotation of a 3D vector by a vector of `n` quaternions. QuaTorch (`cpu_compiled`) ran 21.32x faster than `numpy-quaternion` and 4.89x faster than `quaternionic` (geometric mean across all input sizes).

![Rotate Vector Benchmark](_static/test_performance_rotate_vector_benchmark.png)

### Spherical Linear Interpolation (SLERP)

Spherical linear interpolation between two vectors of `n` quaternions. QuaTorch (`cpu_compiled`) ran 14.32x faster than `quaternionic` (geometric mean across all input sizes).

![SLERP Benchmark](_static/test_performance_slerp_benchmark.png)
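As a rough illustration of what the three benchmarks compute, the following minimal NumPy sketch implements batched multiplication (the Hamilton product), vector rotation (`q * (0, v) * conj(q)`), and SLERP. It assumes quaternions stored as float arrays of shape `(n, 4)` in `(w, x, y, z)` order, and unit-norm inputs for rotation and SLERP. The function names `quat_mul`, `quat_conj`, `rotate_vector`, and `slerp` are hypothetical: this is not QuaTorch's API and not the code exercised by the benchmark suite.

```python
# Hypothetical NumPy reference sketch of the three benchmarked operations.
# Assumes quaternions are float arrays of shape (n, 4) in (w, x, y, z) order.
# This is NOT QuaTorch's API and not the code used by the benchmark suite.
import numpy as np


def quat_mul(p, q):
    """Hamilton product of two (n, 4) quaternion arrays, element by element."""
    pw, px, py, pz = np.moveaxis(p, -1, 0)
    qw, qx, qy, qz = np.moveaxis(q, -1, 0)
    return np.stack(
        [
            pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw,
        ],
        axis=-1,
    )


def quat_conj(q):
    """Conjugate: negate the vector part (x, y, z)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])


def rotate_vector(q, v):
    """Rotate the 3-vector v by each unit quaternion in q via q * (0, v) * conj(q)."""
    v = np.broadcast_to(v, q.shape[:-1] + (3,))
    # Embed v as a pure quaternion (0, x, y, z).
    v_quat = np.concatenate([np.zeros(v.shape[:-1] + (1,)), v], axis=-1)
    return quat_mul(quat_mul(q, v_quat), quat_conj(q))[..., 1:]


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1 at scalar t."""
    dot = np.sum(q0 * q1, axis=-1, keepdims=True)
    # q and -q encode the same rotation; flip q1 where needed to follow the shorter arc.
    q1 = np.where(dot < 0.0, -q1, q1)
    dot = np.minimum(np.abs(dot), 1.0)  # guard arccos against rounding above 1
    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    # For nearly identical quaternions, fall back to linear interpolation to avoid 0/0.
    small = sin_theta < 1e-6
    safe_sin = np.where(small, 1.0, sin_theta)
    w0 = np.where(small, 1.0 - t, np.sin((1.0 - t) * theta) / safe_sin)
    w1 = np.where(small, t, np.sin(t * theta) / safe_sin)
    out = w0 * q0 + w1 * q1
    # Renormalize (mainly matters for the linear fallback).
    return out / np.linalg.norm(out, axis=-1, keepdims=True)


if __name__ == "__main__":
    n = 4
    half = np.pi / 4  # half of a 90-degree rotation angle
    q = np.tile([np.cos(half), 0.0, 0.0, np.sin(half)], (n, 1))  # 90 degrees about z
    identity = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))
    print(rotate_vector(q, np.array([1.0, 0.0, 0.0])))  # each row is approximately [0, 1, 0]
    print(slerp(identity, q, 0.5))  # 45 degrees about z: approximately [0.924, 0, 0, 0.383]
```

Keeping the four components on the last axis lets every operation broadcast over the leading batch dimension of size `n`, so each function is a handful of vectorized array expressions rather than a Python loop.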