Created by Rasmus Larsen
| SSE:
name old time/op new time/op delta
BM_eigen_rsqrt_float/1 5.00ns ± 0% 5.00ns ± 0% ~ (p=1.000 n=5+5)
BM_eigen_rsqrt_float/8 6.21ns ± 0% 5.14ns ± 0% -17.25% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/64 41.9ns ± 0% 33.0ns ± 0% -21.19% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/512 336ns ± 0% 263ns ± 0% -21.88% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k 2.65µs ± 0% 2.06µs ± 0% -22.04% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/32k 21.4µs ± 1% 16.8µs ± 1% -21.81% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k 175µs ± 2% 145µs ± 2% -17.17% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M 699µs ± 1% 580µs ± 2% -17.08% (p=0.008 n=5+5)
AVX (-FMA) on Haswell:
name old time/op new time/op delta
BM_eigen_rsqrt_float/1 5.00ns ± 0% 5.00ns ± 0% ~ (p=1.000 n=5+5)
BM_eigen_rsqrt_float/8 4.03ns ± 0% 3.45ns ± 0% -14.44% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/64 25.0ns ± 1% 20.4ns ± 0% -18.11% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/512 196ns ± 2% 158ns ± 0% -19.26% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k 1.54µs ± 1% 1.24µs ± 0% -19.47% (p=0.016 n=5+4)
BM_eigen_rsqrt_float/32k 12.9µs ± 3% 11.2µs ± 5% -13.01% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k 123µs ± 3% 112µs ± 4% -8.95% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M 489µs ± 3% 447µs ± 4% -8.57% (p=0.008 n=5+5)
AVX+FMA on Haswell:
name old time/op new time/op delta
BM_eigen_rsqrt_float/1 5.01ns ± 0% 6.70ns ± 0% +33.88% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/8 3.80ns ± 0% 7.16ns ± 0% +88.59% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/64 23.1ns ± 0% 23.6ns ± 0% +2.39% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/512 180ns ± 0% 151ns ± 0% -16.05% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k 1.42µs ± 1% 1.14µs ± 0% -19.26% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/32k 12.1µs ± 6% 10.3µs ±10% -14.75% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k 119µs ± 4% 106µs ± 5% -10.95% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M 471µs ± 3% 420µs ± 4% -10.84% (p=0.008 n=5+5)
AVX512 on Skylake:
name old time/op new time/op delta
BM_eigen_rsqrt_double/1 3.95ns ± 7% 4.08ns ± 8% ~ (p=0.690 n=5+5)
BM_eigen_rsqrt_double/8 3.73ns ± 1% 4.59ns ±18% +23.11% (p=0.016 n=4+5)
BM_eigen_rsqrt_double/64 25.5ns ± 0% 32.6ns ±18% +27.98% (p=0.016 n=4+5)
BM_eigen_rsqrt_double/512 201ns ± 0% 258ns ±17% +28.57% (p=0.016 n=4+5)
BM_eigen_rsqrt_double/4k 1.93µs ± 1% 2.24µs ±20% +16.11% (p=0.016 n=4+5)
BM_eigen_rsqrt_double/32k 17.0µs ±17% 15.8µs ± 0% ~ (p=0.730 n=5+4)
BM_eigen_rsqrt_double/256k 177µs ± 2% 181µs ± 1% +2.18% (p=0.032 n=5+5)
BM_eigen_rsqrt_double/1M 718µs ± 0% 725µs ± 1% +0.92% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1 2.96ns ± 0% 5.71ns ± 0% +92.56% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/8 17.8ns ± 0% 17.8ns ± 0% ~ (p=0.151 n=5+5)
BM_eigen_rsqrt_float/64 10.0ns ± 0% 14.3ns ± 0% +43.06% (p=0.016 n=4+5)
BM_eigen_rsqrt_float/512 77.8ns ± 0% 86.5ns ± 1% +11.13% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k 633ns ± 0% 687ns ± 0% +8.38% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/32k 5.91µs ± 0% 6.27µs ± 1% +6.05% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k 84.9µs ± 0% 85.8µs ± 1% +1.04% (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M 340µs ± 0% 343µs ± 1% +0.90% (p=0.032 n=5+5)
|
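For readers unfamiliar with this output format, here is a minimal sketch of how the delta column can be read. It assumes the delta is the relative change of the new mean against the old mean, so a negative value means the new code is faster. The helper name `delta_percent` is hypothetical and not part of any benchmark tool. Recomputing from the rounded means shown in the tables gives values that can differ from the listed deltas in the last digits, presumably because the tool works from unrounded samples.

```python
def delta_percent(old: float, new: float) -> float:
    """Relative change of `new` against `old`, in percent.

    Negative means the new time is lower (a speedup); positive means
    a slowdown. Hypothetical helper, written for illustration only.
    """
    return (new - old) / old * 100.0


# Rounded mean times copied from the "AVX+FMA on Haswell" float rows above.
# Old and new within a row share the same unit, so the unit cancels out.
rows = {
    "BM_eigen_rsqrt_float/8": (3.80, 7.16),   # ns
    "BM_eigen_rsqrt_float/4k": (1.42, 1.14),  # µs
}

for name, (old, new) in rows.items():
    print(f"{name}: {delta_percent(old, new):+.2f}%")
```

Rows marked `~` are those where the reported p-value gives no statistically significant difference, so no delta is printed for them.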