Rasmus Larsen 5LBq9o: Untitled snippet

Created by Rasmus Larsen
SSE:
name                                   old time/op             new time/op             delta
BM_eigen_rsqrt_float/1                  5.00ns ± 0%             5.00ns ± 0%     ~             (p=1.000 n=5+5)
BM_eigen_rsqrt_float/8                  6.21ns ± 0%             5.14ns ± 0%  -17.25%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/64                 41.9ns ± 0%             33.0ns ± 0%  -21.19%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/512                 336ns ± 0%              263ns ± 0%  -21.88%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k                2.65µs ± 0%             2.06µs ± 0%  -22.04%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/32k               21.4µs ± 1%             16.8µs ± 1%  -21.81%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k               175µs ± 2%              145µs ± 2%  -17.17%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M                 699µs ± 1%              580µs ± 2%  -17.08%          (p=0.008 n=5+5)

AVX (without FMA) on Haswell:
name                                   old time/op             new time/op             delta
BM_eigen_rsqrt_float/1                  5.00ns ± 0%             5.00ns ± 0%     ~             (p=1.000 n=5+5)
BM_eigen_rsqrt_float/8                  4.03ns ± 0%             3.45ns ± 0%  -14.44%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/64                 25.0ns ± 1%             20.4ns ± 0%  -18.11%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/512                 196ns ± 2%              158ns ± 0%  -19.26%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k                1.54µs ± 1%             1.24µs ± 0%  -19.47%          (p=0.016 n=5+4)
BM_eigen_rsqrt_float/32k               12.9µs ± 3%             11.2µs ± 5%  -13.01%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k               123µs ± 3%              112µs ± 4%   -8.95%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M                 489µs ± 3%              447µs ± 4%   -8.57%          (p=0.008 n=5+5)

AVX+FMA on Haswell:
name                                   old time/op             new time/op             delta
BM_eigen_rsqrt_float/1                  5.01ns ± 0%             6.70ns ± 0%  +33.88%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/8                  3.80ns ± 0%             7.16ns ± 0%  +88.59%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/64                 23.1ns ± 0%             23.6ns ± 0%   +2.39%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/512                 180ns ± 0%              151ns ± 0%  -16.05%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k                1.42µs ± 1%             1.14µs ± 0%  -19.26%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/32k               12.1µs ± 6%             10.3µs ±10%  -14.75%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k               119µs ± 4%              106µs ± 5%  -10.95%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M                 471µs ± 3%              420µs ± 4%  -10.84%          (p=0.008 n=5+5)

AVX512 on Skylake:
name                                    old time/op             new time/op             delta
BM_eigen_rsqrt_double/1                  3.95ns ± 7%             4.08ns ± 8%     ~             (p=0.690 n=5+5)
BM_eigen_rsqrt_double/8                  3.73ns ± 1%             4.59ns ±18%  +23.11%          (p=0.016 n=4+5)
BM_eigen_rsqrt_double/64                 25.5ns ± 0%             32.6ns ±18%  +27.98%          (p=0.016 n=4+5)
BM_eigen_rsqrt_double/512                 201ns ± 0%              258ns ±17%  +28.57%          (p=0.016 n=4+5)
BM_eigen_rsqrt_double/4k                1.93µs ± 1%             2.24µs ±20%  +16.11%          (p=0.016 n=4+5)
BM_eigen_rsqrt_double/32k               17.0µs ±17%             15.8µs ± 0%     ~             (p=0.730 n=5+4)
BM_eigen_rsqrt_double/256k               177µs ± 2%              181µs ± 1%   +2.18%          (p=0.032 n=5+5)
BM_eigen_rsqrt_double/1M                 718µs ± 0%              725µs ± 1%   +0.92%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1                   2.96ns ± 0%             5.71ns ± 0%  +92.56%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/8                   17.8ns ± 0%             17.8ns ± 0%     ~             (p=0.151 n=5+5)
BM_eigen_rsqrt_float/64                  10.0ns ± 0%             14.3ns ± 0%  +43.06%          (p=0.016 n=4+5)
BM_eigen_rsqrt_float/512                 77.8ns ± 0%             86.5ns ± 1%  +11.13%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/4k                   633ns ± 0%              687ns ± 0%   +8.38%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/32k                5.91µs ± 0%             6.27µs ± 1%   +6.05%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/256k               84.9µs ± 0%             85.8µs ± 1%   +1.04%          (p=0.008 n=5+5)
BM_eigen_rsqrt_float/1M                  340µs ± 0%              343µs ± 1%   +0.90%          (p=0.032 n=5+5)
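
The delta column can be sanity-checked by hand. The sketch below assumes delta is the relative change (new - old) / old, so a negative value means the new code is faster. The helper name `percent_delta` is hypothetical and not part of any benchmarking tool. Because the printed times are rounded, recomputed values should land close to the reported deltas but need not match them exactly.

```python
def percent_delta(old: float, new: float) -> float:
    """Relative change from old to new, in percent; negative means faster."""
    return (new - old) / old * 100.0


# SSE, BM_eigen_rsqrt_float/64: 41.9ns -> 33.0ns (reported -21.19%)
sse_64 = percent_delta(41.9, 33.0)

# AVX+FMA on Haswell, BM_eigen_rsqrt_float/8: 3.80ns -> 7.16ns (reported +88.59%)
fma_8 = percent_delta(3.80, 7.16)

print(f"SSE /64:     {sse_64:+.2f}%")
print(f"AVX+FMA /8:  {fma_8:+.2f}%")
```

Both recomputed values agree with the table to within a fraction of a percentage point, which is consistent with the deltas having been derived from unrounded timings.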
