Wiki
Clone wikim4ri / Performance
Performance
This page contains some benchmarketing results for M4RI and other systems. If you think that any result presented here is wrong and/or misleading, please let us know so that we can fix it. All times are in seconds.
Matrix Multiplication
Multiplying two random square matrices over \(\mathbb{F}_2\) of the given dimension. Magma and M4RI both implement Strassen-Winograd on top of M4RM (cf. Further Reading). Our multicore code maxes out at 4 cores.
Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz (strombenzin@rhul)
n | M4RI/1 core b09b6ef | M4RI/2 cores b09b6ef | M4RI/4 cores b09b6ef | Magma V2.20-5 |
---|---|---|---|---|
8192 | 0.33184 | 0.19273 | 0.10365 | 0.430 |
10000 | 0.62534 | 0.37489 | 0.21772 | 0.710 |
16384 | 2.37502 | 1.38256 | 0.73492 | 2.880 |
32768 | 16.73710 | 9.71417 | 5.12534 | 19.840 |
65536 | 121.48419 | 68.23255 | 35.74736 | 137.580 |
M4RI timings were produced as follows:
$ export OMP_NUM_THREADS=1; for n in 8192 10000 16384 32768 65536; ./bench_multiplication -t 7200 $n $n $n 0 0 $ export OMP_NUM_THREADS=2; for n in 8192 10000 16384 32768 65536; ./bench_multiplication -t 7200 $n $n $n 0 1 $ export OMP_NUM_THREADS=4; for n in 8192 10000 16384 32768 65536; ./bench_multiplication -t 7200 $n $n $n 0 1
Magma timings were produced as follows:
n:=8192; A:=RandomMatrix(GF(2), n, n); B:=RandomMatrix(GF(2), n, n); time C:=A*B;
Rank
Computing the rank of random square matrices over \(\mathbb{F}_2\). We are comparing M4RI, Magma and GF2Toolkit. The benchmarks below indicate that we still have some work to do.
Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz (strombenzin@rhul)
n | M4RI/PLE b09b6ef | M4RI/M4RI b09b6ef | M4RI/MMPF b09b6ef | Magma V2.20-5 | GF2Toolkit |
---|---|---|---|---|---|
8192 | 0.15024 | 0.12097 | 0.13964 | 0.750 | 0.12268 |
10000 | 0.27492 | 0.21744 | 0.26816 | 1.300 | -- |
16384 | 1.08527 | 1.04686 | 1.24549 | 3.080 | 0.86451 |
32768 | 7.88854 | 9.11358 | 14.85795 | 20.830 | 6.30347 |
65536 | 56.96278 | 69.32725 | 166.14155 | 111.790 | 46.49620 |
M4RI timings were produced as follows:
$ for n in 8192 10000 16384 32768 65536; ./bench_rank -t 7200 $n $n ple 0 $ for n in 8192 10000 16384 32768 65536; ./bench_rank -t 7200 $n $n m4ri 0 $ for n in 8192 10000 16384 32768 65536; ./bench_rank -t 7200 $n $n mmpf 0
Magma timings were produced as follows:
n:=2^16; A:=RandomMatrix(GF(2),n,n); time E:=EchelonForm(A);
GF2Toolkit timings were produced by patching testRankComputation.cc and using the best timing reported (typically for 128-bit wide words).
Old
Timings for previous versions of M4RI are archived here
Updated