M4RI is a library for fast arithmetic with dense matrices over F2. The
name M4RI comes from the first implemented algorithm: the "Method of
the Four Russians" inversion algorithm published by Gregory Bard. This
algorithm is in turn named after the "Method of the Four Russians"
multiplication algorithm, which is probably better referred to as
Kronrod's method.

M4RI is available at


 * basic arithmetic with dense matrices over F2 (addition, equality
   testing, stacking, augmenting, sub-matrices, randomisation);

 * asymptotically fast O(n^log_2(7)) matrix multiplication via the "Method
   of the Four Russians" (M4RM) and the Strassen-Winograd algorithm;

 * fast row echelon form computation and matrix inversion via the "Method
   of the Four Russians" (M4RI, O(n^3/log(n)));

 * asymptotically fast PLUQ factorisation; 

 * support for the x86/x86_64 SSE2 instruction set where available;

 * preliminary support for parallelisation on shared memory systems via
   OpenMP;

 * and support for Linux and OS X (GCC), support for Solaris (Sun
   Studio Express) and support for Windows (Visual Studio 2008 Express).
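
To illustrate the idea behind the "Method of the Four Russians"
multiplication listed above, here is a minimal standalone sketch in plain
C. It does not use the M4RI API: the function name `f2_mul_m4rm`, the block
size `K` and the one-word-per-row layout (at most 64 columns) are all
assumptions made for this example. For each block of up to K rows of B it
tabulates every F2-linear combination of those rows once, so each row of A
then needs a single table lookup per block instead of up to K row
additions.

```c
#include <assert.h>
#include <stdint.h>

#define K 4 /* block size (assumed): each table holds 2^K row combinations */

/* C = A * B over F2. A is m x l and B is l x n, with l, n <= 64.
 * Row i of each matrix is one 64-bit word; bit j holds column j. */
static void f2_mul_m4rm(uint64_t *C, const uint64_t *A, const uint64_t *B,
                        int m, int l)
{
    uint64_t table[1 << K];

    assert(m >= 0 && l >= 0 && l <= 64);

    for (int i = 0; i < m; i++)
        C[i] = 0;

    for (int start = 0; start < l; start += K) {
        int k = (l - start < K) ? (l - start) : K;

        /* table[x] = XOR of rows B[start + t] over all set bits t of x.
         * Each entry costs one XOR on top of an earlier entry. */
        table[0] = 0;
        for (int x = 1; x < (1 << k); x++) {
            int low = x & -x; /* lowest set bit of x */
            int t = 0;
            while (!((low >> t) & 1))
                t++;
            table[x] = table[x ^ low] ^ B[start + t];
        }

        /* Read k bits of each row of A and add the matching table entry. */
        for (int i = 0; i < m; i++) {
            unsigned x = (unsigned)((A[i] >> start) & ((1u << k) - 1));
            C[i] ^= table[x];
        }
    }
}
```

The table here is filled in increasing order of x; a Gray code ordering is
the usual alternative and has the same cost of one row addition per entry.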

OpenMP support for parallel multiplication and elimination is enabled
with the


configure switch. If GCC is used to compile the library, it is advised
to use at least GCC 4.3, since earlier versions have problems with
OpenMP in shared libraries (OpenMP support itself was introduced in GCC
4.2). Both MSVC and SunCC support OpenMP, but we have no experience with
these yet.
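
As a rough illustration of the kind of shared-memory parallelism OpenMP
provides, the sketch below splits a naive matrix product over F2 across
threads by output row. It is not M4RI's own code or parallelisation
strategy: the function name `f2_mul_rows_parallel` and the one-word-per-row
layout (at most 64 columns) are assumptions for this example. With GCC the
pragma takes effect when compiling with `-fopenmp`; without it the code
still compiles and runs serially.

```c
#include <assert.h>
#include <stdint.h>

/* C = A * B over F2. A is m x l and B is l x n, with l, n <= 64.
 * Row i of each matrix is one 64-bit word; bit j holds column j.
 * Rows of C do not depend on each other, so the outer loop can be
 * divided among threads without any locking. */
static void f2_mul_rows_parallel(uint64_t *C, const uint64_t *A,
                                 const uint64_t *B, int m, int l)
{
    assert(m >= 0 && l >= 0 && l <= 64);

#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (int i = 0; i < m; i++) {
        uint64_t acc = 0;
        for (int j = 0; j < l; j++)
            if ((A[i] >> j) & 1)
                acc ^= B[j]; /* addition over F2 is XOR */
        C[i] = acc;
    }
}
```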

Generally speaking, larger performance improvements can be expected on
dual-core Opteron CPUs than on dual-core Core2Duo CPUs. This is
because the latter has a shared L2 cache, which is already almost fully
utilised by the single-core implementation.

Overall, the speed-up is considerable but sublinear. See

for details.