Commits

Author Commit Message Labels Comments Date
Martin Albrecht
make OpenMP support configurable
Martin Albrecht
added Bill's cutoff improvement
Martin Albrecht
fixed bug Bill Hart reported, fixed everything Valgrind reported, and made code run faster on C2D. Unsure about sage.math though; it isn't benchmarkable right now
Martin Albrecht
added more test (corner) cases
Martin Albrecht
added new testcase, cleanup for valgrind
Martin Albrecht
fix bug in reduction introduced by speeding up make_table; add test code to catch these things
Martin Albrecht
added support for SSE2 to new _mzd_mul_m4rm_impl; this improves performance on C2D considerably, but makes things worse on the Opteron
Martin Albrecht
allow control over number of Gray code tables via define GRAY8; renamed HAVE_OMP to HAVE_OPENMP
Martin Albrecht
use 8 instead of 2 Gray code tables (implementation and idea by Bill Hart)
Martin Albrecht
fixes for the last check-in (all rows are aligned now if no windows are used)
Martin Albrecht
some (style) improvements for SSE2 code by Bill Hart
Martin Albrecht
implemented first parallel Strassen-Winograd multiplication (compile with -fopenmp -DHAVE_OPENMP)
Martin Albrecht
new implementation of M4RM multiplication with two Gray code tables. The idea is by Bill Hart
Martin Albrecht
removed parameters T and L for M4RM (they weren't used anyway)
Martin Albrecht
fix commenting style
Martin Albrecht
copy window to matrix to improve data locality in Strassen multiplication
Martin Albrecht
reverting benchmarking code to square matrices
Martin Albrecht
blocking naive matrix multiplication and using that by default if B->ncols < some threshold
Martin Albrecht
faster transpose, faster naive multiplication, faster mzd_make_table
Martin Albrecht
re-added SSE2 support to mul_m4rm, which gives a quite tiny speed-up; removed unused variables
Martin Albrecht
nicer parameter names for mzd_combine
Martin Albrecht
make run_bench return min, median, average and max
Martin Albrecht
document M4RM_BLOCKSIZE
Martin Albrecht
added William Hart's Block M4RM implementation, which gives a significant speed-up; adapted m4ri_opt_k for that purpose too
Martin Albrecht
faster naive multiplication, but still not as fast as it could be.
Martin Albrecht
only call _mm_malloc if it is really available
Martin Albrecht
fixing benchmarking/testing code and adding it to revision control
Martin Albrecht
don't use free on _mm_malloc'd memory
Martin Albrecht
compile fix for HAVE_SSE2 == False
Martin Albrecht
some minor documentation updates