Commits

Author Commit Message Labels Comments Date
Martin Albrecht
print cycles per bit in bench elimination and multiplication
Tags: release-20110601
Martin Albrecht
updating README and AUTHORS for upcoming 20110601 release
Martin Albrecht
xor is a restricted keyword in C++
Martin Albrecht
fixed typo which prevented compilation
Martin Albrecht
MS Visual Studio 10 support
Martin Albrecht
adapting release version
Martin Albrecht
merging Carlo's swap patches
Martin Albrecht
only set HAVE_PAPI if we have papi
CarloWood
Copied the improved code of mzd_col_swap to mzd_col_swap_in_rows and added support for start_row/stop_row. The result has the same speed as mzd_col_swap (per row).
CarloWood
Add support for transposing multi-block matrices.
CarloWood
Also ignore generated maintainer file ltmain.sh
Martin Albrecht
do not fail if realpath is not installed
Martin Albrecht
follow-up check-in for cache size fix
Martin Albrecht
initialise variables (i.e., take care of errors reported by -Wall)
Martin Albrecht
install debug_dump.h otherwise programs linking against the library will fail to compile
Martin Albrecht
remove ltmain.sh which is autogenerated
CarloWood
Speed up of mzd_col_swap by a factor of two. Plus added a testsuite for it.
CarloWood
Bug fix in mzd_equal. When shift = B->offset - A->offset turns out to be negative, we swap A and B. I forgot to also reinitialize 'width'. Renamed 'width' to Awidth. Also got rid of __M4RI_LEFT/RIGHT_BITMASK macros.
CarloWood
Bug fix and general fixups. Testsuite for transpose. Added test_transpose.c to the testsuite. Fixed a bug for non-square matrices of specific sizes where uninitialized data was written to the excess bits of the destination matrix of mzd_transpose. Added a few asserts related to multiblock matrices. A few minor documentation fixes and typos.
CarloWood
Major improvement of transposing.
CarloWood
Rewrite of _mzd_addmul_even_weird to use rowstride. Doesn't seem to speed anything up, but it was a 'test case' to show how it's done ;). Eliminates the use of 'rows', reducing the memory access roughly by a factor of two. Of course, in the light of calling mzd_init, which still calls malloc for blocks and rows and fills the latter with data... this all makes little sense unless we really get rid of rows (and also cache allocations of blocks[])…
CarloWood
Compiler warning fixes.
CarloWood
Implement separate cache for mzd_t. This cache only calls malloc if more than 64 matrices are used, and then allocates space for 64 mzd_t structs at a time.
CarloWood
Added row_offset and accessor functions for mzd_t using it. row_offset is the distance in rows from the beginning of block 0 to the first row. This allows the following functions to be computed in just a few clock cycles. Note that I had to reduce the size of offset, flags and blockrows_log in order to keep sizeof(mzd_t) equal to 64 bytes. This patch adds the mzd_t functions: mzd_first_row, mzd_first_row_next_block, mzd_row_to_block, mzd_rows_in_block and mz…
CarloWood
Added mzd_t::offset_vector and made mzd_t::blocks non-zero also for windowed matrices. After this patch, also for windowed matrices, you can find the first word of the first row with M->blocks[0].begin + M->offset_vector. Subsequently you can find the next rows by adding rowstride until the resulting pointer >= M->blocks[n].end. Then increment n and set the pointer to the first word in the next block with M->blocks[n].begin + (M->offset_vector % M->rowstride). For example, to run …
CarloWood
Move __M4RI_CPU_L1_CACHE and __M4RI_CPU_L2_CACHE to m4ri_config.h.in.
CarloWood
Add option --debug-mzd. This allows one to run heavy consistency checks on the elements of mzd_t (which are heavily correlated) without dumping the hash values. Using just --debug-dump also does the consistency checks on top of printing the hash values.
CarloWood
Add new elements to mzd_t and keep them consistent. After this patch it is guaranteed that excess bits and padding are zero (I think they already were, but now I checked it, see next commit). rowstride is the offset between two rows within a block. blockrows_log and blockrows_mask are respectively the log2 of the number of rows in blocks before the last block, and the number of rows minus one. Note that the number of rows is exactly a power of 2: 1 <…
CarloWood
Documentation fix. mzd_t::offset is already modulo m4ri_radix.
CarloWood
Add --enable-debug-dump. When configured with --enable-debug-dump, print a trace of (hash values of) output values and their function/location, upon leaving any function that does something significant. This can be used to quickly find the function that behaves differently in the case that some patch breaks the testsuite.
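
The row walk described in the mzd_t::offset_vector commit message above can be sketched roughly as follows. This is only an illustration, not the library's code: demo_block_t, demo_mzd_t, nblocks and demo_row_starts are hypothetical names that model just the fields the message mentions (blocks[n].begin, blocks[n].end, rowstride, offset_vector). The sketch tests the remaining distance to the end of the block before advancing, rather than advancing first and comparing against end, so the pointer is never moved past the block. It also walks to the end of the last block; real code would presumably stop at the matrix's row count.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word;

/* Hypothetical stand-in for a block: [begin, end) is its word range. */
typedef struct {
  word *begin;
  word *end;
} demo_block_t;

/* Hypothetical stand-in for mzd_t, reduced to the fields used here. */
typedef struct {
  demo_block_t *blocks;
  size_t nblocks;       /* assumption: number of entries in blocks[] */
  size_t rowstride;     /* words between two consecutive rows in a block */
  size_t offset_vector; /* words from blocks[0].begin to the first row */
} demo_mzd_t;

/* Store a pointer to the first word of each row in out[] (at most
 * max_rows entries) and return how many rows were found. */
size_t demo_row_starts(const demo_mzd_t *M, word **out, size_t max_rows) {
  size_t n = 0;     /* current block index */
  size_t count = 0; /* rows recorded so far */
  if (M->nblocks == 0) return 0;
  word *p = M->blocks[0].begin + M->offset_vector;
  while (count < max_rows) {
    out[count++] = p;
    if ((size_t)(M->blocks[n].end - p) > M->rowstride) {
      /* Another row fits in this block: step by one rowstride. */
      p += M->rowstride;
    } else {
      /* Block exhausted: go to the first row of the next block. */
      if (++n == M->nblocks) break;
      p = M->blocks[n].begin + (M->offset_vector % M->rowstride);
    }
  }
  return count;
}
```

A caller would fill a demo_mzd_t with two or more blocks and pass an array of word pointers large enough to hold the expected number of rows.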