Commits

Author Commit Message Labels Comments Date
CarloWood
Copied the improved code of mzd_col_swap to mzd_col_swap_in_rows and added support for start_row/stop_row. The result has the same speed as mzd_col_swap (per row).
CarloWood
Add support for transposing multi-block matrices.
CarloWood
Also ignore generated maintainer file ltmain.sh
Martin Albrecht
do not fail if realpath is not installed
Martin Albrecht
follow-up check-in for cache size fix
Martin Albrecht
initialise variables (i.e., take care of Wall reported errors)
Martin Albrecht
install debug_dump.h otherwise programs linking against the library will fail to compile
Martin Albrecht
remove ltmain.sh which is autogenerated
CarloWood
Speed up mzd_col_swap by a factor of two. Also added a testsuite for it.
CarloWood
Bug fix in mzd_equal. When shift = B->offset - A->offset turns out to be negative, we swap A and B. I forgot to also reinitialize 'width'. Renamed 'width' to Awidth. Also got rid of __M4RI_LEFT/RIGHT_BITMASK macros.
CarloWood
Bug fix and general fixups. Testsuite for transpose. Added test_transpose.c to the testsuite. Fixed a bug for non-square matrices of specific sizes where uninitialized data was written to the excess bits of the destination matrix of mzd_transpose. Added a few asserts related to multiblock matrices. A few minor documentation fixes and typos.
CarloWood
Major improvement of transposing.
CarloWood
Rewrite of _mzd_addmul_even_weird to use rowstride. Doesn't seem to speed anything up, but it was a 'test case' to show how it's done ;). Eliminates the use of 'rows', reducing memory accesses by roughly a factor of two. Of course, in the light of calling mzd_init, which still calls malloc for blocks and rows and fills the latter with data... this all makes little sense unless we really get rid of rows (and also cache allocations of blocks[])…
CarloWood
Compiler warning fixes.
CarloWood
Implement separate cache for mzd_t. This cache only calls malloc if more than 64 matrices are used, and then allocates space for 64 mzd_t structs at a time.
CarloWood
Added row_offset and accessor functions for mzd_t using it. row_offset is the distance in rows from the beginning of block 0 to the first row. This allows the following functions to be computed in just a few clock cycles. Note that I had to reduce the size of offset, flags and blockrows_log in order to keep sizeof(mzd_t) equal to 64 bytes. This patch adds the mzd_t functions: mzd_first_row, mzd_first_row_next_block, mzd_row_to_block, mzd_rows_in_block and mz…
CarloWood
Added mzd_t::offset_vector and made mzd_t::blocks non-zero also for windowed matrices. After this patch, also for windowed matrices, you can find the first word of the first row with M->blocks[0].begin + M->offset_vector. Subsequently you can find the next rows by adding rowstride until the resulting pointer >= M->blocks[n].end. Then increment n and set the pointer to the first word in the next block with M->blocks[n].begin + (M->offset_vector % M->rowstride). For example, to run …
CarloWood
Move __M4RI_CPU_L1_CACHE and __M4RI_CPU_L2_CACHE to m4ri_config.h.in.
CarloWood
Add option --debug-mzd. This allows one to run heavy consistency checks on the elements of mzd_t (which are heavily correlated) without dumping the hash values. Using just --debug-dump also does the consistency checks on top of printing the hash values.
CarloWood
Add new elements to mzd_t and keep them consistent. After this patch it is guaranteed that excess bits and padding are zero (I think they already were, but now I checked it, see next commit). rowstride is the offset between two rows within a block. blockrows_log and blockrows_mask are respectively the log2 of the number of rows in blocks before the last block, and the number of rows minus one. Note that number of rows is exactly a power of 2: 1 <…
CarloWood
Documentation fix. mzd_t::offset is already modulo m4ri_radix.
CarloWood
Add --enable-debug-dump. When configured with --enable-debug-dump, print a trace of (hash values of) output values and their function/location, upon leaving any function that does something significant. This can be used to quickly find the function that behaves differently when some patch breaks the testsuite.
CarloWood
More constness and some whitespace issues.
CarloWood
A few more compiler warning fixes and a const thingy.
CarloWood
More constness fixes. This should cause all pointers passed to functions to be a pointer-to-const when the content is not changed. I needed to introduce mzd_init_window_const, which creates a window into a const mzd_t, returning mzd_t const* as well. I decided to demand an explicit cast when freeing such a window (we can fix that later once I have added flags to mzd_t, and add runtime checking when freeing a const window, …
CarloWood
Fix constness of trsm* functions.
CarloWood
Do not install or include config.h in header files. This patch introduces src/m4ri_config.h.in, from which src/m4ri_config.h is generated during configure, which subsequently is installed instead of config.h and included by other headers. This avoids having the whole list of macros defined in config.h pollute the macro namespace for users of the library. Header files now use __M4RI_ prefixed versions of HAVE_SSE2, HAVE_MM_MALLOC, HAVE_POSIX_MEMA…
CarloWood
Make it harder for the compiler to put parts of inlined functions outside our loop.
CarloWood
Add dependency on m4ri headers to testsuite.
CarloWood
Allow dumping only a single counter.
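
The row traversal described in the mzd_t::offset_vector commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: toy_block_t, toy_mzd_t and toy_xor_first_words are hypothetical stand-ins that model just the fields named in that commit message (blocks[n].begin, blocks[n].end, offset_vector, rowstride); they are not the library's actual definitions, and the real mzd_t has more members.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word;

/* Hypothetical stand-ins: only the fields named in the commit message. */
typedef struct {
  word *begin; /* first word of the block */
  word *end;   /* one past the last word of the block */
} toy_block_t;

typedef struct {
  toy_block_t *blocks;  /* memory blocks holding the rows */
  size_t offset_vector; /* words from blocks[0].begin to the first row */
  size_t rowstride;     /* words between two consecutive rows in a block */
  size_t nrows;         /* number of rows to visit (assumed field) */
} toy_mzd_t;

/* Visit every row once and XOR together the first word of each row. */
static word toy_xor_first_words(toy_mzd_t const *M) {
  word acc = 0;
  if (M->nrows == 0)
    return acc;

  size_t n = 0; /* index of the current block */
  word *row = M->blocks[0].begin + M->offset_vector;

  for (size_t r = 0; r < M->nrows; ++r) {
    acc ^= row[0];
    if (r + 1 == M->nrows)
      break; /* do not step past the last row */

    row += M->rowstride;
    if (row >= M->blocks[n].end) {
      /* Ran off the current block: continue at the same word offset
         within a row, at the start of the next block. */
      ++n;
      row = M->blocks[n].begin + (M->offset_vector % M->rowstride);
    }
  }
  return acc;
}
```

The point of the sketch is that the loop never consults a per-row pointer table: one addition and one comparison per row are enough, with the modulo only evaluated at a block boundary.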