Improve small-ish matrices

Issue #19 open
Former user created an issue

Our performance for

  * 64x64 times 64x1,
  * 64x64 times 64x64 and
  * Nx64 times 64x64

matrix multiplication is embarrassingly bad. We must improve performance here, probably by using code by Emmanuel Thomé, who worked on these dimensions quite a bit.
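For reference, the smallest of these cases (64x64 times 64x1) can be sketched in a few lines. This is only an illustration, assuming matrices over GF(2) stored one row per 64-bit word with bit j of a word holding column j. The names `parity64` and `mul_6464_vec` are hypothetical and are not taken from the library or from Emmanuel Thomé's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parity (XOR of all bits) of a 64-bit word, computed by repeated folding. */
static inline uint64_t parity64(uint64_t x) {
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

/* Hypothetical helper: w = A * v over GF(2).
 * Assumed layout: A[i] is row i, bit j of A[i] is entry (i, j),
 * and bit j of v is entry j of the column vector. */
static uint64_t mul_6464_vec(const uint64_t *A, uint64_t v) {
    assert(A != NULL);
    uint64_t w = 0;
    for (int i = 0; i < 64; i++) {
        /* Entry i of the result is the GF(2) dot product of row i with v. */
        w |= parity64(A[i] & v) << i;
    }
    return w;
}
```

Each output bit costs one AND and one parity computation, so the whole product takes 64 word operations plus the parity folds. A hardware popcount or parity instruction would likely replace `parity64` in practice.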

Comments (4)

  1. Martin Albrecht repo owner

    The attached file matops.c contains a variety of matrix multiplication routines by Emmanuel Thomé. Many of these routines do considerably better than we do. The file says LGPL but we have permission from the author to re-license what we need as GPL.

  2. Martin Albrecht repo owner

    Here's where we stand as of version 20110601:

    Transpose

    transp_6464-64           201092 times in 5.0226 micros each
    mzd_transpose-64        1629377 times in 0.6199 micros each
    

    Copy

    copy_6464-64            5421604 times in 0.1863 micros each
    mzd_copy-64             4031327 times in 0.2505 micros each
    

    Addition

    add_6464_6464_C-64      5036282 times in 0.2005 micros each
    _mzd_add-64             3331602 times in 0.3032 micros each
    

    Multiplication

    mul_o64_6464_C_lsb-1    4277948 times in 0.2361 micros each
    _mzd_mul_naive-1        2429804 times in 0.4157 micros each
    
    mul_N64_6464_sse-64      584267 times in 3.4402 micros each
    mul_N64_6464_lookup4-64 1984253 times in 1.0130 micros each
    mul_N64_6464_lookup8-64 1783218 times in 1.1272 micros each
    _mzd_mul_naive-64        87081 times in 11.5984 micros each
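
To make the gap between the lookup variants and the naive routine easier to read, here is a minimal sketch of a 4-bit table lookup multiplication for Nx64 times 64x64. It assumes matrices over GF(2) with one row per 64-bit word and bit j in column j. It is not the code from matops.c; `lut4_t`, `lut4_build` and `sketch_mul_N64_6464_lut4` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 tables of 16 entries each: t[k][x] is the XOR of the rows
 * B[4k + b] for every bit b that is set in the 4-bit value x. */
typedef struct { uint64_t t[16][16]; } lut4_t;

/* Precompute the tables for a 64x64 matrix B (B[j] is row j). */
static void lut4_build(lut4_t *L, const uint64_t *B) {
    assert(L != NULL && B != NULL);
    for (int k = 0; k < 16; k++) {
        L->t[k][0] = 0;
        for (int x = 1; x < 16; x++) {
            int low = x & -x;  /* lowest set bit of x */
            int b = (low == 1) ? 0 : (low == 2) ? 1 : (low == 4) ? 2 : 3;
            /* x ^ low is smaller than x, so its entry is already filled. */
            L->t[k][x] = L->t[k][x ^ low] ^ B[4 * k + b];
        }
    }
}

/* C = A * B over GF(2), where A and C have n rows of 64 bits each.
 * Each row of A is consumed four bits at a time: 16 lookups per row. */
static void sketch_mul_N64_6464_lut4(uint64_t *C, const uint64_t *A,
                                     const lut4_t *L, size_t n) {
    assert(C != NULL && A != NULL && L != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t a = A[i], c = 0;
        for (int k = 0; k < 16; k++) {
            c ^= L->t[k][a & 15];
            a >>= 4;
        }
        C[i] = c;
    }
}
```

The tables take 2 KiB and are built once per right-hand matrix, so the cost is amortised over the N rows. An 8-bit variant would use 8 tables of 256 entries (16 KiB) and 8 lookups per row.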
    
  3. Martin Albrecht repo owner

    This hasn’t been done. Perhaps your simplification of the data structures made some progress on this, though? I’d keep it open as a thorn in our side.
