- marked as enhancement
- attached matops.c
Improve small-ish matrices
Issue #19
open
Our performance for
* 64x64 times 64x1,
* 64x64 times 64x64 and
* Nx64 times 64x64
matrix multiplication is embarrassingly bad. We must improve performance here, probably by using code by Emmanuel Thomé, who worked on these dimensions quite a bit.
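For reference, here is a minimal sketch of the baseline these shapes reduce to. It assumes the matrices are over GF(2) and that each 64-bit row is packed into one `uint64_t`, with bit j of a row holding column j (LSB first). The name `mul_N64_6464_sketch` and the layout are assumptions made for illustration; this is not code from matops.c or from our library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical baseline: C = A * B over GF(2).
 * A has n rows of 64 bits, B is 64x64, C has n rows of 64 bits.
 * Assumed layout: bit j of a[i] is entry (i, j), least significant bit first.
 */
static void mul_N64_6464_sketch(uint64_t *c, const uint64_t *a,
                                const uint64_t *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t row = a[i];
        uint64_t acc = 0;
        for (int j = 0; j < 64; j++) {
            /* mask is all ones when bit j of the row is set, else zero */
            uint64_t mask = -(uint64_t)((row >> j) & 1);
            /* addition over GF(2) is XOR, so add row j of B when selected */
            acc ^= b[j] & mask;
        }
        c[i] = acc;
    }
}
```

With n = 64 this covers the 64x64 times 64x64 case. The 64x64 times 64x1 case needs a parity reduction per row instead, unless the vector is stored as a row and multiplied from the left.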
Comments (4)
-
repo owner -
repo owner -
assigned issue to
- changed status to open
Here's where we stand as of version 20110601:
Transpose
transp_6464-64 201092 times in 5.0226 micros each
mzd_transpose-64 1629377 times in 0.6199 micros each
Copy
copy_6464-64 5421604 times in 0.1863 micros each
mzd_copy-64 4031327 times in 0.2505 micros each
Addition
add_6464_6464_C-64 5036282 times in 0.2005 micros each
_mzd_add-64 3331602 times in 0.3032 micros each
Multiplication
mul_o64_6464_C_lsb-1 4277948 times in 0.2361 micros each
_mzd_mul_naive-1 2429804 times in 0.4157 micros each
mul_N64_6464_sse-64 584267 times in 3.4402 micros each
mul_N64_6464_lookup4-64 1984253 times in 1.0130 micros each
mul_N64_6464_lookup8-64 1783218 times in 1.1272 micros each
_mzd_mul_naive-64 87081 times in 11.5984 micros each
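The lookup4 and lookup8 names suggest table-driven kernels that consume the left operand in 4-bit or 8-bit chunks. That reading is a guess from the names, not something checked against matops.c. A rough sketch of what an 8-bit variant could look like follows, assuming GF(2) arithmetic and one `uint64_t` per row with the least significant bit as column 0. The type `lookup8_tables` and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table set: t[k][v] is the XOR of rows 8k..8k+7 of B selected by the bits of v. */
typedef struct {
    uint64_t t[8][256];
} lookup8_tables;

/* Build the 8 tables for a 64x64 matrix B (one uint64_t per row, assumed layout). */
static void lookup8_build(lookup8_tables *lt, const uint64_t *b)
{
    assert(lt != NULL && b != NULL);
    for (int k = 0; k < 8; k++) {
        lt->t[k][0] = 0;
        for (int v = 1; v < 256; v++) {
            int low = v & -v;          /* lowest set bit of v */
            int pos = 0;               /* index of that bit */
            while (!((low >> pos) & 1))
                pos++;
            /* reuse the entry for v without its lowest bit, add one row of B */
            lt->t[k][v] = lt->t[k][v ^ low] ^ b[8 * k + pos];
        }
    }
}

/* C = A * B over GF(2) for n rows of A, using 8 table lookups per row. */
static void mul_N64_6464_lookup8_sketch(uint64_t *c, const uint64_t *a,
                                        const lookup8_tables *lt, size_t n)
{
    assert(c != NULL && a != NULL && lt != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t row = a[i];
        uint64_t acc = 0;
        for (int k = 0; k < 8; k++) {
            acc ^= lt->t[k][row & 0xff];   /* byte k of the row picks a table entry */
            row >>= 8;
        }
        c[i] = acc;
    }
}
```

The tables take 16 KiB and cost about 2048 XORs to build, so this only pays off when N is large enough to amortize the setup. That would be consistent with the lookup variants being benchmarked at 64 rows rather than 1.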
-
assigned issue to
-
Do we need to keep this one open or has this been done?
-
repo owner This hasn’t been done. Perhaps your simplification of the data structures made some progress on this, though? I’d keep it open as a thorn in our side.
The attached file matops.c contains a variety of matrix multiplication routines by Emmanuel Thomé. Many of these routines do considerably better than we do. The file says LGPL but we have permission from the author to re-license what we need as GPL.