Use Greasing in TRSM
Issue #21
new
We should use a caching trick for TRSM similar to the one in M4RI and M4RM, as this should provide some performance gain. TRSM (in full, including the recursive multiplications etc.) accounts for about 30% of the running time for 10,000 x 10,000 matrices on my i7 CPU.
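To make the idea concrete, here is a minimal sketch of what a greased upper-left TRSM over GF(2) could look like. It is illustrative only and not the library's code: the function names, the block width `k`, and the row representation (Python ints as bit-packed rows) are assumptions made for this example, and the table is built with a simple lowest-set-bit recurrence rather than in Gray code order. For each block of `k` solved rows it precomputes all 2^k XOR combinations once, so every row above the block is updated with a single table lookup instead of up to `k` row additions.

```python
def trsm_upper_left_greased(U, B, k=4):
    """Solve U*X = B over GF(2) for upper triangular U with unit diagonal.

    U: list of n rows, each a list of n bits (0 or 1).
    B: list of n rows, each a Python int used as a bit-packed row.
    Returns X as a new list of bit-packed rows. (Hypothetical helper.)
    """
    n = len(U)
    X = list(B)
    end = n
    while end > 0:
        start = max(0, end - k)
        width = end - start
        # 1. Solve the small diagonal block by plain back substitution.
        for i in range(end - 1, start - 1, -1):
            for j in range(i + 1, end):
                if U[i][j]:
                    X[i] ^= X[j]
        # 2. Table of all 2^width XOR combinations of the solved rows.
        #    Bit t of the index selects row start + t.
        table = [0] * (1 << width)
        for idx in range(1, 1 << width):
            low = idx & -idx  # lowest set bit of idx
            table[idx] = table[idx ^ low] ^ X[start + low.bit_length() - 1]
        # 3. Eliminate the block from every row above: one lookup per row.
        for i in range(start):
            idx = 0
            for t in range(width):
                idx |= U[i][start + t] << t
            X[i] ^= table[idx]
        end = start
    return X


def multiply_gf2(U, X):
    """Reference product U*X over GF(2); used only to check the solver."""
    out = []
    for row in U:
        acc = 0
        for j, bit in enumerate(row):
            if bit:
                acc ^= X[j]
        out.append(acc)
    return out
```

A quick check is to pick some `X`, compute `B = multiply_gf2(U, X)`, and confirm that `trsm_upper_left_greased(U, B, k)` returns `X` for several values of `k`. The table costs 2^k row additions per block, so the trick only pays off when there are many more than 2^k rows above the block.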
Comments (1)
reporter
TRSM upper left is done, cf. http://martinralbrecht.wordpress.com/2010/11/19/trsm-with-greasing-trsm-reduced-to-matrix-multiplication/