Use Greasing in TRSM

Issue #21 new
Martin Albrecht repo owner created an issue

We should use a caching trick for TRSM similar to the one used in M4RI and M4RM, as this should provide some performance gain. TRSM (in full, including the recursive multiplications etc.) accounts for about 30% of the running time for 10,000 x 10,000 matrices on my i7 CPU.
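To make the idea concrete, here is a minimal sketch of what table-based ("greased") forward substitution over GF(2) could look like. It is not the library's code: the function names, the GREASE_K block width, and the one-uint64_t-per-row layout (so at most 64 rows and 64 right-hand-side columns) are all assumptions made for illustration, and the table is built incrementally rather than in Gray code order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GREASE_K 4 /* hypothetical block width: the table has 2^GREASE_K entries */

/* Solve L*X = B over GF(2) in place (B is overwritten with X).
 * Assumed layout: bit j of L[i] is entry L[i][j]; L is unit lower
 * triangular with n <= 64; each B[i] packs up to 64 columns of row i. */
void trsm_lower_greased(const uint64_t *L, uint64_t *B, size_t n) {
    uint64_t table[(size_t)1 << GREASE_K];
    assert(n <= 64);
    for (size_t s = 0; s < n; s += GREASE_K) {
        size_t k = (n - s < GREASE_K) ? n - s : GREASE_K;

        /* 1. Plain forward substitution inside the k x k diagonal block. */
        for (size_t i = s; i < s + k; i++)
            for (size_t j = s; j < i; j++)
                if ((L[i] >> j) & 1) B[i] ^= B[j];

        /* 2. Cache every GF(2) combination of the k solved rows:
         *    table[v] = XOR of B[s + t] over the set bits t of v. */
        table[0] = 0;
        for (size_t v = 1; v < ((size_t)1 << k); v++) {
            size_t low = v & (~v + 1); /* lowest set bit of v */
            size_t t = 0;
            while (((size_t)1 << t) != low) t++;
            table[v] = table[v ^ low] ^ B[s + t];
        }

        /* 3. Each row below the block needs one lookup and one XOR
         *    instead of up to k row additions. */
        uint64_t mask = ((uint64_t)1 << k) - 1;
        for (size_t i = s + k; i < n; i++)
            B[i] ^= table[(size_t)((L[i] >> s) & mask)];
    }
}

/* Returns 1 if L*X == B over GF(2), 0 otherwise (checking helper). */
int trsm_residual_is_zero(const uint64_t *L, const uint64_t *X,
                          const uint64_t *B, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint64_t acc = 0;
        for (size_t j = 0; j <= i; j++)
            if ((L[i] >> j) & 1) acc ^= X[j];
        if (acc != B[i]) return 0;
    }
    return 1;
}
```

In a real implementation the rows would span many machine words, so the saving in step 3 (one table lookup and one row addition per k columns of L) would presumably matter far more than it does in this single-word toy.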