Padding and speed

Dear All,

Maybe this is just a misunderstanding on my side: I think Blaze gains speed by padding arrays to a size that suits SSE/SSE2/AVX/... instructions, presumably to some multiple of 4, 8, or similar. Is that correct?
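To make that assumption concrete, here is a minimal sketch of the kind of padding I have in mind: rounding a row length up to the next multiple of the SIMD width. The names `paddedSize` and `paddingElements` are hypothetical and not part of the Blaze API, and I do not know whether Blaze computes its padding this way.

```cpp
#include <cstddef>

// Hypothetical helper (not a Blaze function): rounds n up to the next
// multiple of simdWidth, where simdWidth is the number of elements that
// fit into one SIMD register (e.g. 2 doubles for SSE2, 4 doubles for AVX).
constexpr std::size_t paddedSize(std::size_t n, std::size_t simdWidth)
{
    return ((n + simdWidth - 1) / simdWidth) * simdWidth;
}

// Number of extra elements appended to a row of n elements.
constexpr std::size_t paddingElements(std::size_t n, std::size_t simdWidth)
{
    return paddedSize(n, simdWidth) - n;
}
```

Under this scheme, a row of 510 doubles would be padded to 512 for AVX (4 doubles per register), while a row of 512 doubles would get no padding at all, which is why the results for power-of-two sizes puzzle me.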

Out of curiosity, I ran a few blazemarks (for example dmatdmatadd) without padding, but with array sizes that are powers of 2 (and therefore multiples of 4, 8 and 16). To my surprise, Blaze was no longer faster than the competitors; it was up to 4 times slower than with padding enabled.

Why would an array whose size is a power of two (512, 1024, ...) profit from padding?
