What would be needed to make this happen? It could improve performance on architectures with a large vector size, as long as the memory allocation accounts for the ghost size and aligns the first //evolved// point to the vector size. I seem to remember that unaligned memory access on Intel systems is fairly fast, so maybe padding only benefits non-Intel systems, in which case this enhancement might be less interesting?
It seems that padding is currently broken, in the sense that simulation results differ from those obtained without padding. To make this happen, someone needs to build with padding enabled and then track down where the differences come from.
Aligning the first non-ghost point is straightforward.
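A minimal sketch of what aligning the first non-ghost point might look like for a single 1D grid line. It assumes a vector size of four doubles (as with AVX) and uses C11 `aligned_alloc`. The names `VECTOR_SIZE`, `padded_array`, `padded_alloc`, `leading_pad` and `round_up` are hypothetical and are not part of Cactus or McLachlan. The idea is to insert a few unused points before the ghost zone so that `ghost + pad` is a multiple of the vector size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: number of doubles per SIMD vector (4 for AVX). */
#define VECTOR_SIZE 4
#define ALIGNMENT (VECTOR_SIZE * sizeof(double))

/* Extra points placed before the ghost zone so that
   (pad + ghost) is a multiple of VECTOR_SIZE. */
static size_t leading_pad(size_t ghost) {
  return (VECTOR_SIZE - ghost % VECTOR_SIZE) % VECTOR_SIZE;
}

/* Round n up to a multiple of VECTOR_SIZE, so the allocation size in
   bytes is a multiple of ALIGNMENT (required by C11 aligned_alloc). */
static size_t round_up(size_t n) {
  return (n + VECTOR_SIZE - 1) / VECTOR_SIZE * VECTOR_SIZE;
}

typedef struct {
  double *base;   /* pointer returned by aligned_alloc; pass to free() */
  double *data;   /* first ghost point */
  size_t ghost;   /* ghost zone width on each side */
  size_t npoints; /* ghost + interior + ghost */
} padded_array;

/* Allocate a 1D line whose first interior (evolved) point,
   data[ghost], is aligned to ALIGNMENT bytes. Returns 0 on success. */
static int padded_alloc(padded_array *a, size_t interior, size_t ghost) {
  size_t pad = leading_pad(ghost);
  size_t n = ghost + interior + ghost;
  size_t total = round_up(pad + n);

  a->base = aligned_alloc(ALIGNMENT, total * sizeof(double));
  if (a->base == NULL) return -1;

  a->data = a->base + pad;
  a->ghost = ghost;
  a->npoints = n;

  /* base is aligned and (pad + ghost) is a multiple of VECTOR_SIZE,
     so the first interior point is aligned as well. */
  assert((uintptr_t)(a->data + ghost) % ALIGNMENT == 0);
  return 0;
}
```

For example, with `ghost = 3` the sketch inserts one padding point, so the first interior point sits at offset 4 from an aligned base. In 3D the same idea would additionally require padding the x-extent of each row to a multiple of the vector size so that the first interior point of every row stays aligned; this sketch only handles one dimension.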
Unaligned memory access is fast on Intel systems, but it requires additional instructions. Since McLachlan is limited by instruction cache size, ensuring aligned access may have a larger benefit than one would otherwise expect. Also, with AVX instructions, unaligned access has become more expensive.