This is a set of fast implementations of a simple Trotter-Suzuki solver.
(c) 2010-2012 by Carlos Bederián <bc@famaf.unc.edu.ar>
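
For orientation only, this is the textbook symmetric (second-order) Trotter-Suzuki
splitting for a Hamiltonian H = A + B, with hbar = 1. Each piece's exponential is
assumed to be cheap to apply on its own. This README does not say which order or
which split of H the solvers use; see the paper linked at the end for that.

```latex
e^{-i H \Delta t} = e^{-i A \Delta t / 2}\, e^{-i B \Delta t}\, e^{-i A \Delta t / 2} + O(\Delta t^{3}),
\qquad H = A + B
```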

reference/ has a naive CPU implementation for testing purposes.

sse/ has a fast CPU implementation using SSE intrinsics and a red-black split of
the matrices. This implementation is bound by memory bandwidth, so it performs
poorly on large systems that don't fit in cache.
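
A red-black (checkerboard) split can be sketched as follows. This is a
hypothetical layout and not necessarily the one sse/ uses. Site (i, j) is red when
i + j is even. Each colour is packed into its own half-width array.
One plausible motivation is that, after the split, consecutive sites of the same
colour are adjacent in memory. They can then be loaded four floats at a time into a
128-bit SSE register. The names red_black_split and red_black_merge are
illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compact a row-major rows x cols matrix into two
   half-size arrays by checkerboard colour. Site (i, j) is "red" when (i + j)
   is even and "black" otherwise; it lands at index i * (cols / 2) + j / 2 in
   its colour's array. cols must be even so every row holds cols/2 of each. */
void red_black_split(const float *src, float *red, float *black,
                     size_t rows, size_t cols)
{
    assert(cols % 2 == 0);
    const size_t half = cols / 2;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j) {
            float *dst = ((i + j) % 2 == 0) ? red : black;
            dst[i * half + j / 2] = src[i * cols + j];
        }
}

/* Inverse of red_black_split: scatter both colours back to row-major order. */
void red_black_merge(const float *red, const float *black, float *dst,
                     size_t rows, size_t cols)
{
    assert(cols % 2 == 0);
    const size_t half = cols / 2;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j) {
            const float *src = ((i + j) % 2 == 0) ? red : black;
            dst[i * cols + j] = src[i * half + j / 2];
        }
}
```

For a 2x4 matrix holding 0..7 in row-major order, the red array becomes
{0, 2, 5, 7} and the black array {1, 3, 4, 6}. Merging them restores the original.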

block/ adds cache tiling on top of the red-black SSE code. This code isn't fully
optimized (some overhead can be removed), but it scales better than the simple
SSE code for large systems.

cuda/ has a fast GPGPU implementation tuned for Fermi-class GPUs, also using a
tiling strategy.


Our paper on these implementations:
http://www.famaf.unc.edu.ar/grupos/GPGPU/boosting_trotter-suzuki.pdf