Support for simultaneous transforms
Apologies if this feature exists and I missed it – but is it possible to do multiple interleaved transforms simultaneously, in analogy to the “howmany” argument in the advanced FFTW interface? For instance if you have multiple variables stored sequentially in memory at each grid point or have e.g. a radial discretization as the fastest memory axis. With these layouts and moderate transform sizes, it seems like rearranging the data (e.g. into pencils that are contiguous in latitude) can take longer than the actual transforms!
Comments (3)
-
repo owner -
reporter Thanks for the information. We would need the individual Legendre transforms, but there’s no rush. For FFTs though, it seems like transforms along slow memory axes, while slower than along the fastest axis, are still much faster than rearranging the data. Is it clear if/why that wouldn’t be the case with the Legendre transforms? It seems like having multiple datasets interleaved might still be quite efficient if SIMD was used at the lowest level. But you mentioned there might be cache reuse issues? I’m far from understanding the real implementation details that make SHTns so fast, but I’m just curious!
-
repo owner Basically, for a spectral to spatial transform, SHTns will read many times (proportional to nlat) the array of spectral values. If this stays in cache, it will be very efficient. If it goes out of cache (because now you have several fields together and it can’t stay in cache), performance will drop significantly.
For spatial to spectral it is similar: SHTns will read many times (proportional to lmax) the array of spatial values.
If the fastest direction was another one, SHTns would need to gather each field separately (or maybe in small groups of 2 to 4). This would indeed be a little bit more efficient than doing it on the caller side.
It would probably be possible to do it reasonably well (like what you say for FFT), but it would require a lot of work I think.All that said, If many users ask for such a feature (having the fast memory axis being something else that latitude), I can consider implementing it.
- Log in to comment
Multiple simultaneous transforms is currently not available in the stable or devel branch, but it exists in an experimental branch.
However, for performance reasons, the fastest memory axis should always be the latitude. As you mention, rearranging the data can take longer than the transform, but this would be roughly the same if SHTns needed to rearrange internally. So if you care about performance and SHTs is a performance bottleneck , you should definitely consider changing your data layout. Note that, although FFTW offers immense flexibility in the layout, not all layouts perform the same.
Currently, the experimental simultaneous transforms have the following features:
If you are interested, I can give you access to the experimental branch. If you need it with legendre-only transforms (single m), it is possible to implement, but I cannot do it before fall this year.