Float (single precision) support
Hi, I came across your library and found it very helpful (and I’m the reporter of #42). An important reason why it’s helpful for me is its GPU support. However, I also notice that the whole library uses double precision, and single precision can be many times faster than double precision on GPUs (and perhaps also on CPUs, but to a lesser extent). ECMWF has also started to use single-precision arithmetic in its IFS ensemble forecasts.
Fortunately, CUDA programs are built on C++, so I could make the modifications relatively easily. Most of the coding work is adding a type template parameter to many functions and replacing “double” with that parameter. That alone is not enough, though: I also had to make float copies of the CUDA arrays stored in struct shtns_info, and adjust several constants controlling the rescaling for large lmax. After repeated tuning of these constants, the SH_to_spat and spat_to_SH functions reach a satisfactory accuracy (~1e-5 average relative error) at lmax=767, which is enough for my use case. 180 rounds of back-and-forth single-precision SHT transforms at lmax=767 take ~0.7 s on an NVIDIA T4 GPU, while the same double-precision transforms take ~3 s, a roughly 4x speedup.
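To make the two changes above concrete, here is a hedged, self-contained sketch (not SHTNS code; all names such as `rescale_threshold` and `decaying_recurrence` are invented for illustration). It shows a type template parameter replacing hard-coded double, and why the rescaling constants must be retuned for float, whose underflow limit (~1e-38) is far above double's (~1e-308):

```cpp
// Sketch only, not SHTNS code. Illustrates (a) templating a routine over the
// floating-point type instead of hard-coding double, and (b) why rescaling
// thresholds differ between float and double.
#include <cmath>
#include <limits>

// Precision-dependent rescaling threshold, kept well above the underflow
// limit of the chosen type (hypothetical helper, not from SHTNS).
template <typename real_t>
real_t rescale_threshold() {
    return std::sqrt(std::numeric_limits<real_t>::min());
}

// Toy stand-in for a decaying recurrence (like high-degree associated
// Legendre terms at large lmax). Rescales whenever the running value gets
// too small, and counts how often that happens.
template <typename real_t>
real_t decaying_recurrence(int steps, real_t factor, int& rescales) {
    real_t v = real_t(1);
    rescales = 0;
    for (int i = 0; i < steps; ++i) {
        v *= factor;
        if (std::fabs(v) < rescale_threshold<real_t>()) {
            v *= real_t(1e10);  // arbitrary rescale factor for this sketch
            ++rescales;
        }
    }
    return v;
}
```

With 2000 steps and factor 0.9, the double instantiation never needs to rescale, while the float one must rescale several times to avoid flushing to zero; this is the kind of constant retuning described above.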
Although I have met my needs with my own modifications, I’m looking forward to an official release with single-precision support. If you are interested, I could also provide my changes.
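The accuracy figure quoted above (~1e-5 average relative error) can be measured with a sketch like the following. This is not how SHTNS reports errors; here the “round trip” is just a cast to float and back, standing in for a single-precision SH_to_spat followed by spat_to_SH, and `avg_relative_error` is a made-up helper:

```cpp
// Sketch of the accuracy metric: average relative error of roundtripped
// coefficients against the originals (hypothetical helper, not SHTNS API).
#include <cmath>
#include <cstddef>
#include <vector>

double avg_relative_error(const std::vector<double>& ref,
                          const std::vector<double>& rt) {
    double sum = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i)
        sum += std::fabs(rt[i] - ref[i]) / std::fabs(ref[i]);
    return sum / static_cast<double>(ref.size());
}
```

A pure double-to-float-to-double cast gives errors at the float epsilon level (~1e-7); the ~1e-5 quoted above additionally reflects error accumulation inside the transforms at lmax=767.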
Comments (6)

- reporter: One thing I should mention: the two _any calls in the leg_m_highllim_kernel function, located at lines 973 and 980 of sht_gpu_kernels.cu in version 3.4.5, are also not correct and should be removed, like those in the ileg_m_highllim_kernel function. This is actually a bug and should be fixed.

- repo owner: Hello, single-precision support is one of the features that will be implemented soon. A user has already contributed float support for CUDA; you can find his branch here: https://gricad-gitlab.univ-grenoble-alpes.fr/schaeffn/shtns/-/tree/cuda-float
  Your contribution would also be appreciated, and I can merge everything into the next release. Full support for float (i.e. also on the CPU) may have to wait a little longer: the CPU code is in C, so templates are not available.
  PS: the _any calls you are referring to are not needed, but they improve performance on older (Kepler, maybe Pascal) hardware. Why do you say it is a bug? In double precision it is correct. I don’t expect this _any condition to work in single precision, though.

- reporter: attached shtns_gpu_float.zip

- reporter: I’ve attached my modified version. There are some dirty hacks, though. The problem with the _any calls is that they produce wrong results when I try single precision. I didn’t try a sufficiently large lmax in double precision, and I apologize if they are intended in the double-precision case.

- repo owner: Thank you! Could you just tell me which version or commit you based your modified version on?

- reporter: v3.4.6