Vecmathlib provides efficient, accurate, tunable, and, most importantly, vectorizable math functions such as sqrt, sin, or atan.
The library is implemented in C++ and is intended to be called on SIMD vectors, e.g. those provided by SSE or AVX, or those available on Power7 and Blue Gene architectures. The same algorithms should also work efficiently on accelerators such as GPUs. Even without vectorization, vecmathlib's algorithms are efficient on standard CPUs.
Vecmathlib consists of three parts:
- vecmathlib's algorithms, i.e. the implementations of various math functions (e.g. sqrt)
- SIMD vector classes, wrapping e.g. SSE or AVX vectors
- a test harness
The algorithms themselves are written in a generic way. They assume an IEEE floating-point layout (consisting of sign bit, exponent, and mantissa), but work for arbitrary precision and vector sizes. For example, there is a routine vml_sqrt() that calculates a square root via an iterative scheme based on Newton's root-finding algorithm. Certain math functions could be offered in several implementations with different performance characteristics, although no such alternatives are available yet.
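As a rough illustration of the iterative scheme mentioned above, here is a minimal scalar sketch of a Newton-based square root. It is not vecmathlib's vml_sqrt(); the name newton_sqrt, the iteration count, and the way the starting guess is chosen are assumptions made for this example, and the input is assumed to be finite and strictly positive.

```cpp
#include <cmath>

// Hypothetical helper for illustration; this is not vecmathlib's vml_sqrt().
// Assumes x is finite and strictly positive.
template <typename T>
T newton_sqrt(T x, int iterations = 6) {
    // Split x = m * 2^e with m in [0.5, 1), then halve the exponent to get
    // a starting guess within a factor of two of the true root.
    int e = 0;
    (void)std::frexp(x, &e);
    T r = std::ldexp(T(1), e / 2);

    // Newton's method applied to f(r) = r^2 - x:  r <- (r + x / r) / 2.
    for (int i = 0; i < iterations; ++i) {
        r = T(0.5) * (r + x / r);
    }
    return r;
}
```

Because the starting guess is derived from the exponent, the number of iterations needed does not grow with the magnitude of x. A fixed iteration count also avoids data-dependent branches, which is what makes this kind of scheme a natural fit for SIMD vectors.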
The SIMD vector classes wrap architecture-specific SIMD capabilities; for example, there is an implementation of a class realvec<double,4> based on Intel's AVX instruction set. These classes either provide math operations and math functions themselves, or implement them via calls to the generic algorithms. This way, vecmathlib can use native instructions where the hardware offers them and fall back to the generic algorithms elsewhere.
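The split between wrapper classes and generic algorithms might look roughly like the sketch below. Everything in it is hypothetical: toy_realvec is an array-backed stand-in rather than vecmathlib's realvec, its sqrt simply loops over std::sqrt where a real wrapper would issue a SIMD instruction, and generic_norm2 is a deliberately naive example algorithm (it does not guard against overflow).

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical array-backed stand-in for a SIMD wrapper class.
// This is not vecmathlib's realvec.
template <typename T, std::size_t N>
struct toy_realvec {
    std::array<T, N> v;

    // Element-wise arithmetic; a real wrapper would map these to SIMD
    // instructions.
    friend toy_realvec operator+(const toy_realvec& a, const toy_realvec& b) {
        toy_realvec r{};
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }
    friend toy_realvec operator*(const toy_realvec& a, const toy_realvec& b) {
        toy_realvec r{};
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
        return r;
    }

    // A math function the class provides itself. Here it falls back to the
    // scalar std::sqrt, one element at a time.
    friend toy_realvec sqrt(const toy_realvec& a) {
        toy_realvec r{};
        for (std::size_t i = 0; i < N; ++i) r.v[i] = std::sqrt(a.v[i]);
        return r;
    }
};

// A generic algorithm written only against the vector interface
// (operator+, operator*, sqrt), so it works for any type providing them.
template <typename V>
V generic_norm2(const V& a, const V& b) {
    using std::sqrt;  // lets plain float/double arguments work as well
    return sqrt(a * a + b * b);
}
```

With this layout, generic_norm2 accepts a toy_realvec<double, 4> as readily as a plain double, because the call to sqrt resolves to whichever overload matches the argument type.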
Vecmathlib can also be used for scalar types, e.g. plain float or double, and thus provides math functions on architectures where they are otherwise not available.
Things To Do
Vecmathlib is not finished. Contributions are welcome! There are several areas where it can be improved:
- make test harness more systematic, improve coverage
- research and implement improved algorithms for certain math functions
- implement vector classes for additional hardware architectures
- make code more portable -- it currently works with GCC 4.7, and should also support other GCC versions and other compilers
- review C++ class hierarchy, improve design, reduce redundancy
- measure performance, compare to system library
- provide "vector" implementation of math functions by calculating them element-wise via libc's scalar functions
- handle inf, nan, negative zero, rounding modes, and everything else required for IEEE compliance (if -ffast-math is not used)
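To make the last item more concrete, the sketch below shows one possible way to add IEEE special-case handling around a square root routine that is only valid for finite, positive inputs. The wrapper name sqrt_with_special_cases and the idea of passing the core routine as a callable are assumptions made for this example, not part of vecmathlib. Rounding modes are not addressed, and the checks rely on NaN and infinity semantics that -ffast-math may disable.

```cpp
#include <cmath>
#include <limits>

// Hypothetical wrapper, not part of vecmathlib. `core` is any routine that
// computes the square root of a finite, strictly positive value.
template <typename T, typename Core>
T sqrt_with_special_cases(T x, Core core) {
    if (std::isnan(x)) return x;   // NaN propagates
    if (x == T(0)) return x;       // +0 stays +0, -0 stays -0
    if (x < T(0)) return std::numeric_limits<T>::quiet_NaN();  // invalid input
    if (std::isinf(x)) return x;   // only +inf can reach this point
    return core(x);
}
```

A vectorized version would presumably replace these branches with per-lane masks and a select operation, so that all lanes follow the same instruction stream.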