The size of a StaticVector, StaticMatrix, HybridVector, or HybridMatrix can indeed be larger than expected:
To achieve the maximum possible performance, the Blaze library tries to enable SIMD vectorization even for small vectors. For that reason, Blaze by default uses padding elements for all dense vectors and matrices to guarantee that at least a single SIMD vector can be loaded. Depending on the SIMD technology in use, this can significantly increase the size of a StaticVector, StaticMatrix, HybridVector, or HybridMatrix:
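The effect of the padding described above on the size can be estimated with a little arithmetic. The following is a minimal sketch, not Blaze's actual implementation: the helper `paddedBytes` is a hypothetical name, and the sketch assumes that padding rounds the element count up to a whole number of SIMD registers and that an integer occupies 4 bytes.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Blaze): estimated storage in bytes for
// 'n' elements of 'elemBytes' bytes each when the element count is rounded
// up to a whole number of SIMD registers of 'simdBytes' bytes.
constexpr std::size_t paddedBytes( std::size_t n, std::size_t elemBytes, std::size_t simdBytes )
{
   const std::size_t perRegister = simdBytes / elemBytes;                  // elements per SIMD register
   const std::size_t registers   = ( n + perRegister - 1 ) / perRegister;  // round up to whole registers
   return registers * simdBytes;
}

// Three 4-byte integers (12 bytes of payload) under the three register widths.
static_assert( paddedBytes( 3, 4, 16 ) == 16, "SSE: one 128 bit register" );
static_assert( paddedBytes( 3, 4, 32 ) == 32, "AVX: one 256 bit register" );
static_assert( paddedBytes( 3, 4, 64 ) == 64, "AVX-512: one 512 bit register" );
```

Under these assumptions, a vector of three integers occupies 16, 32, or 64 bytes instead of the 12 bytes one might expect.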
The configuration file ./blaze/config/Optimizations.h provides a compile time switch that can be used to (de-)activate padding:
Alternatively, it is possible to (de-)activate padding via the command line or by defining this symbol manually before including any Blaze header file:
If BLAZE_USE_PADDING is set to 1, padding is enabled for all dense vectors and matrices; if it is set to 0, padding is disabled. Note, however, that disabling padding can considerably reduce the performance of all dense vector and matrix operations!
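As a minimal sketch of the manual variant, the symbol is defined before the first Blaze include of the translation unit. The value 0 is chosen here only for illustration, and the sketch assumes the Blaze headers are on the include path:

```cpp
// Disable padding for this translation unit. The definition must precede
// the first Blaze include, otherwise the library's default is used.
#define BLAZE_USE_PADDING 0
#include <blaze/Blaze.h>
```

On the command line, the same effect would typically be achieved with the compiler's macro definition flag, for example `-DBLAZE_USE_PADDING=0` with GCC or Clang.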
Despite disabling padding via the BLAZE_USE_PADDING compile time switch (see A StaticVector/StaticMatrix is larger than expected. Is this a bug?), the size of a StaticVector, StaticMatrix, HybridVector, or HybridMatrix can still be larger than expected:
The reason for this behavior is the SIMD technology in use. If SSE is used, which provides 128 bit wide registers, a single SIMD pack can usually hold 4 integers (128 bit divided by 32 bit). Since the second vector contains enough elements, it is possible to benefit from vectorization. However, SSE requires an alignment of 16 bytes, which ultimately results in a total size of 32 bytes for the StaticVector (2 times 16 bytes due to 5 integer elements). If AVX or AVX-512 is used, which provide 256 bit or 512 bit wide registers, a single SIMD vector can hold 8 or 16 integers, respectively. In that case even the second vector does not hold enough elements to benefit from vectorization, which is why Blaze does not enforce a 32 byte (for AVX) or even 64 byte (for AVX-512) alignment.
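The size increase caused by alignment alone can be reproduced without Blaze. The sketch below uses two hypothetical stand-in types (`Plain5` and `Sse5` are not Blaze types) and assumes 4-byte integers: because the size of a type is always a multiple of its alignment, five integers with a 16 byte alignment requirement occupy 32 bytes rather than 20.

```cpp
#include <cstddef>

// Five integers without any extra alignment requirement
// (the situation described above for AVX and AVX-512).
struct Plain5 { int v[5]; };

// The same five integers with the 16 byte alignment that SSE requires.
struct alignas(16) Sse5 { int v[5]; };

static_assert( sizeof(int)    ==  4, "the sketch assumes 4-byte integers" );
static_assert( sizeof(Plain5) == 20, "5 * 4 bytes, no rounding" );
static_assert( sizeof(Sse5)   == 32, "20 bytes rounded up to a multiple of 16" );
```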
It is possible to disable vectorization entirely via the compile time switch in the ./blaze/config/Vectorization.h configuration file:
It is also possible to (de-)activate vectorization via the command line or by defining this symbol manually before including any Blaze header file:
In case the switch is set to 1, vectorization is enabled and the Blaze library is allowed to use intrinsics and the necessary alignment to speed up computations. In case the switch is set to 0, vectorization is disabled entirely and the Blaze library chooses default, non-vectorized functionality for the operations. Note that deactivating the vectorization may pose a severe performance limitation for a large number of operations!
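A minimal sketch of the manual variant follows. The switch name is not spelled out in the text above; `BLAZE_USE_VECTORIZATION` is the name used in Blaze's Vectorization.h, and the sketch assumes the Blaze headers are on the include path:

```cpp
// Disable vectorization for this translation unit. The definition must
// precede the first Blaze include.
#define BLAZE_USE_VECTORIZATION 0
#include <blaze/Blaze.h>
```

With GCC or Clang the equivalent command line form would typically be `-DBLAZE_USE_VECTORIZATION=0`.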
Currently the only BLAS functions that are utilized by Blaze are the gemm() functions for the multiplication of two dense matrices (i.e. sgemm(), dgemm(), cgemm(), and zgemm()). All other operations are always and unconditionally performed by native Blaze kernels.
The BLAZE_BLAS_MODE config switch (see ./blaze/config/BLAS.h) determines whether Blaze is allowed to use BLAS kernels. If BLAZE_BLAS_MODE is set to 0, Blaze does not utilize the BLAS kernels and unconditionally uses its own custom kernels. If BLAZE_BLAS_MODE is set to 1, Blaze is allowed to choose between BLAS kernels and its own custom kernels. For the dense matrix multiplication, this decision is based on the size of the dense matrices: for large matrices Blaze uses the BLAS kernels, for small matrices it uses its own custom kernels. The threshold for this decision can be configured via the BLAZE_DMATDMATMULT_THRESHOLD, BLAZE_DMATTDMATMULT_THRESHOLD, BLAZE_TDMATDMATMULT_THRESHOLD, and BLAZE_TDMATTDMATMULT_THRESHOLD config switches (see ./blaze/config/Thresholds.h).
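The decision described above can be pictured as a simple threshold check. The sketch below is purely illustrative and is not Blaze's code: `Kernel` and `selectKernel` are hypothetical names, the size measure (number of elements of the result matrix) is an assumption, and the threshold value 10000 is an arbitrary example rather than a Blaze default.

```cpp
#include <cstddef>

enum class Kernel { Custom, Blas };

// Hypothetical selection logic: with BLAS mode off the custom kernel is
// always used; with BLAS mode on, the BLAS kernel is chosen once the
// assumed size measure reaches the threshold.
constexpr Kernel selectKernel( bool blasMode, std::size_t rows, std::size_t columns, std::size_t threshold )
{
   if( !blasMode ) return Kernel::Custom;
   return ( rows * columns >= threshold ) ? Kernel::Blas : Kernel::Custom;
}

// Illustrative threshold of 10000 elements.
static_assert( selectKernel( false, 1000, 1000, 10000 ) == Kernel::Custom, "BLAS mode 0: always custom" );
static_assert( selectKernel( true,    10,   10, 10000 ) == Kernel::Custom, "small matrix: custom kernel" );
static_assert( selectKernel( true,  1000, 1000, 10000 ) == Kernel::Blas,   "large matrix: BLAS kernel" );
```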
Please note that the extent to which Blaze uses BLAS kernels can change in future releases of Blaze!
Blaze uses LAPACK functions for matrix decomposition, matrix inversion, computing the determinants and eigenvalues, and the SVD. In contrast to the BLAS functionality (see To which extent does Blaze make use of BLAS functions under the hood?), you cannot disable LAPACK or switch to custom kernels. If you try to use any of these features without providing (i.e. linking) a LAPACK library, you will get link time errors.
Please note that the extent to which Blaze uses LAPACK kernels can change in future releases of Blaze!
The include file <blaze/Blaze.h> includes the entire functionality of the Blaze library, which by now is several hundred thousand lines of source code. That means that a lot of source code has to be parsed whenever <blaze/Blaze.h> is encountered. However, it is rare that everything is required within a single compilation unit. It is therefore easy to reduce compile times by including only those Blaze features that are used within the compilation unit. For instance, instead of including <blaze/Blaze.h> it could be enough to include <blaze/math/DynamicVector.h>, which would reduce the compilation times by about 20%.
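As a minimal sketch of such a reduced include, a compilation unit that only works with dynamic dense vectors might look as follows. It assumes the Blaze headers are on the include path; the vector sizes and values are arbitrary.

```cpp
// Include only the dense vector functionality instead of <blaze/Blaze.h>.
#include <blaze/math/DynamicVector.h>

int main()
{
   blaze::DynamicVector<double> a( 3UL, 1.0 );  // three elements, all 1.0
   blaze::DynamicVector<double> b( 3UL, 2.0 );  // three elements, all 2.0
   blaze::DynamicVector<double> c( a + b );     // element-wise sum

   return ( c.size() == 3UL ) ? 0 : 1;          // exit code 0 if the size is as expected
}
```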
Additionally, we take care to implement new Blaze functionality such that compile times do not explode, and we try to reduce the compile times of existing features. Thus newer releases of Blaze can also improve compile times.