Blaze  3.6
Frequently Asked Questions (FAQ)

A StaticVector/StaticMatrix is larger than expected. Is this a bug?

The size of a StaticVector, StaticMatrix, HybridVector, or HybridMatrix can indeed be larger than expected:

StaticVector<int,3> a;
StaticMatrix<int,3,3> A;
sizeof( a ); // Evaluates to 16, 32, or even 64, but not 12
sizeof( A ); // Evaluates to 48, 96, or even 144, but not 36

In order to achieve the maximum possible performance, the Blaze library tries to enable SIMD vectorization even for small vectors. For that reason, Blaze by default pads all dense vectors and matrices so that at least one full SIMD vector can be loaded. Depending on the SIMD instruction set in use, this can significantly increase the size of a StaticVector, StaticMatrix, HybridVector, or HybridMatrix:

StaticVector<int,3> a;
StaticMatrix<int,3,3> A;
sizeof( a ); // Evaluates to 16 in case of SSE, 32 in case of AVX, and 64 in case of AVX-512
// (under the assumption that an integer occupies 4 bytes)
sizeof( A ); // Evaluates to 48 in case of SSE, 96 in case of AVX, and 144 in case of AVX-512
// (under the assumption that an integer occupies 4 bytes)
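
The vector sizes quoted above follow from rounding the element count up to a whole number of SIMD packs. The following sketch reproduces that arithmetic for the vector case only. Note that paddedBytes() is a hypothetical helper written for this illustration and not part of Blaze, and that it assumes padding always rounds up to the next multiple of the SIMD register width:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Blaze function): number of bytes occupied by n
// elements of elemBytes bytes each after rounding the element count up to
// whole SIMD packs. Assumes simdBytes is a multiple of elemBytes.
constexpr std::size_t paddedBytes( std::size_t n, std::size_t elemBytes, std::size_t simdBytes )
{
   const std::size_t perPack = simdBytes / elemBytes;          // elements per SIMD pack
   const std::size_t packs   = ( n + perPack - 1 ) / perPack;  // round up to whole packs
   return packs * simdBytes;
}

// Three 4-byte integers, as in the StaticVector<int,3> example
static_assert( paddedBytes( 3, 4, 16 ) == 16, "SSE: 128 bit registers" );
static_assert( paddedBytes( 3, 4, 32 ) == 32, "AVX: 256 bit registers" );
static_assert( paddedBytes( 3, 4, 64 ) == 64, "AVX-512: 512 bit registers" );
```

The static_assert statements are checked at compile time (C++14 or newer), so the sketch verifies itself without running anything.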

The configuration file ./blaze/config/Optimizations.h provides a compile time switch that can be used to (de-)activate padding:

#define BLAZE_USE_PADDING 1

Alternatively, it is possible to (de-)activate padding via the command line or by defining this symbol manually before including any Blaze header file:

#define BLAZE_USE_PADDING 1
#include <blaze/Blaze.h>

If BLAZE_USE_PADDING is set to 1, padding is enabled for all dense vectors and matrices. If it is set to 0, padding is disabled. Note, however, that disabling padding can considerably reduce the performance of all dense vector and matrix operations!
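
Defining the symbol before the first include can only take effect if the configuration header sets its default solely when the symbol is still undefined. The following sketch illustrates that general preprocessor pattern with a hypothetical DEMO_USE_PADDING symbol. It is an assumption about the mechanism for the purpose of illustration, not a quote of the Blaze configuration files:

```cpp
#include <cassert>

// User code: choose the setting before the (simulated) configuration header.
#define DEMO_USE_PADDING 0

// --- Simulated configuration header (hypothetical, not Blaze source) ---
// The default applies only if the user has not defined the symbol already.
#ifndef DEMO_USE_PADDING
#define DEMO_USE_PADDING 1
#endif
// --- End of simulated configuration header ---

constexpr bool paddingEnabled = ( DEMO_USE_PADDING != 0 );

// The user's choice wins over the default of 1
static_assert( !paddingEnabled, "User-provided definition must take precedence" );
```

Under this pattern a definition on the command line (for instance via a -D compiler option) behaves the same way, since it is also seen before the header is processed.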


Despite disabling padding, a StaticVector/StaticMatrix is still larger than expected. Is this a bug?

Despite disabling padding via the BLAZE_USE_PADDING compile time switch (see A StaticVector/StaticMatrix is larger than expected. Is this a bug?), the size of a StaticVector, StaticMatrix, HybridVector, or HybridMatrix can still be larger than expected:

#define BLAZE_USE_PADDING 0
#include <blaze/Blaze.h>
StaticVector<int,3> a;
StaticVector<int,5> b;
sizeof( a ); // Always evaluates to 12
sizeof( b ); // Evaluates to 32 with SSE (larger than expected) and to 20 with AVX or AVX-512 (expected)

The reason for this behavior is the SIMD instruction set in use. If SSE is used, which provides 128 bit wide registers, a single SIMD pack can usually hold 4 integers (128 bit divided by 32 bit). Since the second vector contains enough elements, it is possible to benefit from vectorization. However, SSE requires an alignment of 16 bytes, which ultimately results in a total size of 32 bytes for the StaticVector (2 times 16 bytes due to 5 integer elements). If AVX or AVX-512 is used, which provide 256 bit or 512 bit wide registers, a single SIMD vector can hold 8 or 16 integers, respectively. In that case even the second vector does not hold enough elements to benefit from vectorization, which is why Blaze does not enforce a 32 byte (for AVX) or even 64 byte alignment (for AVX-512).
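
The effect of the alignment requirement on the size can be reproduced with plain C++ types, since the size of a type is always a multiple of its alignment. In the following sketch, Plain5 and Aligned5 are hypothetical stand-ins for illustration and not Blaze types:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins (not Blaze types): five 4-byte integers each.
struct Plain5 { std::int32_t v[5]; };               // natural 4-byte alignment
struct alignas(16) Aligned5 { std::int32_t v[5]; }; // SSE-style 16-byte alignment

static_assert( sizeof( Plain5 ) == 20, "5 x 4 bytes, no rounding" );
static_assert( alignof( Aligned5 ) == 16, "16-byte alignment requested" );
static_assert( sizeof( Aligned5 ) == 32, "20 bytes rounded up to 2 x 16 bytes" );
```

The 12 trailing bytes of Aligned5 are not padding elements in the sense of BLAZE_USE_PADDING but a consequence of the alignment alone.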

It is possible to disable the vectorization entirely by the compile time switch in the ./blaze/config/Vectorization.h configuration file:

#define BLAZE_USE_VECTORIZATION 1

It is also possible to (de-)activate vectorization via the command line or by defining this symbol manually before including any Blaze header file:

#define BLAZE_USE_VECTORIZATION 1
#include <blaze/Blaze.h>

In case the switch is set to 1, vectorization is enabled and the Blaze library is allowed to use intrinsics and the necessary alignment to speed up computations. In case the switch is set to 0, vectorization is disabled entirely and the Blaze library chooses default, non-vectorized functionality for the operations. Note that deactivating the vectorization may pose a severe performance limitation for a large number of operations!


To which extent does Blaze make use of BLAS functions under the hood?

Currently the only BLAS functions that are utilized by Blaze are the gemm() functions for the multiplication of two dense matrices (i.e. sgemm(), dgemm(), cgemm(), and zgemm()). All other operations are always and unconditionally performed by native Blaze kernels.

The BLAZE_BLAS_MODE config switch (see ./blaze/config/BLAS.h) determines whether Blaze is allowed to use BLAS kernels. If BLAZE_BLAS_MODE is set to 0, Blaze does not utilize the BLAS kernels and unconditionally uses its own custom kernels. If BLAZE_BLAS_MODE is set to 1, Blaze is allowed to choose between the BLAS kernels and its own custom kernels. In case of the dense matrix multiplication this decision is based on the size of the dense matrices: for large matrices Blaze uses the BLAS kernels, for small matrices it uses its own custom kernels. The threshold for this decision can be configured via the BLAZE_DMATDMATMULT_THRESHOLD, BLAZE_DMATTDMATMULT_THRESHOLD, BLAZE_TDMATDMATMULT_THRESHOLD, and BLAZE_TDMATTDMATMULT_THRESHOLD config switches (see ./blaze/config/Thresholds.h).
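
The decision described above can be pictured as a simple size-based dispatch. The following sketch is only a conceptual model: selectKernel() and the Kernel enumeration are hypothetical names, and both the size measure (number of elements of the result matrix) and the direction of the comparison are assumptions. The actual criterion used by Blaze may differ:

```cpp
#include <cassert>
#include <cstddef>

enum class Kernel { Custom, Blas };

// Hypothetical dispatch (not Blaze source): BLAS is chosen only if it is
// enabled and the result matrix has at least 'threshold' elements.
constexpr Kernel selectKernel( bool blasMode, std::size_t rows, std::size_t columns, std::size_t threshold )
{
   return ( blasMode && rows * columns >= threshold ) ? Kernel::Blas : Kernel::Custom;
}

// With BLAS disabled the custom kernel is used regardless of the size
static_assert( selectKernel( false, 1000, 1000, 4900 ) == Kernel::Custom, "BLAS mode 0" );

// With BLAS enabled the size decides (the threshold 4900 is an arbitrary example value)
static_assert( selectKernel( true, 1000, 1000, 4900 ) == Kernel::Blas, "large matrix" );
static_assert( selectKernel( true, 10, 10, 4900 ) == Kernel::Custom, "small matrix" );
```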

Please note that the extent to which Blaze uses BLAS kernels can change in future releases of Blaze!


To which extent does Blaze make use of LAPACK functions under the hood?

Blaze uses LAPACK functions for matrix decomposition, matrix inversion, computing determinants and eigenvalues, and the SVD. In contrast to the BLAS functionality (see To which extent does Blaze make use of BLAS functions under the hood?), you cannot disable LAPACK or switch to custom kernels. If you use any of these features without providing (i.e. linking) a LAPACK library, you will get link time errors.

Please note that the extent to which Blaze uses LAPACK kernels can change in future releases of Blaze!


The compile time is too high if I include <blaze/Blaze.h>. Can I reduce it?

The include file <blaze/Blaze.h> includes the entire functionality of the Blaze library, which by now is several hundred thousand lines of source code. That means that a lot of source code has to be parsed whenever <blaze/Blaze.h> is encountered. However, it is rare that everything is required within a single compilation unit. Therefore it is easy to reduce compile times by including only those Blaze features that are used within the compilation unit. For instance, instead of including <blaze/Blaze.h> it could be enough to include <blaze/math/DynamicVector.h>, which would reduce the compilation times by about 20%.

Additionally we are taking care to implement new Blaze functionality such that compile times do not explode and try to reduce the compile times of existing features. Thus newer releases of Blaze can also improve compile times.


Blaze does not provide feature XYZ. What can I do?

In some cases you might be able to implement the required functionality very conveniently by building on the existing map() functions (see The map() Functions). For instance, the following code demonstrates the addition of a function that merges two vectors of floating point type into a vector of complex numbers:

// Merges the real parts (lhs) and imaginary parts (rhs) into a vector of complex numbers
template< typename VT1, typename VT2, bool TF >
decltype(auto) zip( const blaze::DenseVector<VT1,TF>& lhs, const blaze::DenseVector<VT2,TF>& rhs )
{
   return blaze::map( ~lhs, ~rhs, []( const auto& r, const auto& i ) {
      using ET1 = blaze::ElementType_t<VT1>;
      using ET2 = blaze::ElementType_t<VT2>;
      return std::complex<std::common_type_t<ET1,ET2>>( r, i );
   } );
}

You will find a summary of the necessary steps to create custom features in Customization.

Sometimes, however, the available customization points might not be sufficient. In this case you are cordially invited to create a pull request that provides the implementation of a feature or to create an issue according to our Issue Creation Guidelines. Please try to describe the feature as clearly as possible, for instance by providing conceptual code examples.

