Default Vector Storage
Default Matrix Storage
BLAS Mode
Cache Size
Vectorization
Sleef
XSIMD
Thresholds
Alignment
Padding
Streaming (Non-Temporal Stores)

Sometimes it is necessary to adapt Blaze to specific requirements. For this purpose Blaze provides several configuration files in the ./blaze/config/ subdirectory, which make it possible to customize internal settings, behavior, and thresholds. This chapter explains the most important of these configuration files. For a complete overview of all customization options, consult the configuration files in the ./blaze/config/ subdirectory or the complete Blaze documentation.

Default Vector Storage

By default, Blaze creates all vectors as column vectors unless specified otherwise:

blaze::StaticVector<double,3UL> x;  // Creates a 3-dimensional static column vector

The header file ./blaze/config/TransposeFlag.h allows the configuration of the default vector storage (i.e. the default transpose flag) of all vectors within the Blaze library. The default transpose flag is specified via the BLAZE_DEFAULT_TRANSPOSE_FLAG macro:

#define BLAZE_DEFAULT_TRANSPOSE_FLAG blaze::columnVector

Alternatively the default transpose flag can be specified via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_DEFAULT_TRANSPOSE_FLAG=blaze::columnVector ...

#define BLAZE_DEFAULT_TRANSPOSE_FLAG blaze::columnVector
#include <blaze/Blaze.h>

Valid settings for BLAZE_DEFAULT_TRANSPOSE_FLAG are blaze::rowVector and blaze::columnVector.
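
The following sketch illustrates why the symbol has to be defined before any Blaze header is included. It does not use Blaze itself: the demo namespace, the DEMO_DEFAULT_TRANSPOSE_FLAG macro, and the #ifndef guard are hypothetical stand-ins for how a configuration header of this kind typically works, not a copy of the actual Blaze sources.

```cpp
#include <cstddef>

// Hypothetical stand-ins for blaze::rowVector and blaze::columnVector.
namespace demo {
constexpr bool rowVector    = true;
constexpr bool columnVector = false;
}

// Step 1: the user (or a -D option on the command line) defines the symbol first.
#define DEMO_DEFAULT_TRANSPOSE_FLAG demo::rowVector

// Step 2: a config header (assumed layout) only supplies its own default
// if the symbol has not been defined yet.
#ifndef DEMO_DEFAULT_TRANSPOSE_FLAG
#define DEMO_DEFAULT_TRANSPOSE_FLAG demo::columnVector
#endif

namespace demo {
// Step 3: the library turns the macro into a default template argument.
constexpr bool defaultTransposeFlag = DEMO_DEFAULT_TRANSPOSE_FLAG;

template <typename T, std::size_t N, bool TF = defaultTransposeFlag>
struct StaticVector {
    static constexpr bool transposeFlag = TF;
    T data[N];
};
}

// The user-defined default wins over the header default ...
static_assert(demo::StaticVector<double, 3>::transposeFlag == demo::rowVector,
              "default follows the user-defined macro");
// ... and an explicit template argument still overrides the default.
static_assert(demo::StaticVector<double, 3, demo::columnVector>::transposeFlag == demo::columnVector,
              "explicit flag overrides the default");
```

If the #define in step 1 came after step 2, the header default would already be in place and the later definition would merely trigger a redefinition warning or error instead of changing the default.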

Default Matrix Storage

Matrices are by default created as row-major matrices:

blaze::StaticMatrix<double,3UL,3UL>  A;  // Creates a 3x3 row-major matrix

The header file ./blaze/config/StorageOrder.h allows the configuration of the default matrix storage order. Via the BLAZE_DEFAULT_STORAGE_ORDER macro the default storage order for all matrices of the Blaze library can be specified.

#define BLAZE_DEFAULT_STORAGE_ORDER blaze::rowMajor

Alternatively the default storage order can be specified via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_DEFAULT_STORAGE_ORDER=blaze::rowMajor ...

#define BLAZE_DEFAULT_STORAGE_ORDER blaze::rowMajor
#include <blaze/Blaze.h>

Valid settings for BLAZE_DEFAULT_STORAGE_ORDER are blaze::rowMajor and blaze::columnMajor.
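
As a reminder of what the two storage orders mean for the memory layout, the following minimal sketch computes the linear offset of element (i,j) in both schemes. The function names are illustrative and not part of Blaze, and the sketch deliberately ignores padding, which may make the actual spacing between rows or columns larger.

```cpp
#include <cstddef>

// Row-major: the elements of one row are stored contiguously.
constexpr std::size_t rowMajorIndex(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Column-major: the elements of one column are stored contiguously.
constexpr std::size_t columnMajorIndex(std::size_t i, std::size_t j, std::size_t rows) {
    return j * rows + i;
}

// Element (1,2) of a 3x3 matrix ends up at different offsets.
static_assert(rowMajorIndex(1, 2, 3) == 5, "row-major offset");
static_assert(columnMajorIndex(1, 2, 3) == 7, "column-major offset");
```

Traversing a matrix along its contiguous dimension is usually the cache-friendly choice, which is why the default storage order should match the dominant access pattern of the application.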

BLAS Mode

To achieve maximum performance for multiplications with dense matrices, Blaze can be configured to use a BLAS library. BLAS is enabled via the following compilation switch in the configuration file ./blaze/config/BLAS.h:

#define BLAZE_BLAS_MODE 1

By default, Blaze assumes a 32-bit BLAS library. Via the BLAZE_BLAS_IS_64BIT compilation switch, the 64-bit BLAS mode can be selected:

#define BLAZE_BLAS_IS_64BIT 1

Note that the BLAZE_BLAS_IS_64BIT switch also has an effect on the LAPACK Functions. Please also note that it might additionally be necessary to use a compilation switch to put the BLAS/LAPACK library into 64-bit mode (e.g. MKL_ILP64 for the Intel MKL library).

In case the selected BLAS library provides parallel execution, the BLAZE_BLAS_IS_PARALLEL switch should be activated to prevent Blaze from parallelizing on its own:

#define BLAZE_BLAS_IS_PARALLEL 1

Additionally, it is possible to specify the name of the BLAS include file via the BLAZE_BLAS_INCLUDE_FILE switch. The default setting is <cblas.h>:

#define BLAZE_BLAS_INCLUDE_FILE <cblas.h>

Alternatively, all settings can be specified via command line or by defining the symbols manually before including any Blaze header file:

g++ ... -DBLAZE_BLAS_MODE=1 -DBLAZE_BLAS_IS_64BIT=1 -DBLAZE_BLAS_IS_PARALLEL=1 -DBLAZE_BLAS_INCLUDE_FILE='<cblas.h>' ...

#define BLAZE_BLAS_MODE 1
#define BLAZE_BLAS_IS_64BIT 1
#define BLAZE_BLAS_IS_PARALLEL 1
#define BLAZE_BLAS_INCLUDE_FILE <cblas.h>
#include <blaze/Blaze.h>

In case no BLAS library is available, Blaze will still work and will not be reduced in functionality, but performance may be limited.

Cache Size

The optimization of several Blaze compute kernels depends on the cache size of the target architecture. By default, Blaze assumes a cache size of 3 MiByte. However, for optimal speed the exact cache size of the system should be provided via the BLAZE_CACHE_SIZE macro in the ./blaze/config/CacheSize.h configuration file:

#define BLAZE_CACHE_SIZE 3145728UL

The cache size can also be specified via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_CACHE_SIZE=3145728 ...

#define BLAZE_CACHE_SIZE 3145728UL
#include <blaze/Blaze.h>
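
The value is given in bytes. The small helper below (a hypothetical name, not part of Blaze) shows the conversion from MiByte, so that the default of 3 MiByte can be checked and the value for a different cache size can be derived.

```cpp
// Converts a cache size in MiByte to the byte count expected by the macro.
constexpr unsigned long mebibytesToBytes(unsigned long mib) {
    return mib * 1024UL * 1024UL;
}

static_assert(mebibytesToBytes(3UL) == 3145728UL, "the 3 MiByte default");
static_assert(mebibytesToBytes(8UL) == 8388608UL, "an 8 MiByte cache");
```

For a system with an 8 MiByte cache, for example, the setting would therefore be 8388608UL.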

Vectorization

In order to achieve maximum performance and to exploit the compute power of a target platform the Blaze library attempts to vectorize all linear algebra operations by SSE, AVX, and/or AVX-512 intrinsics, depending on which instruction set is available. However, it is possible to disable the vectorization entirely by the compile time switch in the configuration file ./blaze/config/Vectorization.h:

#define BLAZE_USE_VECTORIZATION 1

It is also possible to (de-)activate vectorization via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_USE_VECTORIZATION=1 ...

#define BLAZE_USE_VECTORIZATION 1
#include <blaze/Blaze.h>

In case the switch is set to 1, vectorization is enabled and the Blaze library is allowed to use intrinsics to speed up computations. In case the switch is set to 0, vectorization is disabled entirely and the Blaze library chooses default, non-vectorized functionality for the operations. Note that deactivating the vectorization may pose a severe performance limitation for a large number of operations!
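
Which instruction set is available depends on the compiler flags used (for instance -msse2, -mavx, or -march=native with GCC and Clang). The following sketch reports the widest of the three extensions mentioned above that the current compilation enables. It relies only on the predefined compiler macros __SSE2__, __AVX__, and __AVX512F__ (plus _M_X64 for MSVC); the function name is hypothetical and this is not Blaze's own detection logic.

```cpp
#include <string>

// Returns the widest x86 SIMD extension enabled for this translation unit.
inline std::string widestSimdExtension() {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__) || defined(_M_X64)
    return "SSE2";
#else
    return "no SSE2/AVX/AVX-512";  // e.g. a non-x86 target or a build without SIMD flags
#endif
}
```

Printing the result of widestSimdExtension() once at program start is a quick way to verify that the intended compiler flags actually reached the translation unit.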

Sleef

For several complex operations Blaze can make use of the Sleef library for vectorization (https://github.com/shibatch/sleef). The BLAZE_USE_SLEEF compilation switch enables or disables the vectorization by means of Sleef. If the switch is set to 1, Blaze uses Sleef, for instance, for the vectorized computation of trigonometric functions (e.g. sin(), cos(), tan()) and exponential and logarithmic functions (e.g. exp(), log()).

#define BLAZE_USE_SLEEF 1

It is also possible to enable/disable Sleef vectorization via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_USE_SLEEF=1 ...

#define BLAZE_USE_SLEEF 1
#include <blaze/Blaze.h>

XSIMD

For several complex operations Blaze can make use of the XSIMD library for vectorization (https://github.com/xtensor-stack/xsimd). The BLAZE_USE_XSIMD compilation switch enables or disables the vectorization by means of XSIMD. If the switch is set to 1, Blaze uses XSIMD, for instance, for the vectorized computation of trigonometric functions (e.g. sin(), cos(), tan()) and exponential and logarithmic functions (e.g. exp(), log()).

#define BLAZE_USE_XSIMD 1

It is also possible to enable/disable XSIMD vectorization via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_USE_XSIMD=1 ...

#define BLAZE_USE_XSIMD 1
#include <blaze/Blaze.h>

Thresholds

For many computations Blaze distinguishes between small and large vectors and matrices. This separation is especially important for the parallel execution of computations, since the use of several threads only pays off for sufficiently large vectors and matrices. Additionally, it also enables Blaze to select kernels that are optimized for a specific size.

In order to distinguish between small and large data structures, Blaze provides several thresholds that can be adapted to the characteristics of the target platform. For instance, the DMATDVECMULT_THRESHOLD specifies the threshold between the application of the custom Blaze kernels for small dense matrix/dense vector multiplications and the BLAS kernels for large multiplications. All thresholds, including the thresholds for the OpenMP- and thread-based parallelization, are contained within the configuration file ./blaze/config/Thresholds.h.
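
The idea behind such a threshold can be sketched as a simple size-based dispatch. Everything in the following block is hypothetical: the Kernel enumeration, the selectKernel() function, and the assumption that the problem size is measured as rows times columns are illustrations only and do not reproduce the actual Blaze kernel selection.

```cpp
#include <cstddef>

enum class Kernel { SmallCustom, LargeBlas };

// Picks a kernel for a dense matrix/dense vector multiplication.
// Assumption: the number of matrix elements is compared against the threshold.
constexpr Kernel selectKernel(std::size_t rows, std::size_t cols, std::size_t threshold) {
    return (rows * cols < threshold) ? Kernel::SmallCustom : Kernel::LargeBlas;
}

// With an illustrative threshold of 1000 elements:
static_assert(selectKernel(10, 10, 1000) == Kernel::SmallCustom, "100 elements: small kernel");
static_assert(selectKernel(100, 100, 1000) == Kernel::LargeBlas, "10000 elements: BLAS kernel");
```

Raising the threshold keeps more problems on the custom kernels, lowering it hands them to BLAS earlier; the best value depends on the target platform and on the BLAS library in use.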

Alignment

For performance reasons, the vector types StaticVector and HybridVector and the matrix types StaticMatrix and HybridMatrix by default make use of aligned memory. Via the configuration file ./blaze/config/Alignment.h it is possible to define the default alignment flag:

#define BLAZE_DEFAULT_ALIGNMENT_FLAG blaze::aligned

Alternatively it is possible to set the default alignment flag via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_DEFAULT_ALIGNMENT_FLAG=blaze::aligned ...

#define BLAZE_DEFAULT_ALIGNMENT_FLAG blaze::aligned
#include <blaze/Blaze.h>

If BLAZE_DEFAULT_ALIGNMENT_FLAG is set to blaze::aligned then StaticVector, HybridVector, StaticMatrix, and HybridMatrix use aligned memory by default. If it is set to blaze::unaligned they don't enforce aligned memory. Note however that disabling alignment can considerably reduce the performance of all operations with these vector and matrix types!
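
What aligned memory means in practice can be shown with plain C++. The sketch below assumes a 32-byte boundary (the width of an AVX register); the two structs and the isAligned() helper are hypothetical and unrelated to the Blaze implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if ptr lies on a multiple of the given alignment (in bytes).
inline bool isAligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Storage that is forced onto a 32-byte boundary ...
struct AlignedStorage {
    alignas(32) double values[4];
};

// ... and storage that only has the natural alignment of double.
struct PlainStorage {
    double values[4];
};

static_assert(alignof(AlignedStorage) == 32, "over-aligned storage");
static_assert(alignof(PlainStorage) == alignof(double), "natural alignment");
```

An AlignedStorage object always satisfies isAligned(object.values, 32), so aligned SIMD loads and stores are safe on it, whereas a PlainStorage object may or may not start on such a boundary.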

Padding

By default the Blaze library uses padding for the vector types StaticVector and HybridVector and the matrix types StaticMatrix and HybridMatrix in order to achieve maximum performance in all operations. Due to padding, the proper alignment of data elements can be guaranteed and the need for remainder loops is minimized. On the downside, padding introduces additional memory overhead, which can be large depending on the data type used.

The configuration file ./blaze/config/Padding.h provides a compile time switch that can be used to define the default padding flag:

#define BLAZE_DEFAULT_PADDING_FLAG blaze::padded

Alternatively it is possible to define the default padding flag via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_DEFAULT_PADDING_FLAG=blaze::padded ...

#define BLAZE_DEFAULT_PADDING_FLAG blaze::padded
#include <blaze/Blaze.h>

If BLAZE_DEFAULT_PADDING_FLAG is set to blaze::padded, by default padding is enabled for StaticVector, HybridVector, StaticMatrix and HybridMatrix. If it is set to blaze::unpadded, then padding is by default disabled. Note however that disabling padding can considerably reduce the performance of all dense vector and matrix operations!
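
The memory overhead of padding can be estimated by rounding the number of elements up to the next multiple of the SIMD width. The following sketch assumes this rounding scheme and AVX-sized registers (4 doubles or 8 floats per register); the helper names are hypothetical and the actual capacity chosen by Blaze may differ.

```cpp
#include <cstddef>

// Rounds n up to the next multiple of simdWidth (elements per SIMD register).
constexpr std::size_t paddedSize(std::size_t n, std::size_t simdWidth) {
    return ((n + simdWidth - 1) / simdWidth) * simdWidth;
}

// Number of extra elements that exist only as padding.
constexpr std::size_t paddingElements(std::size_t n, std::size_t simdWidth) {
    return paddedSize(n, simdWidth) - n;
}

// 3 doubles with 4 doubles per register: 1 padding element.
static_assert(paddedSize(3, 4) == 4, "3 doubles occupy 4 slots");
// 9 floats with 8 floats per register: 7 padding elements.
static_assert(paddingElements(9, 8) == 7, "9 floats occupy 16 slots");
```

The second case shows why the overhead depends on the data type: the smaller the element type, the more elements fit into one register and the more padding a slightly oversized vector needs.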

Streaming (Non-Temporal Stores)

For vectors and matrices that no longer fit into the cache, non-temporal stores can provide a significant performance advantage of about 20%. However, this advantage only takes effect if the memory bandwidth of the target architecture is saturated. If the target architecture's memory bandwidth cannot be exhausted, the use of non-temporal stores can decrease performance instead of increasing it.

The configuration file ./blaze/config/Optimizations.h provides a compile time switch that can be used to (de-)activate streaming:

#define BLAZE_USE_STREAMING 1

Alternatively streaming can be (de-)activated via command line or by defining this symbol manually before including any Blaze header file:

g++ ... -DBLAZE_USE_STREAMING=1 ...

#define BLAZE_USE_STREAMING 1
#include <blaze/Blaze.h>

If BLAZE_USE_STREAMING is set to 1, streaming is enabled; if it is set to 0, streaming is disabled. It is recommended to consult the target architecture's white papers to decide whether streaming helps or hurts performance.
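
For readers unfamiliar with the term, the following sketch shows what a non-temporal store looks like at the intrinsics level. It is a minimal illustration using the SSE2 intrinsic _mm_stream_pd with a plain fallback loop for builds without SSE2; the streamFill() function is hypothetical and not taken from Blaze.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Fills dst[0..n) with value. With SSE2, dst must be 16-byte aligned.
inline void streamFill(double* dst, std::size_t n, double value) {
#if defined(__SSE2__)
    const __m128d v = _mm_set1_pd(value);
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        _mm_stream_pd(dst + i, v);  // non-temporal store of two doubles, bypassing the cache
    }
    for (; i < n; ++i) {
        dst[i] = value;             // remainder element, regular store
    }
    _mm_sfence();                   // order the streamed stores before subsequent stores
#else
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;             // fallback: regular stores only
    }
#endif
}
```

Because the streamed data bypasses the cache, this pattern only pays off for buffers that are written once and are too large to be reused from cache, which matches the recommendation above.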
