Configuration Files

Sometimes it might be necessary to adapt Blaze to specific requirements. For this purpose, Blaze provides several configuration files in the ./blaze/config/ subdirectory, which offer ample opportunity to customize internal settings, behavior, and thresholds. This chapter explains the most important of these configuration files.


Default Vector Storage


The Blaze default is that all vectors are created as column vectors (if not specified explicitly):

blaze::StaticVector<double,3UL> x; // Creates a 3-dimensional static column vector

The header file ./blaze/config/TransposeFlag.h allows the configuration of the default vector storage (i.e. the default transpose flag of the vectors). Via the defaultTransposeFlag value, the default transpose flag for all vectors of the Blaze library can be specified.

Valid settings for the defaultTransposeFlag are blaze::rowVector and blaze::columnVector.
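
The following is a minimal standalone sketch of how such a default can work. It does not use the Blaze headers: the namespace sketch, the type Vec, and the concrete boolean values of the two settings are assumptions made purely for illustration.

```cpp
#include <cstddef>

namespace sketch {

// Stand-ins for the two valid settings; the concrete values are an assumption.
constexpr bool rowVector    = true;
constexpr bool columnVector = false;

// Stand-in for the configurable value in TransposeFlag.h.
constexpr bool defaultTransposeFlag = columnVector;

// Hypothetical vector type whose transpose flag falls back to the configured default.
template< typename T, std::size_t N, bool TF = defaultTransposeFlag >
struct Vec
{
   static constexpr bool transposeFlag = TF;
   T data[N]{};
};

// No flag given: the configured default (column vector) applies.
using DefaultVec = Vec<double,3UL>;

// Explicit flag: overrides the configured default.
using RowVec = Vec<double,3UL,rowVector>;

} // namespace sketch
```

Changing defaultTransposeFlag in this sketch changes DefaultVec, while RowVec is unaffected because it specifies its flag explicitly.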


Default Matrix Storage


Matrices are by default created as row-major matrices:

blaze::StaticMatrix<double,3UL,3UL> A; // Creates a 3x3 row-major matrix

The header file ./blaze/config/StorageOrder.h allows the configuration of the default matrix storage order. Via the defaultStorageOrder value the default storage order for all matrices of the Blaze library can be specified.

constexpr bool defaultStorageOrder = rowMajor;

Valid settings for the defaultStorageOrder are blaze::rowMajor and blaze::columnMajor.
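
To illustrate what the storage order means for the memory layout, the following standalone sketch computes the position of element (i,j) in a contiguous buffer for both orders. It ignores padding, and the namespace sketch, the function linearIndex, and the boolean values of the two settings are assumptions for illustration only.

```cpp
#include <cstddef>

namespace sketch {

// Stand-ins for the two valid settings; the concrete values are an assumption.
constexpr bool rowMajor    = false;
constexpr bool columnMajor = true;

// Offset of element (i,j) in a contiguous rows x cols buffer, ignoring any padding.
constexpr std::size_t linearIndex( bool storageOrder, std::size_t rows, std::size_t cols,
                                   std::size_t i, std::size_t j )
{
   return ( storageOrder == columnMajor ) ? ( j * rows + i )    // columns are contiguous
                                          : ( i * cols + j );   // rows are contiguous
}

} // namespace sketch
```

For a 3x3 matrix, element (1,2) is found at offset 5 in row-major order and at offset 7 in column-major order.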


BLAS Mode


In order to achieve maximum performance for multiplications with dense matrices, Blaze can be configured to use a BLAS library. BLAS can be enabled via the following compilation switch in the configuration file ./blaze/config/BLAS.h:

#define BLAZE_BLAS_MODE 1

In case the selected BLAS library provides parallel execution, the BLAZE_BLAS_IS_PARALLEL switch should be activated to prevent Blaze from parallelizing on its own:

#define BLAZE_BLAS_IS_PARALLEL 1

In case no BLAS library is available, Blaze will still work and will not be reduced in functionality, but performance may be limited.


Cache Size


The optimization of several Blaze compute kernels depends on the cache size of the target architecture. By default, Blaze assumes a cache size of 3 MiB (3145728 bytes). However, for optimal speed the exact cache size of the system should be provided via the cacheSize value in the ./blaze/config/CacheSize.h configuration file:

constexpr size_t cacheSize = 3145728UL;
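
Since the value is given in bytes, it can help to derive it from the size in MiB. The following standalone sketch shows the arithmetic; the namespace sketch, the helper mebibytes, and the 8 MiB target are hypothetical and not part of Blaze.

```cpp
#include <cstddef>

namespace sketch {

// Converts a cache size given in MiB to bytes (1 MiB = 1024 * 1024 bytes).
constexpr std::size_t mebibytes( std::size_t n )
{
   return n * 1024UL * 1024UL;
}

// The default mentioned in the text: 3 MiB.
constexpr std::size_t defaultCacheSize = mebibytes( 3UL );

// Hypothetical target architecture with an 8 MiB cache.
constexpr std::size_t largerCacheSize = mebibytes( 8UL );

} // namespace sketch
```

With this helper, 3 MiB yields 3145728 bytes and 8 MiB yields 8388608 bytes.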


Vectorization


In order to achieve maximum performance and to exploit the compute power of a target platform, the Blaze library attempts to vectorize all linear algebra operations using SSE, AVX, and/or MIC intrinsics, depending on which instruction set is available. However, it is possible to disable the vectorization entirely via a compile-time switch in the configuration file ./blaze/config/Vectorization.h:

#define BLAZE_USE_VECTORIZATION 1

In case the switch is set to 1, vectorization is enabled and the Blaze library is allowed to use intrinsics to speed up computations. In case the switch is set to 0, vectorization is disabled entirely and the Blaze library chooses default, non-vectorized functionality for the operations. Note that deactivating the vectorization may pose a severe performance limitation for a large number of operations!
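
The effect of such a switch can be sketched as follows. This is a standalone illustration of preprocessor-based kernel selection, not Blaze's actual implementation: the macro is defined locally as a stand-in for the configuration header, and the namespace sketch, the Kernel enumeration, and selectedKernel are hypothetical.

```cpp
// Stand-in for the switch in ./blaze/config/Vectorization.h.
#define BLAZE_USE_VECTORIZATION 1

namespace sketch {

// Hypothetical labels for the two code paths.
enum class Kernel { Vectorized, Scalar };

// The preprocessor decides at compile time which code path exists at all.
#if BLAZE_USE_VECTORIZATION
constexpr Kernel selectedKernel = Kernel::Vectorized;
#else
constexpr Kernel selectedKernel = Kernel::Scalar;
#endif

} // namespace sketch
```

Because the decision is made by the preprocessor, the deactivated path costs nothing at run time, but switching between the two requires recompilation.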


Thresholds


For many computations Blaze distinguishes between small and large vectors and matrices. This separation is especially important for the parallel execution of computations, since the use of several threads only pays off for sufficiently large vectors and matrices. Additionally, it also enables Blaze to select kernels that are optimized for a specific size.

In order to distinguish between small and large data structures Blaze provides several thresholds that can be adapted to the characteristics of the target platform. For instance, the DMATDVECMULT_THRESHOLD specifies the threshold between the application of the custom Blaze kernels for small dense matrix/dense vector multiplications and the BLAS kernels for large multiplications. All thresholds, including the thresholds for the OpenMP- and thread-based parallelization, are contained within the configuration file ./blaze/config/Thresholds.h.
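
A threshold-based kernel selection can be sketched as follows. The sketch is standalone and makes two assumptions that should be checked against Thresholds.h: the threshold value used here is purely illustrative, and the size of the operation is measured as the number of matrix elements (rows times columns). The namespace sketch, the Kernel enumeration, and the function selectKernel are hypothetical.

```cpp
#include <cstddef>

namespace sketch {

// Illustrative threshold value; the actual default in Thresholds.h may differ.
constexpr std::size_t DMATDVECMULT_THRESHOLD = 4000000UL;

// Hypothetical labels for the two kinds of kernels named in the text.
enum class Kernel { Custom, Blas };

// Assumption: the problem size is measured as rows * columns of the matrix.
constexpr Kernel selectKernel( std::size_t rows, std::size_t columns )
{
   return ( rows * columns < DMATDVECMULT_THRESHOLD ) ? Kernel::Custom   // small: custom kernel
                                                      : Kernel::Blas;    // large: BLAS kernel
}

} // namespace sketch
```

Under these assumptions a 100x100 matrix is handled by the custom kernel, whereas a 4000x4000 matrix is passed to the BLAS kernel.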


Padding


By default, the Blaze library uses padding for all dense vectors and matrices in order to achieve maximum performance in all operations. Due to padding, the proper alignment of data elements can be guaranteed and the need for remainder loops is minimized. However, padding introduces additional memory overhead, which can be large depending on the data type used.

The configuration file ./blaze/config/Optimizations.h provides a compile-time switch that can be used to (de-)activate padding:

constexpr bool usePadding = true;

If usePadding is set to true, padding is enabled for all dense vectors and matrices; if it is set to false, padding is disabled. Note, however, that disabling padding can considerably reduce the performance of all dense vector and matrix operations!
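
The memory overhead of padding can be estimated with the following standalone sketch. It assumes that each row of a row-major matrix is rounded up to a multiple of the number of elements that fit into one SIMD register; this is a simplification of what the library does, and the namespace sketch and the function paddedElements are hypothetical.

```cpp
#include <cstddef>

namespace sketch {

// Rounds n up to the next multiple of width (width must be greater than 0).
constexpr std::size_t nextMultiple( std::size_t n, std::size_t width )
{
   return ( ( n + width - 1UL ) / width ) * width;
}

// Number of elements allocated for a rows x cols row-major matrix
// when every row is padded to a multiple of width.
constexpr std::size_t paddedElements( std::size_t rows, std::size_t cols, std::size_t width )
{
   return rows * nextMultiple( cols, width );
}

} // namespace sketch
```

With a width of 4 (for instance double elements in a 256-bit register), a 3x3 matrix occupies 12 instead of 9 elements. With a width of 16 (for instance 8-bit elements in a 128-bit register), the same matrix occupies 48 elements, which shows how strongly the overhead depends on the data type.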


Streaming (Non-Temporal Stores)


For vectors and matrices that no longer fit into the cache, non-temporal stores can provide a significant performance advantage of about 20%. However, this advantage only takes effect if the memory bandwidth of the target architecture is saturated. If the target architecture's memory bandwidth cannot be exhausted, the use of non-temporal stores can decrease performance instead of increasing it.

The configuration file ./blaze/config/Optimizations.h provides a compile-time switch that can be used to (de-)activate streaming:

constexpr bool useStreaming = true;

If useStreaming is set to true, streaming is enabled; if it is set to false, streaming is disabled. It is recommended to consult the target architecture's white papers to decide whether streaming helps or hurts performance.
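
How the switch might interact with the configured cache size can be sketched as follows. The exact criterion used inside the library is not stated here, so the rule below (stream only if the data is larger than the cache) is an assumption, as are the namespace sketch and the function shouldStream. The two constants are local stand-ins for the values in Optimizations.h and CacheSize.h.

```cpp
#include <cstddef>

namespace sketch {

constexpr bool        useStreaming = true;        // stand-in for the switch in Optimizations.h
constexpr std::size_t cacheSize    = 3145728UL;   // stand-in for the value in CacheSize.h

// Assumption: non-temporal stores are only considered if the switch is active
// and the data no longer fits into the cache.
template< typename T >
constexpr bool shouldStream( std::size_t elements )
{
   return useStreaming && ( elements * sizeof( T ) > cacheSize );
}

} // namespace sketch
```

Under this assumption a vector of 1000 double values (8000 bytes) is written with regular stores, whereas a vector of 1000000 double values (8000000 bytes) qualifies for streaming.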

