Sometimes it might be necessary to adapt Blaze to specific requirements. For this purpose, Blaze provides several configuration files in the ./blaze/config/ subdirectory, which offer ample opportunity to customize internal settings, behavior, and thresholds. This chapter explains the most important of these configuration files.
By default, Blaze creates all vectors as column vectors unless a transpose flag is specified explicitly.
The header file ./blaze/config/TransposeFlag.h allows the configuration of the default vector storage (i.e. the default transpose flag of the vectors). Via the defaultTransposeFlag value, the default transpose flag for all vectors of the Blaze library can be specified. Valid settings for the defaultTransposeFlag are blaze::rowVector and blaze::columnVector.
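A minimal sketch of the corresponding setting, assuming the default is declared as a const bool in namespace blaze (the exact form of the file may differ between Blaze versions, so compare with your own copy):

```cpp
// ./blaze/config/TransposeFlag.h (sketch; exact form may differ between Blaze versions)
namespace blaze {

// Default transpose flag for all vectors whose type does not specify one.
// Valid settings: blaze::rowVector and blaze::columnVector
const bool defaultTransposeFlag = columnVector;

} // namespace blaze
```

With this setting, blaze::StaticVector<double,3UL> is a column vector, whereas blaze::StaticVector<double,3UL,blaze::rowVector> is a row vector regardless of the default, since an explicitly specified transpose flag takes precedence.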
By default, matrices are created as row-major matrices.
The header file ./blaze/config/StorageOrder.h allows the configuration of the default matrix storage order. Via the defaultStorageOrder value, the default storage order for all matrices of the Blaze library can be specified. Valid settings for the defaultStorageOrder are blaze::rowMajor and blaze::columnMajor.
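A minimal sketch of the corresponding setting, again assuming a const bool in namespace blaze (check the exact form in your Blaze version):

```cpp
// ./blaze/config/StorageOrder.h (sketch; exact form may differ between Blaze versions)
namespace blaze {

// Default storage order for all matrices whose type does not specify one.
// Valid settings: blaze::rowMajor and blaze::columnMajor
const bool defaultStorageOrder = rowMajor;

} // namespace blaze
```

With this setting, blaze::DynamicMatrix<double> is stored in row-major order, whereas blaze::DynamicMatrix<double,blaze::columnMajor> is always stored in column-major order.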
In order to achieve maximum performance for multiplications with dense matrices, Blaze can be configured to use a BLAS library. BLAS is enabled via a compilation switch in the configuration file ./blaze/config/BLAS.h.
If the selected BLAS library provides parallel execution, the BLAZE_BLAS_IS_PARALLEL switch should be activated to prevent Blaze from parallelizing on its own.
If no BLAS library is available, Blaze still works with its full functionality, but performance may be limited.
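A sketch of the two BLAS switches as preprocessor macros. BLAZE_BLAS_IS_PARALLEL is named above; the name BLAZE_BLAS_MODE for the switch that enables BLAS is an assumption and should be checked against ./blaze/config/BLAS.h in your Blaze version:

```cpp
// ./blaze/config/BLAS.h (sketch; BLAZE_BLAS_MODE is an assumed name,
// check the actual switch name in your copy of the file)

// 1: use the BLAS library for large dense matrix multiplications
// 0: use only the built-in Blaze kernels
#define BLAZE_BLAS_MODE 1

// 1: the selected BLAS library runs in parallel itself, so Blaze should
//    not parallelize these operations on its own
// 0: the BLAS library is sequential
#define BLAZE_BLAS_IS_PARALLEL 1
```

Enabling BLAS also requires linking the executable against the chosen BLAS library.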
The optimization of several Blaze compute kernels depends on the cache size of the target architecture. By default, Blaze assumes a cache size of 3 MiByte. However, for optimal speed, the exact cache size of the system should be provided via the cacheSize value in the ./blaze/config/CacheSize.h configuration file.
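Assuming the cacheSize value is a size_t constant in namespace blaze that is given in bytes (an assumption; check your copy of CacheSize.h), the default of 3 MiByte corresponds to 3 * 1024 * 1024 = 3145728 bytes. Writing the value as a product keeps the unit visible:

```cpp
// ./blaze/config/CacheSize.h (sketch; assumes the cache size is given in bytes)
#include <cstddef>

namespace blaze {

// Cache size of the target architecture: 3 MiByte = 3 * 1024 * 1024 bytes.
// Replace the leading factor with the actual cache size of your system.
const std::size_t cacheSize = 3UL * 1024UL * 1024UL;

} // namespace blaze
```

For a system with an 8 MiByte cache, for example, the leading factor would become 8UL.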
In order to achieve maximum performance and to exploit the compute power of the target platform, the Blaze library attempts to vectorize all linear algebra operations using SSE, AVX, and/or MIC intrinsics, depending on which instruction sets are available. However, vectorization can be disabled entirely via a compile-time switch in the configuration file ./blaze/config/Vectorization.h.
If the switch is set to 1, vectorization is enabled and the Blaze library is allowed to use intrinsics to speed up computations. If the switch is set to 0, vectorization is disabled entirely and the Blaze library falls back to default, non-vectorized implementations of the operations. Note that deactivating vectorization may severely limit the performance of a large number of operations!
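A sketch of the switch as a preprocessor macro; the name BLAZE_USE_VECTORIZATION is an assumption and should be checked against ./blaze/config/Vectorization.h in your Blaze version:

```cpp
// ./blaze/config/Vectorization.h (sketch; BLAZE_USE_VECTORIZATION is an assumed
// name, check the actual switch name in your copy of the file)

// 1: vectorization enabled, Blaze may use SSE/AVX/MIC intrinsics
// 0: vectorization disabled, Blaze uses non-vectorized implementations
#define BLAZE_USE_VECTORIZATION 1
```

Since the switch is evaluated at compile time, code using Blaze has to be recompiled for a change to take effect.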
For many computations, Blaze distinguishes between small and large vectors and matrices. This separation is especially important for the parallel execution of computations, since the use of several threads only pays off for sufficiently large vectors and matrices. It also enables Blaze to select kernels that are optimized for a specific size.
In order to distinguish between small and large data structures, Blaze provides several thresholds that can be adapted to the characteristics of the target platform. For instance, the DMATDVECMULT_THRESHOLD specifies the threshold between the application of the custom Blaze kernels for small dense matrix/dense vector multiplications and the BLAS kernels for large multiplications. All thresholds, including the thresholds for the OpenMP- and thread-based parallelization, are contained within the configuration file ./blaze/config/Thresholds.h.
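The following standalone sketch illustrates the idea of such a threshold. The threshold value and the function useBlasForDMatDVecMult are hypothetical and not part of Blaze, and using the number of matrix elements as the criterion is an assumption; Blaze's internal selection logic may differ:

```cpp
#include <cstddef>

// Hypothetical threshold (number of matrix elements), chosen for illustration
// only; the actual value in ./blaze/config/Thresholds.h may differ
constexpr std::size_t DMATDVECMULT_THRESHOLD = 4000UL;

// Hypothetical kernel selection for y = A * x with a rows x columns dense
// matrix A: small problems stay on a custom kernel, large ones go to BLAS
constexpr bool useBlasForDMatDVecMult(std::size_t rows, std::size_t columns)
{
   return rows * columns >= DMATDVECMULT_THRESHOLD;
}
```

Raising the threshold keeps more problem sizes on the custom kernels, while lowering it hands smaller problems to BLAS. The parallelization thresholds in the same file follow the same pattern, separating problems that run on a single thread from those large enough for several threads to pay off.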
By default, the Blaze library uses padding for all dense vectors and matrices in order to achieve maximum performance in all operations. Padding guarantees the proper alignment of data elements and minimizes the need for remainder loops. On the downside, however, padding introduces additional memory overhead, which can be large depending on the data type used.
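A small standalone sketch of this memory overhead, assuming that padding rounds the size of a dense vector (or of each matrix row or column) up to a multiple of the SIMD width, i.e. the number of elements per SIMD register. With AVX, a 256-bit register holds 4 double or 8 float values. The helper functions paddedSize and paddingOverhead are hypothetical and not part of Blaze:

```cpp
#include <cstddef>

// Hypothetical helper: rounds n up to the next multiple of simdWidth, i.e. the
// number of elements a padded dense vector of size n would occupy
constexpr std::size_t paddedSize(std::size_t n, std::size_t simdWidth)
{
   return (n + simdWidth - 1) / simdWidth * simdWidth;
}

// Hypothetical helper: number of extra (padding) elements for size n
constexpr std::size_t paddingOverhead(std::size_t n, std::size_t simdWidth)
{
   return paddedSize(n, simdWidth) - n;
}
```

For instance, a 3-element double vector with AVX (width 4) occupies 4 elements, one of them padding, whereas a 3-element float vector (width 8) occupies 8 elements, five of them padding. This is why the overhead depends on the data type.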
The configuration file ./blaze/config/Optimizations.h provides a compile-time switch that can be used to (de-)activate padding. If usePadding is set to true, padding is enabled for all dense vectors and matrices; if it is set to false, padding is disabled. Note, however, that disabling padding can considerably reduce the performance of all dense vector and matrix operations!
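A sketch of the padding switch, assuming it is declared as a const bool in namespace blaze (compare with your copy of Optimizations.h, as the exact form may differ between Blaze versions):

```cpp
// ./blaze/config/Optimizations.h (sketch; exact form may differ between Blaze versions)
namespace blaze {

// true:  padding enabled for all dense vectors and matrices (the default)
// false: padding disabled; less memory, but possibly much slower operations
const bool usePadding = true;

} // namespace blaze
```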
For vectors and matrices that no longer fit into the cache, non-temporal stores can provide a significant performance advantage of about 20%. However, this advantage only materializes if the memory bandwidth of the target architecture is maxed out. If the target architecture's memory bandwidth cannot be exhausted, the use of non-temporal stores can decrease performance instead of increasing it.
The configuration file ./blaze/config/Optimizations.h also provides a compile-time switch that can be used to (de-)activate streaming. If useStreaming is set to true, streaming is enabled; if it is set to false, streaming is disabled. It is recommended to consult the target architecture's white papers to decide whether streaming is beneficial or detrimental to performance.
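A sketch of the streaming switch, assuming it is declared as a const bool in namespace blaze like the padding switch; the value shown is only an example and not necessarily the shipped default:

```cpp
// ./blaze/config/Optimizations.h (sketch; exact form may differ between Blaze versions)
namespace blaze {

// true:  non-temporal (streaming) stores may be used for vectors and matrices
//        that exceed the cache
// false: streaming disabled; regular stores are used
const bool useStreaming = true;

} // namespace blaze
```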