Sometimes it might be necessary to adapt Blaze to specific requirements. For this purpose, Blaze provides several configuration files in the ./blaze/config/ subdirectory, which make it possible to customize internal settings, behavior, and thresholds. This chapter explains the most important of these configuration files.
By default, Blaze creates all vectors as column vectors unless a transpose flag is specified explicitly.
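As an illustration of this default, a minimal sketch follows. It assumes the umbrella header `<blaze/Math.h>` and an unmodified configuration; the variable names are placeholders.

```cpp
#include <blaze/Math.h>

int main()
{
   // No transpose flag given: the configured default (column vector) applies.
   blaze::StaticVector<double,3UL> a;
   blaze::DynamicVector<double> b( 5UL );

   // Explicit transpose flag: a row vector regardless of the configured default.
   blaze::DynamicVector<double,blaze::rowVector> c( 5UL );

   return 0;
}
```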
The header file ./blaze/config/TransposeFlag.h allows the configuration of the default vector storage (i.e. the default transpose flag of the vectors). Via the defaultTransposeFlag value, the default transpose flag for all vectors of the Blaze library can be specified.
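The setting in that header might look like the following fragment. The exact declaration (a `const bool` inside the `blaze` namespace) is an assumption and may differ between Blaze versions; only the name defaultTransposeFlag and the values blaze::rowVector and blaze::columnVector come from this chapter.

```cpp
// ./blaze/config/TransposeFlag.h (fragment, assumed form)
namespace blaze {

// Default transpose flag for all vectors; columnVector is the documented default.
const bool defaultTransposeFlag = columnVector;

} // namespace blaze
```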
Valid settings for the defaultTransposeFlag are blaze::rowVector and blaze::columnVector.
By default, matrices are created as row-major matrices.
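A minimal sketch of this default, assuming the umbrella header `<blaze/Math.h>` and an unmodified configuration (the variable names are placeholders):

```cpp
#include <blaze/Math.h>

int main()
{
   // No storage order given: the configured default (row-major) applies.
   blaze::StaticMatrix<double,3UL,3UL> A;
   blaze::DynamicMatrix<double> B( 3UL, 3UL );

   // Explicit storage order: column-major regardless of the configured default.
   blaze::DynamicMatrix<double,blaze::columnMajor> C( 3UL, 3UL );

   return 0;
}
```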
The header file ./blaze/config/StorageOrder.h allows the configuration of the default matrix storage order. Via the defaultStorageOrder value the default storage order for all matrices of the Blaze library can be specified.
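The corresponding setting might look like the fragment below. As with the transpose flag, the declaration form (a `const bool` in the `blaze` namespace) is an assumption and may vary between Blaze versions; the name defaultStorageOrder is taken from the text.

```cpp
// ./blaze/config/StorageOrder.h (fragment, assumed form)
namespace blaze {

// Default storage order for all matrices; rowMajor is the documented default.
const bool defaultStorageOrder = rowMajor;

} // namespace blaze
```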
Valid settings for the defaultStorageOrder are blaze::rowMajor and blaze::columnMajor.
In order to achieve maximum performance and to exploit the compute power of a target platform, the Blaze library attempts to vectorize all linear algebra operations by SSE, AVX, and/or MIC intrinsics, depending on which instruction set is available. However, it is possible to disable the vectorization entirely via the compile-time switch in the configuration file ./blaze/config/Vectorization.h.
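The switch is presumably a preprocessor macro. The fragment below is a sketch under that assumption; the macro name BLAZE_USE_VECTORIZATION does not appear in this chapter and should be checked against the header shipped with the installed Blaze version.

```cpp
// ./blaze/config/Vectorization.h (fragment; the macro name is an assumption)
#define BLAZE_USE_VECTORIZATION 1   // 1: intrinsics enabled, 0: non-vectorized fallback
```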
In case the switch is set to 1, vectorization is enabled and the Blaze library is allowed to use intrinsics to speed up computations. In case the switch is set to 0, vectorization is disabled entirely and the Blaze library chooses default, non-vectorized functionality for the operations. Note that deactivating the vectorization may pose a severe performance limitation for a large number of operations!
Blaze provides several thresholds that can be adapted to the characteristics of the target platform. For instance, the DMATDVECMULT_THRESHOLD specifies the threshold between the application of the custom Blaze kernels for small dense matrix/dense vector multiplications and the BLAS kernels for large multiplications. All thresholds, including the thresholds for the OpenMP-based parallelization, are contained within the configuration file ./blaze/config/Thresholds.h.
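To make the role of such a threshold concrete, here is a small stand-alone model of the kernel selection. It does not use Blaze itself: the constant `kDMatDVecMultThreshold`, the `Kernel` enum, and `selectKernel` are hypothetical names, the value is illustrative rather than Blaze's shipped default, and comparing the threshold against the number of matrix elements is an assumption about how the decision is made.

```cpp
#include <cstddef>

// Hypothetical stand-in for DMATDVECMULT_THRESHOLD from ./blaze/config/Thresholds.h.
// The value is illustrative only.
constexpr std::size_t kDMatDVecMultThreshold = 4000000UL;

// The two kernel families the text distinguishes.
enum class Kernel { Custom, Blas };

// Picks a kernel for a dense matrix/dense vector multiplication.
// Assumption: the threshold is compared against rows * columns, and
// problems at or above the threshold are handed to BLAS.
constexpr Kernel selectKernel( std::size_t rows, std::size_t columns )
{
   return ( rows * columns < kDMatDVecMultThreshold ) ? Kernel::Custom : Kernel::Blas;
}
```

With these illustrative numbers, a 100x100 matrix stays on the custom kernel, while a 2000x2000 matrix reaches the threshold and is routed to BLAS. Raising the constant keeps more problem sizes on the custom kernels; lowering it hands work to BLAS sooner.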
For vectors and matrices that no longer fit into the cache, non-temporal stores can provide a significant performance advantage of about 20%. However, this advantage only takes effect when the memory bandwidth of the target architecture is saturated. If the target architecture's memory bandwidth cannot be exhausted, the use of non-temporal stores can decrease performance instead of increasing it.
The configuration file ./blaze/config/Streaming.h provides a compile-time switch that can be used to (de-)activate streaming.
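A sketch of what this switch might look like is given below. The name useStreaming is taken from this chapter; its declaration as a `const bool` in the `blaze` namespace is an assumption and may differ between Blaze versions.

```cpp
// ./blaze/config/Streaming.h (fragment, assumed form)
namespace blaze {

// true: non-temporal stores are used for large vectors and matrices
// false: regular stores are always used
const bool useStreaming = true;

} // namespace blaze
```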
If useStreaming is set to true, streaming is enabled; if it is set to false, streaming is disabled. It is recommended to consult the target architecture's white papers to decide whether streaming is beneficial or harmful for performance.