Blazemark
Introduction
The Blazemark is the benchmark suite of the Blaze math library. It provides benchmarks for a direct comparison of several (Smart) Expression Template based libraries for various arithmetic operations. Additionally, for some operations, it allows a comparison to "good old" C code and/or C++ operator overloading. The Blazemark is located in the ./blazemark subdirectory of the Blaze library.
Please note that the Blazemark has been deprecated since the release of Blaze 3.5!
Supported Libraries
Currently the following libraries are included in the Blazemark:
- Blitz++; Minimum requirement: Version 0.10
- Boost uBLAS; Minimum requirement: Version 1.54
- GMM++; Minimum requirement: Version 4.1
- Armadillo; Minimum requirement: Version 2.4.2
- MTL4; Minimum requirement: Version 4.0
- Eigen3; Minimum requirement: Version 3.1
Additionally, it is possible to include a plain BLAS library in the benchmark (such as the Intel MKL, ACML, ATLAS, or GotoBLAS).
Configuration and Compilation
Just as the Blaze library itself, the Blazemark is fairly easy to configure and compile. Since there is currently no direct support for Windows, the following instructions only summarize the necessary steps to configure and compile the Blazemark on Linux and Mac OS X systems.
The first step is to adapt the Configfile in the Blazemark home directory (./blazemark). In the Configfile, the necessary parameters for all required libraries (such as Boost) as well as the parameters for the optional libraries (any BLAS library, Blitz++, GMM++, Armadillo, MTL4, Eigen3) can be specified. Note that it is also necessary to explicitly select a compiler along with a set of suitable compilation flags. It is of highest importance to choose a reasonable set of compilation flags in order to achieve maximum performance of the benchmarked libraries. Assuming a "Sandy Bridge" or "Ivy Bridge" CPU capable of AVX, the following set of compiler flags is recommended for the GNU C++ compiler:
-Wall -Wextra -Werror -Wshadow -Woverloaded-virtual -ansi -O3 -mavx -DNDEBUG -fpermissive
In case of the Intel C++ compiler, the following flags are recommended:
-Werror -Wshadow -w1 -ansi -O3 -mavx -DNDEBUG -inline-level=2 -finline -fbuiltin
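For illustration only, a minimal compiler section of the Configfile might look as follows; the variable names shown here are an assumption, and the comments inside the Configfile itself document the actual settings:
# Hypothetical excerpt from ./Configfile (variable names assumed)
CXX="g++"
CXXFLAGS="-Wall -Wextra -Werror -Wshadow -Woverloaded-virtual -ansi -O3 -mavx -DNDEBUG -fpermissive"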
Any text editor can be used to adapt the Configfile:
vi ./Configfile
The next step is the execution of the configure script. It uses the Configfile to create a Makefile and to adapt several header files required for the compilation process:
./configure
Alternatively, it is also possible to specify a particular configuration file, which for instance allows switching between several different configurations:
./configure my_configuration
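As a sketch of how this can be used (the file names below are arbitrary), one could keep one configuration file per compiler and re-run the configure script whenever switching between them:
cp Configfile Configfile_gcc        # configuration for the GNU C++ compiler
cp Configfile Configfile_intel      # configuration for the Intel C++ compiler
./configure Configfile_gcc          # regenerate the Makefile and headers for the GCC setup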
In order to compile the complete benchmark suite, the Makefile can be used:
make
Due to the large number of available benchmarks, this process will take several minutes. Alternatively, a specific benchmark can be built (for instance the sparse matrix/dense vector multiplication; for a complete list of all available benchmarks see the last section of this wiki):
make smatdvecmult
The resulting executable benchmark is located in the ./blazemark/bin subdirectory. Before executing the benchmark, the size of the vectors and/or matrices involved in the benchmark can be specified via the corresponding parameter file in the ./blazemark/params subdirectory. Again, any text editor can be used for this task:
vi ./params/smatdvecmult.prm
Follow the instructions contained in the file to properly configure the parameters for the benchmark. After that, the benchmark can be started by simply calling the executable:
./bin/smatdvecmult
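Since the benchmark writes its results to standard output, it can be convenient to redirect the output to a file for later comparison (the file name below is arbitrary):
./bin/smatdvecmult > smatdvecmult_avx.txt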
Command Line Parameters
Each benchmark executable offers the possibility to activate or deactivate specific kernels/libraries via the command line. By default, all kernels/libraries of a particular benchmark are executed (this default can be adjusted in the ./blazemark/config/Config.h header file). However, via the following command line arguments the kernels can be activated very flexibly:
- -clike: Activates the C-like kernel.
- -classic: Activates the classic C++ kernel.
- -blas: Activates the plain BLAS kernel.
- -blaze: Activates the Blaze kernel.
- -boost: Activates the Boost uBLAS kernel.
- -blitz: Activates the Blitz++ kernel.
- -gmm: Activates the GMM++ kernel.
- -armadillo: Activates the Armadillo kernel.
- -mtl: Activates the MTL kernel.
- -eigen: Activates the Eigen kernel.
Additionally, these command line arguments can be prefixed with -no or -only:
- -no: Deactivates the corresponding kernel.
- -only: Deactivates all kernels except the given one. Since the command line arguments are evaluated from left to right, all succeeding activation arguments will activate the corresponding kernels.
The following two examples illustrate the use of these commands. The call
./bin/smatdvecmult -no-boost
runs the sparse row-major matrix/dense column vector multiplication benchmark using all available libraries except the Boost uBLAS library, since the -no-boost command specifically deactivates the Boost kernel. The call
./bin/smatdvecmult -only-blaze -eigen
runs the multiplication with only the Blaze and Eigen libraries, since the -only-blaze command deactivates all kernels except Blaze and the succeeding -eigen command re-activates the Eigen kernel.
Note that not all libraries support every operation. If no kernel exists for a certain library in a particular benchmark, the corresponding command line argument has no effect.
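As a further illustration of the left-to-right evaluation, the following hypothetical call restricts the benchmark to the Blaze, Boost uBLAS, and Eigen kernels:
./bin/smatdvecmult -only-blaze -boost -eigen
Here -only-blaze first deactivates everything except Blaze, and the subsequent -boost and -eigen arguments re-activate the Boost uBLAS and Eigen kernels.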
Interpreting the Output
The following example shows the output of the dense vector/dense vector addition benchmark, which was run for vectors of size 100 and 10000000:
 Dense Vector/Dense Vector Addition:
   C-like implementation [MFlop/s]:
     100       1115.44
     10000000  206.317
   Classic operator overloading [MFlop/s]:
     100       415.703
     10000000  112.557
   Blaze [MFlop/s]:
     100       2602.56
     10000000  292.569
   Boost uBLAS [MFlop/s]:
     100       1056.75
     10000000  208.639
   Blitz++ [MFlop/s]:
     100       1011.1
     10000000  207.855
   GMM++ [MFlop/s]:
     100       1115.42
     10000000  207.699
   Armadillo [MFlop/s]:
     100       1095.86
     10000000  208.658
   MTL [MFlop/s]:
     100       1018.47
     10000000  209.065
   Eigen [MFlop/s]:
     100       2173.48
     10000000  209.899

   N=100, steps=55116257
     C-like      = 2.33322  (4.94123)
     Classic     = 6.26062  (13.2586)
     Blaze       = 1        (2.11777)
     Boost uBLAS = 2.4628   (5.21565)
     Blitz++     = 2.57398  (5.4511)
     GMM++       = 2.33325  (4.94129)
     Armadillo   = 2.3749   (5.0295)
     MTL         = 2.55537  (5.41168)
     Eigen       = 1.19742  (2.53585)
   N=10000000, steps=8
     C-like      = 1.41805  (0.387753)
     Classic     = 2.5993   (0.710753)
     Blaze       = 1        (0.27344)
     Boost uBLAS = 1.40227  (0.383437)
     Blitz++     = 1.40756  (0.384884)
     GMM++       = 1.40862  (0.385172)
     Armadillo   = 1.40215  (0.383403)
     MTL         = 1.39941  (0.382656)
     Eigen       = 1.39386  (0.381136)
The first section presents the individual results of each library in MFlop/s. The competitors in this benchmark are a plain C-like implementation, classical C++ operator overloading, the Blaze library, the Boost uBLAS library, Blitz++, GMM++, Armadillo, MTL, and Eigen.
The second section compares the libraries for each specified vector size. The first line shows the size of the vectors and the number of iteration steps used for the benchmark. After that, the libraries are listed. The first number represents the factor by which the library was slower than the fastest competitor; a value of 1 therefore marks the fastest competitor. The number in brackets is the runtime spent in the benchmark.
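For instance, for N=100 the reported C-like runtime of 4.94123 divided by the Blaze runtime of 2.11777 gives 4.94123 / 2.11777 ≈ 2.333, which matches the slowdown factor of 2.33322 listed for the C-like kernel; equivalently, the MFlop/s ratio 2602.56 / 1115.44 ≈ 2.333 yields the same factor.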
Available Benchmarks
Due to the high number of possible operations and thus also benchmarks, the names of the benchmarks and executables are assembled from abbreviations for the involved left and right operands:
- Vectors:
- dvec: Dense column vector
- tdvec: Dense row vector
- vec3: Dense 3D column vector
- tvec3: Dense 3D row vector
- vec6: Dense 6D column vector
- tvec6: Dense 6D row vector
- svec: Sparse column vector
- tsvec: Sparse row vector
- Matrices:
- dmat: Dense row-major matrix
- tdmat: Dense column-major matrix
- mat3: Dense 3x3 row-major matrix
- tmat3: Dense 3x3 column-major matrix
- mat6: Dense 6x6 row-major matrix
- tmat6: Dense 6x6 column-major matrix
- smat: Sparse row-major matrix
- tsmat: Sparse column-major matrix
For instance, the tdvecsmatmult benchmark represents the multiplication of a dense row vector with a row-major sparse matrix.
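For example, assuming the make target follows the same naming scheme as the smatdvecmult example above, this benchmark can be built and run via:
make tdvecsmatmult
./bin/tdvecsmatmult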
The following is a complete list of all available benchmarks, listed in alphabetical order: