Assignment of ZeroMatrix to DynamicMatrix is extremely slow
Issue #230
resolved
Assignment of a ZeroMatrix to a DynamicMatrix is more than 400 times slower than the uniform scalar assignment (2372 ns versus 4.99 ns in the output below, roughly a factor of 475). A benchmark:
#include <blaze/Math.h>
#include <benchmark/benchmark.h>

// Times the assignment of a ZeroMatrix to a pre-allocated DynamicMatrix.
template <typename Real, size_t M, size_t N>
static void BM_DynamicMatrixZeroMatrixAssign(::benchmark::State& state)
{
    blaze::DynamicMatrix<Real> A(M, N);

    for (auto _ : state)
        ::benchmark::DoNotOptimize(A = blaze::ZeroMatrix<Real>(M, N));
}

// Baseline: times the uniform assignment of a scalar zero to the same matrix.
template <typename Real, size_t M, size_t N>
static void BM_DynamicMatrixZeroAssign(::benchmark::State& state)
{
    blaze::DynamicMatrix<Real> A(M, N);

    for (auto _ : state)
        ::benchmark::DoNotOptimize(A = Real {0});
}

BENCHMARK_TEMPLATE(BM_DynamicMatrixZeroMatrixAssign, double, 4, 1);
BENCHMARK_TEMPLATE(BM_DynamicMatrixZeroAssign, double, 4, 1);
Output:
2019-02-19 16:24:13
Running build/bin/tmpc_bench
Run on (12 X 4100 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 9216K (x1)
Load Average: 0.27, 0.62, 1.11
-----------------------------------------------------------------------------------------
Benchmark                                               Time          CPU    Iterations
-----------------------------------------------------------------------------------------
BM_DynamicMatrixZeroMatrixAssign<double, 4, 1>       2372 ns      2372 ns        317329
BM_DynamicMatrixZeroAssign<double, 4, 1>             4.99 ns      4.99 ns     118211356
Compiler: gcc-8.2.0, compiler flags: -O2 -g -DNDEBUG
Comments (5)
Hi Mikhail!
Thanks a lot for pointing out this defect. You are correct: combining DynamicMatrix, ZeroMatrix, and OpenMP results in a significantly more expensive assignment for tiny matrices. We apologize for the inconvenience and will fix the problem as quickly as possible.
Best regards,
Klaus!
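For readers following along: the cause named above (parallel overhead dominating a tiny assignment) is commonly avoided by guarding the parallel path with a size cutoff. The following is only a minimal sketch of that general idea, not Blaze's implementation and not the content of the eventual fix. The names kParallelThreshold, choose_path, and assign_zero, the cutoff value, and the caller-supplied parallel_fill callback (a stand-in for an OpenMP-based fill) are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical cutoff in number of elements; NOT a Blaze configuration value.
constexpr std::size_t kParallelThreshold = 10000;

enum class Path { Serial, Parallel };

// Decide which path an assignment of rows x cols elements should take.
constexpr Path choose_path(std::size_t rows, std::size_t cols, bool parallel_enabled)
{
    return (parallel_enabled && rows * cols >= kParallelThreshold) ? Path::Parallel
                                                                   : Path::Serial;
}

// Zero a row-major buffer of rows x cols elements.
// parallel_fill is a caller-supplied stand-in for an OpenMP/thread-based fill;
// it is only invoked when the matrix is large enough to amortize its overhead.
template <typename ParallelFill>
Path assign_zero(std::vector<double>& data, std::size_t rows, std::size_t cols,
                 bool parallel_enabled, ParallelFill parallel_fill)
{
    const Path path = choose_path(rows, cols, parallel_enabled);
    if (path == Path::Serial)
        std::fill(data.begin(), data.end(), 0.0);  // tiny case: plain loop, no thread startup
    else
        parallel_fill(data);                       // large case: hand off to the parallel fill
    return path;
}
```

With this guard, a 4x1 matrix like the one in the benchmark would always take the serial branch, so the cost of entering a parallel region is never paid for four elements.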
- assigned issue to
- assigned issue to
- changed status to open
- changed status to resolved
Commit b742cdd resolves the significant performance penalty when assigning a ZeroMatrix to a DynamicMatrix while using OpenMP. The fix is immediately available via cloning the Blaze repository and will be officially released in Blaze 3.5.
Profiling shows that most of the time is spent in omp functions: