Unchecked column of a checked submatrix is slower than checked column of a checked submatrix
Benchmark:
#include <blaze/Math.h>
#include <benchmark/benchmark.h>

namespace tmpc :: benchmark
{
    // CL / CR select checked (true) or unchecked (false) column views
    // on the left-hand and right-hand side of the update, respectively.
    template <typename Real, size_t N, bool CL, bool CR>
    static void BM_column(::benchmark::State& state)
    {
        blaze::StaticMatrix<Real, N, N, blaze::columnMajor> A;
        randomize(A);

        for (auto _ : state)
        {
            size_t const k = N / 2;
            size_t const rs = N - k;

            // D21: rows k..N-1 of column k; D20: rows k..N-1 of columns 0..k-1.
            auto D21 = submatrix(A, k, k, rs, 1, blaze::checked);
            auto const D20 = submatrix(A, k, 0, rs, k, blaze::checked);

            for (size_t j = 0; j < k; ++j)
                column(D21, 0, blaze::Check<CL> {}) -= (~A)(k, j) * column(D20, j, blaze::Check<CR> {});

            ::benchmark::DoNotOptimize(A(N - 1, N - 1));
        }
    }

    BENCHMARK_TEMPLATE(BM_column, double, 60, false, false);
    BENCHMARK_TEMPLATE(BM_column, double, 60, false, true);
    BENCHMARK_TEMPLATE(BM_column, double, 60, true, false);
    BENCHMARK_TEMPLATE(BM_column, double, 60, true, true);
}
Compiled with g++-8.3.0, options -O2 -g -DNDEBUG
Output:
2019-09-17 23:26:01
Run on (4 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 6144K (x1)
----------------------------------------------------------------------------------
Benchmark                                      Time       CPU  Iterations
----------------------------------------------------------------------------------
BM_column<double, 60, false, false>_mean     278 ns    278 ns     2472675
BM_column<double, 60, false, false>_median   278 ns    278 ns     2472675
BM_column<double, 60, false, false>_stddev     1 ns      1 ns     2472675
BM_column<double, 60, false, true>_mean      138 ns    138 ns     5012875
BM_column<double, 60, false, true>_median    138 ns    138 ns     5012875
BM_column<double, 60, false, true>_stddev      0 ns      0 ns     5012875
BM_column<double, 60, true, false>_mean      278 ns    278 ns     2510850
BM_column<double, 60, true, false>_median    278 ns    278 ns     2510850
BM_column<double, 60, true, false>_stddev      0 ns      0 ns     2510850
BM_column<double, 60, true, true>_mean       138 ns    138 ns     5037161
BM_column<double, 60, true, true>_median     138 ns    138 ns     5037161
BM_column<double, 60, true, true>_stddev       0 ns      0 ns     5037161
One can see that having blaze::unchecked on the right-hand side of the expression makes the code about 2x slower (278 ns vs. 138 ns), whereas on the left-hand side checked vs. unchecked has no effect on performance.
I would expect unchecked submatrices and subvectors to be at least not slower than the checked ones.
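To make the expectation above concrete, here is a minimal, self-contained sketch of a compile-time check tag. It is not Blaze's implementation: the `sketch` namespace, the stand-in `Check`, `checked`, and `unchecked` definitions, and the `element` helper are all hypothetical and only mirror the spelling used in the benchmark. The point is that the unchecked path does strictly less work, so one would not expect it to be slower.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

namespace sketch {  // hypothetical namespace, not part of Blaze

// Stand-in for a compile-time check tag (assumption: modeled on the
// blaze::Check<bool> spelling seen in the benchmark, nothing more).
template <bool C>
struct Check {};

constexpr Check<true>  checked{};
constexpr Check<false> unchecked{};

// Returns v[i]. The bounds test is only performed when C is true;
// for C == false the condition is a compile-time constant `false`,
// so the compiler can drop the comparison entirely.
template <typename T, std::size_t N, bool C>
constexpr T element(std::array<T, N> const& v, std::size_t i, Check<C>)
{
    return (C && i >= N)
        ? throw std::out_of_range("index out of range")
        : v[i];
}

}  // namespace sketch
```

With this sketch, `sketch::element(v, i, sketch::unchecked)` compiles to a plain element access, while `sketch::element(v, i, sketch::checked)` adds one comparison and throws `std::out_of_range` for an invalid index.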
Comments (2)
- changed status to wontfix
Hi Misha!
Thanks again for pointing out this possible defect. As it turns out, the problem is only reproducible with GCC-8, but not with GCC-7 or GCC-9. Also Clang does not show any of the described performance problems. Since we did not find any problem within Blaze that would explain the GCC-8 behavior, we now consider it a GCC issue that fortunately has already been fixed. Hence we'll not update any Blaze code.
Best regards,
Klaus!
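For readers who want to flag affected builds in their own benchmark code, the following is one possible sketch of a compile-time guard. It relies only on the standard predefined compiler macros; the `sketch` namespace and the function name `isGcc8` are hypothetical, and treating every GCC 8.x release as affected is an assumption based on the comment above (the report itself only measured g++-8.3.0).

```cpp
namespace sketch {  // hypothetical namespace, not part of Blaze

// True only when compiling with genuine GCC major version 8.
// Clang and the Intel compiler also define __GNUC__, so they are
// excluded explicitly.
constexpr bool isGcc8()
{
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
    return __GNUC__ == 8;
#else
    return false;
#endif
}

}  // namespace sketch
```

A benchmark could, for example, print a note when `sketch::isGcc8()` is true so that the slower unchecked timings are not mistaken for a library regression.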
Hi Misha!
Thanks a lot for pointing out this problem. We'll analyze it as soon as possible.
Best regards,
Klaus!