Unchecked column of a checked submatrix is slower than checked column of a checked submatrix

Issue #291 wontfix
Mikhail Katliar created an issue

Benchmark:

#include <blaze/Math.h>

#include <benchmark/benchmark.h>


namespace tmpc :: benchmark
{
    template <typename Real, size_t N, bool CL, bool CR>
    static void BM_column(::benchmark::State& state)
    {
        blaze::StaticMatrix<Real, N, N, blaze::columnMajor> A;  // use the Real template parameter instead of hard-coding double
        randomize(A);

        for (auto _ : state)
        {
            size_t const k = N / 2;
            size_t const rs = N - k;

            auto D21 = submatrix(A, k, k, rs, 1, blaze::checked);
            auto const D20 = submatrix(A, k, 0, rs, k, blaze::checked);

            for (size_t j = 0; j < k; ++j)
                column(D21, 0, blaze::Check<CL> {}) -= (~A)(k, j) * column(D20, j, blaze::Check<CR> {});

            ::benchmark::DoNotOptimize(A(N - 1, N - 1));
        }
    }


    BENCHMARK_TEMPLATE(BM_column, double, 60, false, false);
    BENCHMARK_TEMPLATE(BM_column, double, 60, false, true);
    BENCHMARK_TEMPLATE(BM_column, double, 60, true, false);
    BENCHMARK_TEMPLATE(BM_column, double, 60, true, true);
}

Compiled with g++-8.3.0, options -O2 -g -DNDEBUG

Output:

2019-09-17 23:26:01
Run on (4 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 6144K (x1)
----------------------------------------------------------------------------------
Benchmark                                           Time           CPU Iterations
----------------------------------------------------------------------------------
BM_column<double, 60, false, false>_mean          278 ns        278 ns    2472675
BM_column<double, 60, false, false>_median        278 ns        278 ns    2472675
BM_column<double, 60, false, false>_stddev          1 ns          1 ns    2472675
BM_column<double, 60, false, true>_mean           138 ns        138 ns    5012875
BM_column<double, 60, false, true>_median         138 ns        138 ns    5012875
BM_column<double, 60, false, true>_stddev           0 ns          0 ns    5012875
BM_column<double, 60, true, false>_mean           278 ns        278 ns    2510850
BM_column<double, 60, true, false>_median         278 ns        278 ns    2510850
BM_column<double, 60, true, false>_stddev           0 ns          0 ns    2510850
BM_column<double, 60, true, true>_mean            138 ns        138 ns    5037161
BM_column<double, 60, true, true>_median          138 ns        138 ns    5037161
BM_column<double, 60, true, true>_stddev            0 ns          0 ns    5037161

One can see that using blaze::unchecked (Check<false>) for the column on the right-hand side of the expression makes the code ~2x slower (278 ns vs. 138 ns), whereas on the left-hand side checked vs. unchecked has no effect on performance.

I would expect unchecked submatrices and subvectors to be at least as fast as the checked ones.
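
For readers unfamiliar with the check tag, here is a minimal sketch of why unchecked is expected to be the cheaper variant. It assumes that a checked view validates its index when it is created and throws std::invalid_argument on failure, while an unchecked view only asserts in debug builds. Everything in namespace sketch (Check, Matrix, column) is a hypothetical stand-in written for illustration and is not Blaze's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch  // hypothetical stand-ins, not Blaze code
{
    // Tag type selecting between checked and unchecked view creation.
    template <bool C>
    struct Check {};

    constexpr Check<true>  checked{};
    constexpr Check<false> unchecked{};

    // Toy column-major matrix: element (i, j) lives at data[i + j * rows].
    struct Matrix
    {
        std::size_t rows;
        std::size_t cols;
        std::vector<double> data;

        Matrix(std::size_t m, std::size_t n) : rows(m), cols(n), data(m * n, 0.0) {}
    };

    // Checked variant: validates the column index at run time.
    inline double* column(Matrix& A, std::size_t j, Check<true>)
    {
        if (j >= A.cols)
            throw std::invalid_argument("Invalid column access index");
        return A.data.data() + j * A.rows;
    }

    // Unchecked variant: no run-time validation; the assert is compiled
    // out when NDEBUG is defined, as in the benchmark build above.
    inline double* column(Matrix& A, std::size_t j, Check<false>)
    {
        assert(j < A.cols);
        return A.data.data() + j * A.rows;
    }
}
```

With -DNDEBUG the unchecked overload does strictly less work than the checked one, which is why a slowdown in the unchecked case is surprising. For example, sketch::column(A, 2, sketch::unchecked) and sketch::column(A, 2, sketch::checked) return the same pointer, and only the checked call throws for an out-of-range index.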

Comments (2)

  1. Klaus Iglberger

    Hi Misha!

    Thanks a lot for pointing out this problem. We’ll analyze it as soon as possible.

    Best regards,

    Klaus!

  2. Klaus Iglberger

    Hi Misha!

    Thanks again for pointing out this possible defect. As it turns out, the problem is only reproducible with GCC-8, but not with GCC-7 or GCC-9. Clang does not show any of the described performance problems either. Since we did not find anything within Blaze that would explain the GCC-8 behavior, we now consider it a GCC issue that fortunately has already been fixed. Hence we will not change any Blaze code.

    Best regards,

    Klaus!
