SEGFAULT in matrix-matrix assignment due to unaligned SIMD instruction operand under specific conditions

Issue #290 wontfix
Mikhail Katliar created an issue

The code reproducing the issue:

#include <blaze/Math.h>

#include <iostream>


int main(int, char **)
{
    using namespace blaze;

    using Real = double;
    size_t constexpr NX = 8;
    size_t constexpr NU = 1;

    std::cout << "SIMDSIZE=" << SIMDTrait<Real>::size << std::endl;

    auto H = std::make_unique<SymmetricMatrix<StaticMatrix<Real, NU + NX, NU + NX, columnMajor>>>();
    auto Q = submatrix<NU, NU, NX, NX>(*H);

    std::cout << "data(H)=" << data(*H) << std::endl;
    std::cout << "data(Q)=" << data(Q) << std::endl;

    auto LL = std::make_unique<StaticMatrix<Real, NX + NU, NX + NU, columnMajor>>();
    auto Lcal = submatrix<NU, NU, NX, NX>(*LL);

    std::cout << "data(LL)=" << data(*LL) << std::endl;
    std::cout << "data(Lcal)=" << data(Lcal) << std::endl;

    Lcal = Q;

    std::cout << "H=\n" << *H << std::endl;
    std::cout << "LL=\n" << *LL << std::endl;

    return 0;
}

Compiler command line:

g++ -std=c++17 -O2 -g -DNDEBUG -march=skylake matrix_assign.cpp

Compiler version: g++ (Ubuntu 8.3.0-6ubuntu1) 8.3.0

Program output:

SIMDSIZE=4
data(H)=0
data(Q)=0
data(LL)=0x557a4727e660
data(Lcal)=0x557a4727e6c8
Segmentation fault (core dumped)

Stack trace:

#0  blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::assign<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul> > (rhs=..., this=<optimized out>) at /usr/local/include/blaze/math/views/submatrix/Dense.h:5907
#1  blaze::assign_backend<blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> (rhs=..., lhs=...) at /usr/local/include/blaze/math/expressions/Matrix.h:1016
#2  blaze::assign<blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true, blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> (rhs=..., lhs=...) at /usr/local/include/blaze/math/expressions/Matrix.h:1101
#3  blaze::smpAssign<blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true, blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> (rhs=..., lhs=...) at /usr/local/include/blaze/math/smp/default/DenseMatrix.h:108
#4  blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> (this=0x7fffffffd2a0, rhs=...) at /usr/local/include/blaze/math/views/submatrix/Dense.h:4711
#5  0x00005555555553a1 in main () at /home/kotlyar/projects/tmp/matrix_assign.cpp:28

Failing line: https://bitbucket.org/blaze-lib/blaze/src/cc8016b5cfcb9b003709db686a1086c0d40f5f02/blaze/math/views/submatrix/Dense.h#lines-5907

Disassembly of the failing function:

Dump of assembler code for function blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&):
   0x0000555555555800 <+0>: push   %rbp
   0x0000555555555801 <+1>: mov    %rdi,%rax
   0x0000555555555804 <+4>: mov    $0x68,%r8d
   0x000055555555580a <+10>:    mov    %rsp,%rbp
   0x000055555555580d <+13>:    and    $0xffffffffffffffe0,%rsp
   0x0000555555555811 <+17>:    sub    $0x220,%rsp
   0x0000555555555818 <+24>:    mov    (%rax),%rcx
   0x000055555555581b <+27>:    mov    (%rsi),%rdx
   0x000055555555581e <+30>:    mov    %fs:0x28,%rdi
   0x0000555555555827 <+39>:    mov    %rdi,0x218(%rsp)
   0x000055555555582f <+47>:    xor    %edi,%edi
   0x0000555555555831 <+49>:    cmp    %rdx,%rcx
   0x0000555555555834 <+52>:    jne    0x555555555843 <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+67>
   0x0000555555555836 <+54>:    jmp    0x555555555888 <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+136>
   0x0000555555555838 <+56>:    nopl   0x0(%rax,%rax,1)
   0x0000555555555840 <+64>:    mov    (%rsi),%rdx
   0x0000555555555843 <+67>:    add    %r8,%rdx
   0x0000555555555846 <+70>:    vmovupd (%rdx),%ymm4
   0x000055555555584a <+74>:    vmovupd %ymm4,(%rcx,%r8,1)
=> 0x0000555555555850 <+80>:    vmovapd 0x20(%rdx),%ymm5
   0x0000555555555855 <+85>:    vmovupd %ymm5,0x20(%rcx,%r8,1)
   0x000055555555585c <+92>:    add    $0x60,%r8
   0x0000555555555860 <+96>:    cmp    $0x368,%r8
   0x0000555555555867 <+103>:   jne    0x555555555840 <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+64>
   0x0000555555555869 <+105>:   mov    0x218(%rsp),%rdi
   0x0000555555555871 <+113>:   xor    %fs:0x28,%rdi
   0x000055555555587a <+122>:   jne    0x5555555558ee <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+238>
   0x000055555555587c <+124>:   vzeroupper 
   0x000055555555587f <+127>:   leaveq 
   0x0000555555555880 <+128>:   retq   
   0x0000555555555881 <+129>:   nopl   0x0(%rax)
   0x0000555555555888 <+136>:   lea    0x68(%rcx),%rdx
   0x000055555555588c <+140>:   mov    %rsp,%r8
   0x000055555555588f <+143>:   mov    %rdx,%rsi
   0x0000555555555892 <+146>:   add    $0x368,%rcx
   0x0000555555555899 <+153>:   mov    %r8,%r9
   0x000055555555589c <+156>:   nopl   0x0(%rax)
   0x00005555555558a0 <+160>:   vmovupd (%rsi),%ymm0
   0x00005555555558a4 <+164>:   vmovupd 0x20(%rsi),%ymm1
   0x00005555555558a9 <+169>:   add    $0x60,%rsi
   0x00005555555558ad <+173>:   vmovapd %ymm0,(%r9)
   0x00005555555558b2 <+178>:   vmovapd %ymm1,0x20(%r9)
   0x00005555555558b8 <+184>:   add    $0x40,%r9
   0x00005555555558bc <+188>:   cmp    %rcx,%rsi
   0x00005555555558bf <+191>:   jne    0x5555555558a0 <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+160>
   0x00005555555558c1 <+193>:   nopl   0x0(%rax)
   0x00005555555558c8 <+200>:   vmovapd (%r8),%ymm2
   0x00005555555558cd <+205>:   vmovapd 0x20(%r8),%ymm3
   0x00005555555558d3 <+211>:   vmovupd %ymm2,(%rdx)
   0x00005555555558d7 <+215>:   vmovupd %ymm3,0x20(%rdx)
   0x00005555555558dc <+220>:   add    $0x60,%rdx
   0x00005555555558e0 <+224>:   add    $0x40,%r8
   0x00005555555558e4 <+228>:   cmp    %rdx,%rcx
   0x00005555555558e7 <+231>:   jne    0x5555555558c8 <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+200>
   0x00005555555558e9 <+233>:   jmpq   0x555555555869 <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+105>
   0x00005555555558ee <+238>:   vzeroupper 
   0x00005555555558f1 <+241>:   callq  0x5555555550e0 <__stack_chk_fail@plt>
End of assembler dump.

Registers:

rax            0x7fffffffd2a0      140737488343712
rbx            0x55555556c660      93824992331360
rcx            0x55555556c660      93824992331360
rdx            0x55555556c2e8      93824992330472
rsi            0x7fffffffd290      140737488343696
rdi            0x0                 0
rbp            0x7fffffffd260      0x7fffffffd260
rsp            0x7fffffffd040      0x7fffffffd040
r8             0x68                104
r9             0x7ffff7a21740      140737347983168
r10            0xa                 10
r11            0x246               582
r12            0x55555556c5e0      93824992331232
r13            0x7fffffffd3d0      140737488344016
r14            0x0                 0
r15            0x0                 0
rip            0x555555555850      0x555555555850 <blaze::Submatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>::operator=<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true>(blaze::Matrix<blaze::Submatrix<blaze::SymmetricMatrix<blaze::StaticMatrix<double, 9ul, 9ul, true>, true, true, true>, (blaze::AlignmentFlag)0, true, true, 1ul, 1ul, 8ul, 8ul>, true> const&)+80>
eflags         0x10206             [ PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

The vmovapd instruction at 0x0000555555555850 expects a 0x20-byte-aligned operand, whereas rdx is 0x55555556c2e8 and hence 0x20(%rdx) is not 0x20-byte aligned. This results in a segmentation fault.

The issue is very specific. It does not reproduce if any one of the following is true:

  • The compiler is changed to clang version 8.0.0-3
  • The -march option is removed or changed to -mavx or -mavx2
  • The optimization option is changed to -O1 or -O0
  • One of the matrices H, LL is auto-allocated instead of heap-allocated
  • The H matrix is just a StaticMatrix<Real, NU + NX, NU + NX, columnMajor> instead of SymmetricMatrix<StaticMatrix<Real, NU + NX, NU + NX, columnMajor>>
  • The NX constant is set to 4, 5, 6, or 7

It is still reproduced if

  • -march=skylake is changed to -march=haswell

I am not sure whether it is a compiler bug or not.

Comments (12)

  1. Klaus Iglberger

    Hi Misha!

    First of all thanks a lot for providing so much valuable feedback about Blaze. The provided info (minimum working example, stack trace, etc.) is exemplary!

    In this particular case the problem is caused by using new for an overaligned data structure. When using std::make_unique, the underlying new (usually) allocates 16-byte-aligned memory. However, since you explicitly compile for the Skylake architecture, AVX is enabled, which requires 32-byte-aligned memory. StaticMatrix relies on this alignment requirement (see line 544 in <blaze/math/dense/StaticMatrix.h>), which unfortunately can cause problems since the underlying dynamic memory might be misaligned (16 bytes instead of 32 bytes). Hopefully this explains the problem well enough, and also why it is so difficult to reproduce: in every run you might get different memory.

    Still, thanks again for the valuable feedback, this is very much appreciated.

    Best regards,

    Klaus!

  2. Mikhail Katliar reporter

    Hello Klaus,

    the alignment does not seem to be the problem. Please look at the following code, which is a modification of the previous example:

    #include <blaze/Math.h>
    
    #include <iostream>
    
    
    using Real = double;
    blaze::size_t constexpr NX = 8;
    blaze::size_t constexpr NU = 1;
    
    using HType = blaze::SymmetricMatrix<blaze::StaticMatrix<Real, NU + NX, NU + NX, blaze::columnMajor>>;
    using LLType = blaze::StaticMatrix<Real, NX + NU, NX + NU, blaze::columnMajor>;
    
    
    void f(HType& H, LLType& LL)
    {
        randomize(H);
    
        auto Q = blaze::submatrix<NU, NU, NX, NX>(H);
        auto Lcal = blaze::submatrix<NU, NU, NX, NX>(LL);
    
        decltype(Q)::Iterator Q_begin(Q.begin(0));
        std::cout << "Q_begin.isAligned() = " << Q_begin.isAligned() << std::endl;
    
        Lcal = Q;
    }
    
    
    template <typename MT, bool SO>
    void printInfo(std::ostream& os, std::string const& name, blaze::Matrix<MT, SO> const& m)
    {
        std::cout << name << ": size=" << sizeof(~m) << ", addr=" << &(~m) << ", data=" << (~m).data() << std::endl;
    }
    
    
    int main(int, char **)
    {
        std::cout << "SIMDSIZE=" << blaze::SIMDTrait<Real>::size << std::endl;
    
        HType H_stack;
        LLType LL_stack;
        auto H_heap = std::make_unique<HType>();
        auto LL_heap = std::make_unique<LLType>();
    
        printInfo(std::cout, "H_stack", H_stack);
        printInfo(std::cout, "LL_stack", LL_stack);
        printInfo(std::cout, "H_heap", *H_heap);
        printInfo(std::cout, "LL_heap", *LL_heap);
    
        std::cout << "Using stack" << std::endl;
        f(H_stack, LL_stack);
        std::cout << "Ok!" << std::endl;
    
        std::cout << "Using heap" << std::endl;
        f(*H_heap, *LL_heap);
        std::cout << "Ok!" << std::endl;
    
        return 0;
    }
    

    Program output:

    SIMDSIZE=4
    H_stack: size=864, addr=0x7ffc12450f00, data=0x7ffc12450f00
    LL_stack: size=864, addr=0x7ffc12451260, data=0x7ffc12451260
    H_heap: size=864, addr=0x55809f4c1280, data=0x55809f4c1280
    LL_heap: size=864, addr=0x55809f4c1660, data=0x55809f4c1660
    Using stack
    Q_begin.isAligned() = 0
    Segmentation fault (core dumped)
    

    You see that both the stack- and the heap-allocated matrices are aligned on a 0x20 boundary. Furthermore, the SEGFAULT now happens with the stack-allocated matrix.

  3. Mikhail Katliar reporter
    • changed status to new

    The matrices are aligned on a 0x20 boundary. Furthermore, the issue is reproducible with stack-allocated matrices.

  4. Klaus Iglberger

    Hi Misha!

    I have continued to analyse the issue, but unfortunately so far haven’t been able to reproduce the segmentation fault despite the attempt to provide exactly the same settings:

    • I used exactly the same compiler(s) as you (g++-mp-8 (MacPorts gcc8 8.3.0_4) 8.3.0);
    • I used exactly the same compiler flags as you (g++ -std=c++17 -O2 -g -DNDEBUG -march=skylake ...);
    • I used both of your code examples.

    I have (re-)analyzed all functions that are involved in the matrix assignment and additionally added output statements to the loada() function for double precision values (see <blaze/math/simd/Loada.h>, line 432) to prove that no aligned load is explicitly triggered which could cause the problematic vmovapd instruction. So far I have not found anything that could cause the problem, everything seems to work as expected.

    In summary, either the problem is very well hidden and only surfaces under very specific conditions, or it is indeed a compiler issue. Since there is no way for me to prove that the error is not in Blaze I hope that you have the time to dig a little deeper yourself. Since you already have some experience with Blaze code it should be possible for you to modify some functions to see if there is any effect. For instance, you could explicitly disable aligned loads in the submatrix iterators by setting the isAligned flag to false (see <blaze/math/views/submatrix/Dense.h>, line 3509). Alternatively you can simplify the load() and store() functions in the same nested class (see line 3621 and line 3670) to directly call loadu() and storeu(), respectively (these are the functions that are called from the assignment kernel in line 5907).

    Thanks a lot for your help, I hope you find something.

    Best regards,

    Klaus!

  5. Mikhail Katliar reporter

    Hello Klaus,

    It seems that the reason why the issue is not reproduced on your system is that despite the -march=skylake option your compiler tunes the instruction scheduling not for Skylake, but for the CPU architecture it runs on: https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/. This can be controlled with the -mtune option. I found out that if I add -mtune=generic, then the issue is not reproduced. So it seems necessary to add -mtune=skylake to reproduce it on a machine with a different CPU. Could you please try it?

    I have provided the archive with the source, the Makefile and the output files from my system. Could you also please check what is the output of g++-8 -march=native -Q --help=target | grep -- '-march=' | cut -f3?

  6. Mikhail Katliar reporter
    -mtune option value   reproduced
    generic               NO
    sandybridge           NO
    ivybridge             NO
    haswell               YES
    broadwell             YES
    skylake               YES
    cannonlake            YES

  7. Mikhail Katliar reporter

    The code generated for ivybridge is quite close to the one for generic, whereas the difference between haswell and generic is quite big (see above).

  8. Klaus Iglberger

    Hi Misha!

    Thanks a lot for providing a completely runnable example. With this I was indeed able to reproduce the segmentation fault and could start to properly analyze it on my side. Despite several hours of investigation, I did not find any bug in Blaze. On the other hand, I did find several indications that this is indeed a compiler bug:

    • The segfault only occurs in GCC, but not in Clang;
    • The error only appears with a special "optimization" flag, but not with the usual flags to enable AVX or SSE;
    • The Blaze function that would trigger an aligned load of a double value is never called, i.e. there is no explanation for the vmovapd instruction;
    • Commenting out branches that are not taken can make the segfault disappear;
    • Slightly rewriting the apparently failing load()/store() functions without changing the meaning makes the segfault disappear:
    left.store( right.load() );  // Segfault
    
    auto xmm( right.load() );
    std::cout << "Output";
    left.store( xmm );  // No segfault
    

    Of course I don't have any proof that this is a compiler error, but since I could not detect anything wrong in Blaze and since the Blazetest has not detected any bug in years, even with random submatrix assignments, I am bound to believe that this is not an issue in Blaze itself.

    Still, many thanks to you for being persistent and for investing so much time to provide the information.

    Best regards,

    Klaus!

  9. Mikhail Katliar reporter

    Hello Klaus,

    thank you for looking into this issue once again. I am also of the opinion that it is a compiler bug.
