Vectorization on Windows 10 x64 VS2015

Issue #100 resolved
Fabien Péan
created an issue

Hi Blaze team,

I was testing a simple DenseMatrix / DenseMatrix multiplication on Windows with the snippet below and noticed a serious slowdown that I could not reproduce on my Linux machine.

Setup: Windows 10 x64, VS2015, with OpenBLAS (+LAPACK) in Release mode. Default configuration except for BLAZE_BLAS_MODE, set to 1, and CACHE_SIZE, fitted to my L3 cache size.

       #define BLAZE_BLAS_MODE 1
       //...
       const size_t N(2000L);
       DynamicMatrix<double, rowMajor> U(N, N), V(N,N), X(N,N), W(N,N);
       // Initialize U,V with random values and X,W to 0
       // Using Blaze expression
       W = U*V;
       // Using direct call to BLAS
       gemm(X, U, V, 1.0, 1.0);

The direct call to gemm was faster than the expression by roughly a factor of 10: wall-clock time was around 7 s for the expression and 0.8 s for the call to gemm.
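For reference, a minimal sketch of how wall-clock times like these could be measured. The helper name best_wall_time is hypothetical (it is not part of Blaze), and taking the best of a few repeats is only one possible choice:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not a Blaze function): runs f() `repeats` times
// and returns the shortest wall-clock time of a single run, in seconds.
template <typename F>
double best_wall_time(F&& f, int repeats = 3) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;  // monotonic clock
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        f();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        if (best < 0.0 || elapsed.count() < best) {
            best = elapsed.count();
        }
    }
    return best;
}

// Intended use with the snippet above (needs Blaze, so left as a comment):
//   double t_expr = best_wall_time([&] { W = U * V; });
//   double t_gemm = best_wall_time([&] { gemm(X, U, V, 1.0, 1.0); });
```

Note that gemm(X, U, V, 1.0, 1.0) accumulates into X, so repeated runs change the values in X but not the amount of work done.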

Basically, is this a known behaviour of the Microsoft compiler, or is something actually wrong?


EDIT:

So I could verify that this was indeed a vectorization issue, as stated below. However, I noticed another problem. On Windows x64, the compiler provides no definitions such as __SSE4_2__, __SSSE3__, __SSE3__, or _M_IX86_FP. As a result, compiling with /arch:AVX2 or /arch:AVX always triggers a static assert error in system/Vectorization.h by default. To make it work, it is necessary to manually set a definition offered by a "hidden" switch, BLAZE_ENFORCE_AVX.

According to https://blogs.msdn.microsoft.com/vcblog/2014/02/28/avx2-support-in-visual-studio-c-compiler/, in Visual Studio:

"If you specify /arch:AVX2, then it also enables /arch:AVX – we try to keep those /arch switches ‘monotonic’: the capabilities of each switch in the sequence {IA32, SSE, SSE2, AVX, AVX2} subsumes its predecessor (not sure I’ve explained this well – is it clear?)"

I did not find any mention of this in the wiki; please correct me if I am wrong. In my view, either the documentation should be improved, or the definitions should be set in a better way depending on the platform.
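To illustrate what platform-dependent definitions could look like, here is a minimal sketch. It assumes that Visual Studio defines __AVX__ under /arch:AVX (and above) and __AVX2__ under /arch:AVX2, and that SSE2 is always available on x64 (_M_X64). All DEMO_* and demo_* names are hypothetical; this is not the actual content of system/Vectorization.h.

```cpp
#include <cstddef>

// Hypothetical detection levels (DEMO_* names are made up for this sketch):
//   0 = no SIMD detected, 1 = SSE2, 2 = AVX, 3 = AVX2
#if defined(__AVX2__)
#  define DEMO_SIMD_LEVEL 3  // GCC/Clang -mavx2, MSVC /arch:AVX2
#elif defined(__AVX__)
#  define DEMO_SIMD_LEVEL 2  // GCC/Clang -mavx, MSVC /arch:AVX
#elif defined(__SSE2__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  define DEMO_SIMD_LEVEL 1  // SSE2: implied on x64; /arch:SSE2 on 32-bit MSVC
#else
#  define DEMO_SIMD_LEVEL 0
#endif

// SIMD register width in bytes for the detected level (0 if none).
constexpr std::size_t demo_simd_bytes() {
    return DEMO_SIMD_LEVEL >= 2 ? 32 : (DEMO_SIMD_LEVEL == 1 ? 16 : 0);
}

// Human-readable name of the detected level.
constexpr const char* demo_simd_name() {
    return DEMO_SIMD_LEVEL == 3 ? "AVX2"
         : DEMO_SIMD_LEVEL == 2 ? "AVX"
         : DEMO_SIMD_LEVEL == 1 ? "SSE2"
                                : "none";
}
```

Because the /arch switches are monotonic, the checks go from the most capable level down, so the first match wins and no manual switch is needed in this sketch.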

Comments (7)

  1. Klaus Iglberger

    Hi Fabien!

    Thanks for raising this issue. You have stumbled upon an unfortunate deficiency of the Visual Studio compiler. VS does not set the same preprocessor flags for the AVX(2) mode as other compilers. In fact, until VS2013 it was very difficult to make VS use AVX(2) in the first place. However, the situation has changed in VS2015, which now gives us an opportunity to update our code. We will use the opportunity of this issue to update Blaze accordingly. Thanks again for pointing us in this direction and sorry for the inconveniences,

    Best regards,

    Klaus!
