OpenMP for Nested Matrix

I've been looking for a matrix library with block capability for a long time, so I was happy to find Blaze. However, my simple test program (shown below) doesn't seem to get any faster with OpenMP. Ideally, each individual 10x10 block operation would run on a single thread, with OpenMP parallelizing over the rows of the outermost matrix. Instead, I see no difference in execution speed with OpenMP on or off. I've also tried setting all the SMP thresholds to 0 and using different values of OMP_NUM_THREADS; the thread count is reported correctly in the output, but the execution speed doesn't change. Any ideas? Thanks.

#include <iostream>
#include <blaze/Math.h>

using blaze::DynamicMatrix;
using blaze::StaticMatrix;
using blaze::StaticVector;
using blaze::DynamicVector;
using blaze::rowMajor;
using blaze::columnVector;
int main() {
  std::cout << "Threads = " << blaze::getNumThreads() << std::endl;

  const int NROW = 3000;
  const int NCOL = 3000;

  DynamicMatrix< StaticMatrix<double,10,10,rowMajor>, rowMajor > A;

  DynamicVector< StaticVector<double,10,columnVector >, columnVector > x, y;

  // Resize
  A.resize(NROW,NCOL);
  x.resize(NCOL);
  y.resize(NROW);

  y = A * x;
}
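
For reference, this is roughly the behaviour I'm hoping for, written out by hand without Blaze. It's only a sketch of the idea, not how Blaze implements anything: the names Block, BlockVec and blockMatVec are made up for this example, and the blocks are plain std::array instead of StaticMatrix/StaticVector. Each 10x10 block product runs serially inside one thread, and OpenMP splits only the outer block rows across threads.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t B = 10;                 // block size, matching the 10x10 blocks above
using Block    = std::array<double, B * B>;   // one dense row-major BxB block (hypothetical type)
using BlockVec = std::array<double, B>;       // one B-element vector block (hypothetical type)

// Computes y = A * x, where A is an nrow x ncol grid of blocks stored row-major.
std::vector<BlockVec> blockMatVec(const std::vector<Block>& A,
                                  const std::vector<BlockVec>& x,
                                  std::size_t nrow, std::size_t ncol)
{
    std::vector<BlockVec> y(nrow, BlockVec{});  // zero-initialised result

    // Parallelise over the outer block rows only. Every iteration writes to
    // its own y[row], so no synchronisation is needed. Without -fopenmp the
    // pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < static_cast<long>(nrow); ++i) {
        const std::size_t row = static_cast<std::size_t>(i);
        for (std::size_t j = 0; j < ncol; ++j) {
            const Block&    a  = A[row * ncol + j];
            const BlockVec& xj = x[j];
            // Serial 10x10 block times 10-element vector, accumulated into y[row].
            for (std::size_t r = 0; r < B; ++r) {
                double sum = 0.0;
                for (std::size_t c = 0; c < B; ++c) {
                    sum += a[r * B + c] * xj[c];
                }
                y[row][r] += sum;
            }
        }
    }
    return y;
}
```

Timing this with and without -fopenmp (or the equivalent flag for your compiler) would show what I'd expect from y = A * x in the Blaze version.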
