Thread Utilization on row/row multiplication

Issue #95 wontfix
Daniel Baker created an issue

Hi,

I'm testing out some basic functionality, and for some reason, I'm not seeing any thread utilization beyond one CPU.

I have compiled with -fopenmp and called both blaze::setNumThreads(60) and omp_set_num_threads(60) on a machine with 112 cores.

I'm computing dot products between rows of the matrix, each of length 1000, so I would expect to exceed any of the thresholds in blaze/blaze/config/Thresholds.txt.

I have all BLAS macros set to 1 except for BLAZE_BLAS_IS_PARALLEL, which is 0, as my BLAS implementation does not parallelize.

I've also attempted this using std::thread, with similar results: I see 60 threads spawned, each using 0.1% CPU.

Might you be able to point me at where I'm going wrong? I'm calling the function on a 10,000-row, 10,000-column DynamicMatrix of floats.

Here is the code I'm using.

template<typename FloatType=float>
struct TanhKernelMatrix {
    const FloatType k_;
    const FloatType c_;
    TanhKernelMatrix(FloatType k, FloatType c): k_(k), c_(c) {}
    template<typename MatrixType>
    blaze::SymmetricMatrix<MatrixType> operator()(MatrixType &a) const {
        blaze::SymmetricMatrix<MatrixType> ret(a.rows());
        // Fill the upper triangle; SymmetricMatrix mirrors (j, i) automatically.
        for(size_t i(0); i < a.rows(); ++i) {
            for(size_t j(i); j < a.rows(); ++j) {
                ret(i, j) = dot(row(a, i), row(a, j)) + c_;
            }
        }
        ret *= k_;
        return tanh(ret);
    }
};

Comments (3)

  1. Klaus Iglberger

    Hi dnbh!

    Unfortunately, the dot() function does not provide any parallelisation yet, so you will not see any speedup from spawning more threads. This is a feature we plan to provide with issue #4. However, even if it did provide parallelisation, you probably would not see any speedup, since your vectors (i.e. rows) are too small. You would need vectors of approximately 40000 elements to start seeing benefits from parallelisation. For instance, Blaze currently uses a threshold of 36000 for the dense vector addition, since only for vectors larger than this threshold do you see a benefit from multiple threads.

    In your example there is still some potential for parallelisation, though: you can compute multiple of the independent dot products in parallel. This kind of parallelisation, however, you would have to implement yourself.
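
    A minimal sketch of that idea, using plain std::thread and std::vector in place of Blaze types so it stands alone (the function and thread-partitioning scheme here are illustrative assumptions, not Blaze API): each (i, j) dot product is independent, so the outer loop can be interleaved across worker threads, and both halves of the symmetric result are written by the thread that owns row i.

    ```cpp
    #include <cmath>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Plain serial dot product of two equal-length vectors.
    double dot(const std::vector<double>& x, const std::vector<double>& y) {
        double s = 0.0;
        for(std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
        return s;
    }

    // Compute ret(i, j) = tanh(k * (dot(row_i, row_j) + c)) with the outer
    // loop split across nthreads workers. Rows are interleaved (i = t, t +
    // nthreads, ...) to balance the triangular workload.
    std::vector<std::vector<double>>
    tanh_kernel(const std::vector<std::vector<double>>& rows,
                double k, double c, unsigned nthreads) {
        const std::size_t n = rows.size();
        std::vector<std::vector<double>> ret(n, std::vector<double>(n));
        std::vector<std::thread> workers;
        for(unsigned t = 0; t < nthreads; ++t) {
            workers.emplace_back([&, t] {
                for(std::size_t i = t; i < n; i += nthreads) {
                    for(std::size_t j = i; j < n; ++j) {
                        const double v = std::tanh(k * (dot(rows[i], rows[j]) + c));
                        ret[i][j] = v;
                        ret[j][i] = v;  // mirror for symmetry; distinct elements, no race
                    }
                }
            });
        }
        for(auto& w : workers) w.join();
        return ret;
    }
    ```

    The same partitioning would carry over to the Blaze version by calling blaze::dot on blaze::row views inside the worker loop; an OpenMP "#pragma omp parallel for schedule(dynamic)" over the outer loop is an equivalent, shorter alternative.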

    I hope this explanation helps. Thanks for raising this issue,

    Best regards,

    Klaus!
