Nested Threads using BLAZE

Issue #421 resolved
Esmail Abdul Fattah created an issue

Hello!

Why doesn’t Blaze provide nested parallelization? If that will always be the case, can you recommend any other dense linear algebra libraries?

Beyond a certain number of threads, the performance of Blaze plateaus.

What is the largest dimension that Blaze matrices and decompositions can handle? Can I easily use an eigendecomposition for dimension = 1 million?

Thank you!

Comments (10)

  1. Klaus Iglberger

    Hi Esmail!

    Why doesn’t Blaze provide nested parallelization?

    The only limitation is that you cannot use Blaze with OpenMP within an OpenMP environment, since Blaze simply wouldn’t know how many threads to use for itself. However, it is entirely possible to use Blaze with C++ threads or HPX within an OpenMP environment.

    If that will always be the case, can you recommend any other dense linear algebra libraries?

    For OpenMP that will always be the case, because OpenMP simply isn’t prepared for that kind of application. However, there is no such limitation for any other combination. I also cannot recommend another LA library, because Blaze is (to the best of my knowledge) the only C++ LA library that provides full parallelization.

    Beyond a certain number of threads, the performance of Blaze plateaus.

    The performance is of course highly dependent on the size of the problem, the number of cores available, and the algorithm used. With a reasonable combination of problem size and thread count, Blaze indeed performs admirably (and even provides superscalar scaling).

    What is the largest dimension that Blaze matrices and decompositions can handle? Can I easily use an eigendecomposition for dimension = 1 million?

    Blaze builds on the decomposition algorithms provided by your chosen LAPACK library, which are expected to perform very well. Whether or not a decomposition runs in parallel also depends on the LAPACK library. Given a machine with enough memory, an eigendecomposition of a 1Mx1M matrix is possible, but it will still take considerable time to compute (even in parallel).
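    To put “enough memory” in perspective, here is a back-of-the-envelope calculation, assuming one dense matrix of double-precision values (actual requirements depend on the storage scheme and on the workspace the LAPACK routine needs):

    ```cpp
    #include <cstdio>

    int main() {
        // Memory footprint of a single dense 1M x 1M matrix of doubles
        const double n     = 1.0e6;          // rows = columns = 1 million
        const double bytes = n * n * 8.0;    // sizeof(double) == 8
        std::printf("%.1f TB\n", bytes / 1.0e12);   // prints "8.0 TB"
        return 0;
    }
    ```

    A typical eigensolver additionally needs at least one more matrix of that size for the eigenvectors, plus workspace.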

    I hope this answers the questions,

    Best regards,

    Klaus!

  2. Esmail Abdul Fattah reporter

    Thank you for your quick reply!!!

    My concern:

    For example, I want to call a function 10 times in parallel, and use 4 threads for each function. Can BLAZE do this using C++ threads?

  3. Klaus Iglberger

    Hi Esmail!

    Thanks for clarifying. No, that’s something Blaze will not do. For every function call Blaze will try to use all available threads to speed up the computation. That is because Blaze cannot know the optimum setting for a specific call and because Blaze only knows about the maximum number of threads. Since I’m not convinced that this kind of nested parallelism would generally help with performance, this will not change in the foreseeable future.
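    For what it’s worth, the outer level of such a scheme can still be built with plain C++ threads outside of Blaze. The sketch below starts 10 outer threads, each of which spawns 4 inner workers; `work()` is a hypothetical stand-in for whatever each worker would do (for example a Blaze call with parallelization disabled). It demonstrates only the threading pattern, not Blaze itself:

    ```cpp
    #include <cstdio>
    #include <functional>
    #include <thread>
    #include <vector>

    int main() {
        long totals[10] = {};                 // one result slot per outer call

        // Hypothetical stand-in for the real per-worker computation
        auto work = [](long& partial) {
            long local = 0;
            for (int i = 0; i < 1000; ++i) local += i;   // sum 0..999 = 499500
            partial += local;
        };

        auto call_function = [&](int call) {  // one "function call", 4 workers
            long parts[4] = {};
            std::vector<std::thread> inner;
            for (int w = 0; w < 4; ++w) inner.emplace_back(work, std::ref(parts[w]));
            for (auto& t : inner) t.join();
            for (long p : parts) totals[call] += p;
        };

        std::vector<std::thread> outer;       // 10 calls run in parallel
        for (int c = 0; c < 10; ++c) outer.emplace_back(call_function, c);
        for (auto& t : outer) t.join();

        long sum = 0;
        for (long t : totals) sum += t;
        std::printf("%ld\n", sum);            // 10 * 4 * 499500 = 19980000
        return 0;
    }
    ```

    The join of the inner threads inside each outer thread is what gives the 10x4 nesting; nothing in the pattern requires library support.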

    Best regards,

    Klaus!

  4. Esmail Abdul Fattah reporter

    Thank you, Klaus! This is helpful!
    Would it be possible to use the Intel MKL library with the Blaze library? Here is part of my makefile.

    CC = ccache icc
    MainObjects = main.o
    OptObjects = GLP_splines.o GLP_libraries.o GLP_functions.o GLP_Data.o GLP_DisUtensils.o GLP_Recipes.o
    
    parallized = -DBLAZE_USE_CPP_THREADS -fopenmp
    blaze_cond = -std=gnu++20 -O2 -g -DNDEBUG -mavx -pthread -Iblaze
    
    MKLROOT = /opt/intel/oneapi/mkl/latest
    mkllink = -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_gnu_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
    
    output: $(MainObjects) $(OptObjects)
        ${CC} ${blaze_cond} $(MainObjects) $(OptObjects) ${CFLAGS} ${mkllink} -o output ${parallized}
    
    main.o: main.cpp 
        ${CC} -c ${blaze_cond} ${parallized} main.cpp ${CFLAGS}
    

    What about setting the number of threads?

  5. Klaus Iglberger

    Hi Esmail!

    Yes, it is possible to use the Intel MKL with Blaze. Blaze even provides some special handling to adapt to the implementation details of the MKL. It should also pose no problem to use the parallel version of the MKL together with Blaze. However, I usually recommend choosing either parallelization via Blaze or via the MKL, since I don’t expect that using both would help.

    As a side remark: with the current setting of parallized, Blaze will use C++ threads. That setting takes precedence over OpenMP.
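    Regarding the thread setting: with the C++ thread backend, the number of threads is given via the BLAZE_NUM_THREADS environment variable at program start (a sketch; `./output` refers to the binary built by the makefile above):

    ```shell
    # Select the thread count for Blaze's C++ thread backend
    # (only honored when compiled with -DBLAZE_USE_CPP_THREADS)
    export BLAZE_NUM_THREADS=4
    ./output
    ```

    Alternatively, blaze::setNumThreads() can be called from within the program before the first parallel operation.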

    Best regards,

    Klaus!

  6. Esmail Abdul Fattah reporter

    Hi Klaus,

    Thank you for all previous responses.

    One last question about this topic: nested parallelization. Should I expect Blaze to run slower when using MPI?
    For example, running an eigendecomposition for a matrix of size 5k takes almost 15 seconds on one process, whereas running the same on two processes takes 33 seconds each. I was expecting an increase in time, but not that much.

    Best Regards,

    Esmail

  7. Esmail Abdul Fattah reporter

    With g++, one process works well: 14 sec.

    g++ -std=c++14 -O3 -DNDEBUG -mavx -pthread -fopenmp main.cpp GLP_libraries.cpp GLP GLP_functions.cpp GLP_Data.cpp -L/opt/intel/mkl/lib/intel64 -Wl,--start-group /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.a /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_gnu_thread.a /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_core.a -Wl,--end-group -lgomp -lpthread -lm -ldl -o output -lstdc++

    whereas when g++ is replaced by mpicc, it takes 24 sec. I have used: mpirun -np 1 ./output

  8. Klaus Iglberger

    Hi Esmail!

    That question is unfortunately outside my experience. I have never tried to use Blaze in an MPI context and never investigated potential bottlenecks with MPI. I wouldn’t expect a Blaze issue, though; I suspect a resource-sharing problem between the different parallelization strategies.
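    One thing that may be worth checking (an assumption on my side, not something I have verified for this setup): some MPI launchers bind each rank to a single core by default when few ranks are started, which would serialize the threaded MKL inside every rank. With Open MPI, the binding can be relaxed and the per-rank thread count capped explicitly:

    ```shell
    # Let each rank use all cores instead of being pinned to one
    # (--bind-to none is an Open MPI option; other MPIs use different flags)
    export MKL_NUM_THREADS=4        # cap MKL threads per rank (example value)
    mpirun -np 2 --bind-to none ./output
    ```

    If the one-process mpirun case then matches the plain g++ timing, core binding was the culprit rather than Blaze or the MKL.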

    Best regards,

    Klaus!
