The first shared memory parallelization provided with Blaze is based on HPX.
In order to enable the HPX-based parallelization, the following steps have to be taken: First, the
BLAZE_USE_HPX_THREADS preprocessor macro has to be explicitly defined on the command line during compilation:
... -DBLAZE_USE_HPX_THREADS ...
Second, the HPX library and the libraries it depends on, such as Boost and hwloc, have to be linked. Third, the HPX runtime has to be initialized by a call to the
hpx::init() function (see the HPX tutorial for further details). Together, these three steps cause the Blaze library to automatically try to run all operations in parallel with the specified number of HPX threads.
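The three steps above can be combined into a minimal program. This is a sketch, assuming the classic HPX entry-point convention in which hpx::init(argc, argv) starts the runtime and then calls a user-defined hpx_main(), which must end with hpx::finalize(). The vector size is an arbitrary choice meant to make the addition large enough to exceed Blaze's parallelization threshold; it is not a value taken from Blaze.

```cpp
// Compile with -DBLAZE_USE_HPX_THREADS and link HPX and its dependencies
// (e.g. Boost, hwloc). Run with, for instance: ./program --hpx:threads 4
#include <hpx/hpx_init.hpp>
#include <blaze/Math.h>
#include <cstddef>
#include <iostream>

// Called by the HPX runtime once it (and its worker threads) is running.
int hpx_main(int, char*[])
{
   // Arbitrary, reasonably large size so the operation is likely to be
   // executed in parallel (the actual threshold is configurable in Blaze).
   const std::size_t N = 1000000UL;

   blaze::DynamicVector<double> a(N, 1.0), b(N, 2.0), c;
   c = a + b;  // may be split across the HPX worker threads

   std::cout << "threads used: " << blaze::getNumThreads() << '\n';
   std::cout << "c[0] = " << c[0] << '\n';

   return hpx::finalize();  // request HPX shutdown after hpx_main returns
}

int main(int argc, char* argv[])
{
   // Starts the HPX runtime (parsing options such as --hpx:threads)
   // and then invokes hpx_main().
   return hpx::init(argc, argv);
}
```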
Note that the HPX-based parallelization has priority over the OpenMP-based, C++11 thread-based, and Boost thread-based parallelizations, i.e. it is preferred if any of these is enabled in addition to the HPX-based parallelization.
The number of threads used by the HPX backend has to be specified via the command line:
... --hpx:threads 4 ...
Please note that the Blaze library does not limit the available number of threads. Therefore it is YOUR responsibility to choose an appropriate number of threads. The best performance, though, can be expected if the specified number of threads matches the available number of cores.
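Since Blaze leaves the choice of thread count to the user, a launcher script or wrapper might clamp a requested count to what the machine offers before passing it via --hpx:threads. The helper chooseThreadCount() below is hypothetical and belongs to neither Blaze nor HPX. Also note that std::thread::hardware_concurrency() reports hardware threads, which on systems with simultaneous multithreading can exceed the number of physical cores, and that it may return 0 when the value is unknown.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not part of Blaze or HPX): returns a thread count
// for --hpx:threads that is at least 1 and, if the hardware thread count
// is known, no larger than it.
unsigned chooseThreadCount(unsigned requested)
{
   // May return 0 if the number of hardware threads cannot be determined.
   const unsigned hw = std::thread::hardware_concurrency();

   unsigned n = (requested == 0u) ? 1u : requested;
   if (hw != 0u)
      n = std::min(n, hw);  // avoid oversubscribing the available hardware
   return n;
}
```

The result would then be passed on the command line, e.g. as the value of --hpx:threads.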
In order to query the number of threads used for the parallelization of operations, the
getNumThreads() function can be used:
const size_t threads = blaze::getNumThreads();
In the context of HPX threads, the function will return the actual number of threads used by the HPX subsystem.
As in the case of the other shared memory parallelizations, Blaze does not unconditionally run an operation in parallel (see for instance OpenMP Parallelization). An operation is executed in parallel only if it is large enough to exceed a certain threshold. All thresholds related to the HPX-based parallelization are contained within the configuration file
Please note that these thresholds are highly sensitive to the system architecture and the shared memory parallelization technique in use. Therefore the default values cannot guarantee maximum performance for all possible situations and configurations; they merely provide a reasonable standard for the current CPU generation. Also note that the provided defaults have been determined using the OpenMP parallelization and require individual adaptation for the HPX-based parallelization.
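To illustrate what threshold-guarded parallel execution means, the following sketch uses only the C++ standard library: below a size threshold the element-wise addition runs serially, above it the index range is split across std::thread workers. This is not Blaze's implementation and uses std::thread instead of HPX; the names kParallelThreshold and addVectors and the threshold value are hypothetical. On some toolchains it needs to be compiled with -pthread.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical threshold, chosen for illustration only (not a Blaze default).
constexpr std::size_t kParallelThreshold = 10000;

// Computes c = a + b element-wise (a and b must have equal size).
// Returns true if the parallel path was taken, false for the serial path.
bool addVectors(const std::vector<double>& a, const std::vector<double>& b,
                std::vector<double>& c, unsigned threads)
{
   const std::size_t n = a.size();
   c.resize(n);

   // Small operations: the threading overhead would outweigh any gain.
   if (n < kParallelThreshold || threads < 2u) {
      for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
      return false;
   }

   // Large operations: each worker handles a disjoint, contiguous chunk.
   const std::size_t chunk = (n + threads - 1) / threads;  // ceiling division
   std::vector<std::thread> workers;
   for (unsigned t = 0; t < threads; ++t) {
      const std::size_t begin = t * chunk;
      const std::size_t end = std::min(n, begin + chunk);
      if (begin >= end) break;
      workers.emplace_back([&a, &b, &c, begin, end] {
         for (std::size_t i = begin; i < end; ++i) c[i] = a[i] + b[i];
      });
   }
   for (auto& w : workers) w.join();  // wait for all chunks to finish
   return true;
}
```

Where the break-even point lies depends on the machine, which is why the thresholds mentioned above need tuning per system and per parallelization backend.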