Provide support for shared-memory parallelization based on HPX

Issue #5 resolved
Klaus Iglberger created an issue

Description

The Blaze library already supports three different kinds of shared-memory parallelization: OpenMP, C++11 threads, and Boost threads. Due to its architecture, however, Blaze can support any number of parallelization paradigms. It is therefore appealing to integrate a shared-memory parallelization based on HPX. This will enable a direct comparison between OpenMP and HPX. Moreover, since HPX can be used beyond shared-memory systems, this will facilitate the step towards distributed-memory parallelization.

Tasks

  • introduce the shared-memory parallelization based on HPX
  • provide a full documentation for the feature
  • guarantee maximum performance for the HPX parallelization
  • run all test cases with the HPX parallelization

Comments (6)

  1. Hartmut Kaiser

In one of our projects we would like to use Blaze for all internal matrix operations. For this, however, we would need Blaze to be integrated with HPX such that all parallelization is performed using HPX threads.

    We'd be happy to lend a helping hand with resolving this ticket but this would require some guidance on where to start and what is required for Blaze to be retargeted to a different threading subsystem. Where should we start?

  2. Klaus Iglberger reporter

    Hi Hartmut!

    Thanks a lot, this is great news. No one could resolve this issue better than the HPX experts.

In Blaze, the shared-memory parallelization (SMP) is orthogonal to the operations. The entire SMP functionality is located in blaze/math/smp. In order to get acquainted with the implementation details, please take a closer look at the OpenMP parallelization in blaze/math/smp/openmp. A great starting point is <blaze/math/smp/openmp/DenseVector.h>. This header contains the entire OpenMP-parallel assignment to dense vectors. All that needs to be done is an implementation of the functions smpAssign(), smpAddAssign(), smpSubAssign(), smpMultAssign(), and smpDivAssign(). The parallel assignment to dense matrices is very similar (see <blaze/math/smp/openmp/DenseMatrix.h>). Supporting functionality (for instance getNumThreads(), setNumThreads(), etc.) is located in <blaze/math/smp/openmp/Functions.h>.
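    For illustration only, here is a much-simplified, hypothetical sketch of the idea behind such a parallel assignment, written with plain std::thread and std::vector. This is not Blaze's actual smpAssign() implementation (which dispatches on expression template types); it only shows the chunked split of an element-wise assignment across worker threads:

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Hypothetical sketch: split the element-wise assignment lhs = rhs
    // across several worker threads, in the spirit of Blaze's smpAssign()
    // for dense vectors. Names and types are illustrative only.
    void parallelAssign( std::vector<double>& lhs, const std::vector<double>& rhs,
                         std::size_t numThreads )
    {
       assert( lhs.size() == rhs.size() );
       const std::size_t chunk = ( lhs.size() + numThreads - 1UL ) / numThreads;

       std::vector<std::thread> workers;
       for( std::size_t t=0UL; t<numThreads; ++t ) {
          const std::size_t begin = t * chunk;
          const std::size_t end   = std::min( begin + chunk, lhs.size() );
          if( begin >= end ) break;
          workers.emplace_back( [&lhs,&rhs,begin,end]() {
             for( std::size_t i=begin; i<end; ++i )
                lhs[i] = rhs[i];  // each thread handles one contiguous chunk
          } );
       }
       for( auto& w : workers ) w.join();
    }
    ```

    An HPX backend would follow the same chunking pattern but schedule the per-chunk work on HPX threads instead of std::thread.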

I would expect the HPX-based parallelization to be realized in blaze/math/smp/hpx. In addition, the wrapping header files in blaze/math/smp need to be adapted. You can use the BLAZE_HPX_PARALLEL_MODE symbol to activate the HPX parallelization from the command line and to select the corresponding header files.

I hope this gives you an idea where to start. If you have questions, I'm happy to assist. Of course I'll also do my part, for instance adapting the documentation and wiki accordingly.

    Best regards,

    Klaus!

  3. Klaus Iglberger reporter

    Summary

The feature has been implemented (see pull request #15), tested, and documented as required. It is immediately available by cloning the Blaze repository and will be officially released in Blaze 3.3.

    HPX Parallelization

    The fourth and final shared memory parallelization provided with Blaze is based on HPX.

    HPX Setup

In order to enable the HPX-based parallelization, the following steps have to be taken: First, the BLAZE_USE_HPX_THREADS compilation flag has to be explicitly specified on the command line:

    ... -DBLAZE_USE_HPX_THREADS ...
    

Second, the HPX library and its dependencies, such as Boost, hwloc, etc., have to be linked. And third, the HPX threads have to be initialized by a call to the hpx::init() function (see the HPX tutorial for further details). These three steps cause the Blaze library to automatically attempt to run all operations in parallel with the specified number of HPX threads.
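    A minimal program skeleton for the third step might look as follows. This is a sketch assuming HPX and Blaze are installed and linked; the matrix sizes are placeholders:

    ```cpp
    #include <hpx/hpx_init.hpp>
    #include <blaze/Math.h>

    // With BLAZE_USE_HPX_THREADS defined at compile time, Blaze operations
    // executed inside hpx_main() are candidates for parallel execution on
    // HPX threads.
    int hpx_main( int argc, char* argv[] )
    {
       blaze::DynamicMatrix<double> A( 1000UL, 1000UL, 1.0 );
       blaze::DynamicMatrix<double> B( 1000UL, 1000UL, 2.0 );

       blaze::DynamicMatrix<double> C( A * B );  // may run on HPX threads

       return hpx::finalize();  // shut down the HPX runtime
    }

    int main( int argc, char* argv[] )
    {
       return hpx::init( argc, argv );  // initialize HPX; calls hpx_main()
    }
    ```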

Note that the OpenMP-based, C++11 thread-based, and Boost thread-based parallelizations have priority, i.e. they are preferred whenever any of them is enabled in combination with the HPX thread parallelization.

    The number of threads used by the HPX backend has to be specified via the command line:

    ... --hpx:threads 4 ...
    

Please note that the Blaze library does not limit the available number of threads. It is therefore YOUR responsibility to choose an appropriate number of threads. The best performance can be expected if the specified number of threads matches the available number of cores.

    In order to query the number of threads used for the parallelization of operations, the getNumThreads() function can be used:

    const size_t threads = blaze::getNumThreads();
    

    In the context of HPX threads, the function will return the actual number of threads used by the HPX subsystem.

    HPX Configuration

As with the other shared-memory parallelizations, Blaze does not unconditionally run an operation in parallel (see for instance OpenMP Parallelization). Only if a given operation is large enough and exceeds a certain threshold is it executed in parallel. All thresholds related to the HPX-based parallelization are contained in the configuration file <blaze/config/Thresholds.h>.
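    Conceptually, the threshold logic can be pictured as follows. This is a hypothetical sketch, not Blaze's actual dispatch code, and the threshold value is a made-up placeholder rather than one of the defaults from <blaze/config/Thresholds.h>:

    ```cpp
    #include <cstddef>

    // Placeholder threshold, purely illustrative; the real values live in
    // <blaze/config/Thresholds.h> and differ per operation.
    constexpr std::size_t ASSIGN_THRESHOLD = 10000UL;

    // Hypothetical dispatch decision: a parallel run only pays off with
    // more than one thread and a sufficiently large operation; everything
    // else runs serially.
    bool useParallelAssign( std::size_t size, std::size_t numThreads )
    {
       return numThreads > 1UL && size > ASSIGN_THRESHOLD;
    }
    ```

    Tuning a threshold thus trades thread-management overhead for small operations against parallel speedup for large ones, which is why the defaults depend on the backend and the hardware.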

Please note that these thresholds are highly sensitive to the underlying system architecture and the chosen shared-memory parallelization technique. Therefore the default values cannot guarantee maximum performance for all possible situations and configurations; they merely provide a reasonable standard for the current CPU generation. Also note that the provided defaults have been determined using the OpenMP parallelization and require individual adaptation for the HPX-based parallelization.
