Allow Customizing Blaze Config

Issue #103 wontfix
Nils Deppe created an issue

If I'm understanding the current documentation correctly there is no reasonable way to change the configurations in Blaze. By this I mean if Blaze is installed as a module on an HPC system I cannot go in and actually change the config/* headers. This can be partly resolved by replacing statements such as

#define BLAZE_USE_STRONG_INLINE 1

with

#ifndef BLAZE_USE_STRONG_INLINE
#define BLAZE_USE_STRONG_INLINE 1
#endif

The constexpr variables (like constexpr bool usePadding = true;) I'm not sure how to deal with. Maybe the easiest solution is to move all of these to macros so they can easily be customized.
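One way the constexpr settings could be made overridable is to derive each one from a macro that has an #ifndef-guarded default. The following is a minimal sketch under that assumption, not Blaze's actual implementation; the macro SKETCH_USE_PADDING and the namespace sketch are hypothetical names chosen for illustration.

```cpp
// User code: the override must appear before the config header is included.
#define SKETCH_USE_PADDING 0

// --- what a library config header could look like (hypothetical) ---
#ifndef SKETCH_USE_PADDING
#define SKETCH_USE_PADDING 1  // library default, used only when no override exists
#endif

namespace sketch {
// Typed constant that the rest of the library would read instead of the macro.
constexpr bool usePadding = (SKETCH_USE_PADDING != 0);
}  // namespace sketch
```

With this layout the library keeps a typed constexpr value internally, while the user-facing knob is a plain macro that can be set once in a wrapper header.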

Comments (8)

  1. Klaus Iglberger

    Hi Nils!

    Thanks for raising the issue. Since Blaze is focused on performance we want to make sure that it is also suited for HPC installations.

    We understand your general problem of configurability. However, we would not expect a user of an HPC system to change the configuration of the module (e.g. usePadding, BLAZE_USE_STRONG_INLINE, ...). Instead, we would expect that it is possible and reasonable to set the configuration once for all involved HPC systems and thus provide the users with the best configuration for the given system(s). This is the approach already taken by several computing centers.

    The configuration files are supposed to provide some customizability for the desired target platform (e.g. HPC system vs. low resource system), mode of operation (debug vs. release), and several convenience switches (defaultStorageOrder, ...). For the purpose of a module, most of these switches should have a canonical setting and don't need to be adapted by users. For instance, we would expect the mentioned usePadding switch to be set to true on your system since on an HPC system one should be willing to trade a little more memory for much more performance (up to a factor of 5 for small vectors and matrices). We would also expect inlining to be enabled.

    From our point of view there are two switches that are more debatable. The first switch is the cacheSize value in the <blaze/config/CacheSize.h> header. In case you have a single system the value is adapted to the specifics of the system and never changes. In case you have several systems, it is reasonable to set cacheSize to the smallest L2 (!) cache level size of all systems. The performance impact on systems with larger cache sizes is negligible.
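The rule for several systems (use the smallest L2 cache size) can be written down as a compile-time computation. This is only a sketch: the three system names and their cache sizes are illustrative assumptions, not measurements, and the helper smaller() is a hypothetical name.

```cpp
#include <cstddef>

namespace sketch {

// Returns the smaller of two sizes; a single ternary keeps it usable in
// constant expressions even under C++11.
constexpr std::size_t smaller(std::size_t a, std::size_t b) {
    return a < b ? a : b;
}

// Illustrative L2 cache sizes in bytes for three hypothetical systems.
constexpr std::size_t l2SystemA = 256UL * 1024UL;   // 256 KiB
constexpr std::size_t l2SystemB = 1024UL * 1024UL;  // 1 MiB
constexpr std::size_t l2SystemC = 512UL * 1024UL;   // 512 KiB

// Shared setting: the smallest L2 size across all systems.
constexpr std::size_t cacheSize =
    smaller(l2SystemA, smaller(l2SystemB, l2SystemC));

}  // namespace sketch
```

Here the shared value ends up being the 256 KiB of the hypothetical system A.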

    The second switch is the BLAZE_BLAS_INCLUDE_FILE macro in the <blaze/config/BLAS.h> header, which allows you to choose a specific BLAS library. By default, Blaze only uses BLAS for large matrix multiplications. In our experience it hardly matters which BLAS library is used for that task, since for large matrices the performance of all libraries is at the same high level. For that reason you can choose whichever BLAS library is most suitable and/or convenient on your systems. Users merely have to link the corresponding BLAS library (or libraries) to their executables.

    Since you have chosen the issue type Bug and Blocker we are bound to resolve this issue immediately. Could you please provide us with more information about which configuration values/files you feel users have to change in order to be able to work with Blaze?

    Best regards,

    Klaus!

  2. Nils Deppe reporter

    Hi Klaus,

    Thanks for the detailed reply!

    I definitely understand where you're coming from, but the issue for us is that exactly none of the HPC systems we use here in the US have Blaze installed. I added Blaze to Spack so it's easy for people to install in their local directory, which is the approach we are taking with our code. That is, rather than packaging libraries that aren't available on the systems in the source code, we add them to be distributed via Spack. Because your talks and benchmarks have really impressed me I added Blaze to Spack to be one of these libraries. Hopefully this will increase user adoption as well. We do have what might be a slightly "different" use for Blaze in that we want to use it for the non-vector-matrix operations, e.g. addition, element-wise multiplication, etc., and so being able to tune some of the parameters for this application is something we'd like to be able to do easily. Blaze is really cool with the ability to vectorize all the expressions, something that I had actually been planning on working out myself before finding Blaze :)

    The flags that we need to be able to make Blaze useful being distributed via Spack that I see immediately are:

    • BLAZE_BLAS_INCLUDE_FILE
    • cacheSize
    • BLAZE_MPI_PARALLEL_MODE (seems like something that is user-application specific...)
    • usePadding
    • BLAZE_USE_SHARED_MEMORY_PARALLELIZATION (we use measurement based dynamic load balancing and cannot have a second threading portion happening in our code)
    • BLAZE_USE_VECTORIZATION (users, and we in particular, care about being able to easily compare performance between vectorized and non-vectorized code. This is useful when writing grant applications)

    I believe most of what I've mentioned would be just adding the #ifndef to allow user configuration. However, I disagree with your approach as a whole. Yes, the admins should configure Blaze so that it generally performs best on the system, but given how specific HPC users' needs are, in my experience there is almost always some user tweaking necessary to get the best performance for the particular application. So I hope that in the long term all Blaze configs that a user could care about can be overridden by a #define before including Blaze. One approach to the constexpr variables could be to wrap them in an #ifndef and then have the user specify both #define BLAZE_OVERRIDE_DEFAULT_SETTING_NAME and the constexpr variable they want to use instead. Thanks for looking at this so quickly!

    Best,

    Nils

  3. Klaus Iglberger

    Hi Nils!

    Many thanks for the explanations. Now it's easier to understand your motivation and intentions.

    We agree that there is value to improving the configurability of Blaze. However, we don't see this as a bug (in the best case this is an unfortunate shortcoming) and it also isn't a blocker (a major problem that keeps most people from working with Blaze). So let's agree on the following approach: First, we relabel this issue as Proposal with a Major priority. Second, we will promise to focus on this issue as quickly as possible. Third, we will figure out a way to configure the following values/settings via command line and/or #define:

    • BLAZE_BLAS_INCLUDE_FILE: We believe that this value is not strictly required and can be removed entirely. This may take some time, though.
    • cacheSize: It should be easily possible to configure this value via command line or #define. In case nothing is defined, the configuration from file is used.
    • usePadding: Same approach as for cacheSize.
    • BLAZE_USE_VECTORIZATION: Can be done, but the naming will change.
    • BLAZE_MPI_PARALLEL_MODE: Same as above, but the naming might stay the same.

    From your explanation, the following setting doesn't seem to be an issue:

    • BLAZE_USE_SHARED_MEMORY_PARALLELIZATION: This setting alone has no direct effect on the parallelization. If set to false it merely prohibits all kinds of shared memory parallelization. In order to activate any kind of shared memory parallelization it is necessary to set BLAZE_USE_SHARED_MEMORY_PARALLELIZATION to true and additionally specify something on the command line (-fopenmp for OpenMP, BLAZE_USE_CPP_THREADS for C++ threads, etc.). Under these conditions, is it ok from your perspective to leave this value as is?
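The two-condition logic described for this switch can be sketched as follows. This is an illustration under assumptions, not Blaze's actual header: all SKETCH_ names and the namespace sketch are hypothetical. The only real identifier is _OPENMP, which compilers predefine when OpenMP support is enabled (for example with -fopenmp). The switch is set to 0 here to show that it vetoes parallelization regardless of the compiler flags.

```cpp
// Stand-in for the config setting; 0 prohibits all shared memory parallelization.
#define SKETCH_USE_SHARED_MEMORY_PARALLELIZATION 0

// Parallel mode requires both the switch and an OpenMP-enabled compilation.
#if SKETCH_USE_SHARED_MEMORY_PARALLELIZATION && defined(_OPENMP)
#define SKETCH_OPENMP_PARALLEL_MODE 1
#else
#define SKETCH_OPENMP_PARALLEL_MODE 0
#endif

namespace sketch {
// True only if the switch is on AND the compiler enabled OpenMP.
constexpr bool openmpParallelMode = (SKETCH_OPENMP_PARALLEL_MODE != 0);
}  // namespace sketch
```

With the switch set to 1 instead, the result would depend on whether the translation unit is compiled with OpenMP enabled.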

    We hope that it is ok from your point of view that we relabel this issue. Also, we would like to learn your opinion on the BLAZE_USE_SHARED_MEMORY_PARALLELIZATION setting (now that we have given a more detailed explanation).

    In return for addressing this issue quickly we would like to get some feedback on problems with Blaze on US HPC installations. This should provide us with insight in order to further improve the library. Hopefully, this sounds like a good deal.

    Best regards,

    Klaus!

  4. Nils Deppe reporter

    Hi Klaus,

    I guess the blocker status is debatable. Having to modify library internals to change "configuration options" I think is a blocker, but it is not my library :) I was hoping an easy workaround would be to exploit include guards, but it appears as though the config header files do not have include guards (this should also probably be a bug).

    I don't understand why anything needs to be configured by the command line. What I was hoping to be able to do is have a wrapper header that looks something like:

    #pragma once
    
    #define BLAZE_MPI_PARALLEL_MODE 1
    
    // Use header guard manipulation to override optimization options
    // (should work for now, but file has no header guards)
    #define _BLAZE_CONFIG_OPTIMIZATION_H_
    constexpr bool usePadding = false;
    constexpr bool useStreaming = true;
    constexpr bool useOptimizedKernels = true;
    
    #include <blaze/Blaze.h>
    

    This would allow users to easily customize Blaze and be able to find all the customizations in one file. This is what I've done with other header-only libraries and really like how it works. What do you think?

    With regards to BLAZE_USE_SHARED_MEMORY_PARALLELIZATION, no, I don't think it is acceptable to have a library internally dictate whether it will parallelize with OpenMP or not depending on if OpenMP was linked. Parallelization with threads should certainly be a user configurable option. For example, we link with OpenMP in our code but have no desire to have any matrix manipulations ever be parallelized, there's enough work already to keep all cores busy. Basically, I think if you do not allow this to be easily configured by the user (by easily configured I mean a #define BLAZE_USE_SHARED_MEMORY_PARALLELIZATION 0 before including Blaze) you are marketing yourself to be a library that cannot be integrated into a large project because the library will only be useful for small executables that only do one or two matrix multiplications, not for a complex heterogeneous computing environment.

    Yes, I'd be very happy to provide feedback as we integrate Blaze into our code. We have access to a quite large variety of systems here in the US, ranging from 5 year-old Sandy Bridge machines to new KNL machines so having success across a variety of different architectures is crucial to us and we're hopeful Blaze will make this a lot easier :)

    Best, Nils

  5. Klaus Iglberger

    Hi Nils!

    Thanks for the code example. If this is what you have in mind, then all you need is already in place. All that needs to be done in order to make your example work is to replace ..._CONFIG_... with ..._SYSTEM_.... The following code example shows the necessary code to override the settings of the mentioned six switches (and several more, since all settings from the corresponding files have to be set):

    #pragma once
    
    // Overriding the settings from the <config/CacheSize.h> header
    #define _BLAZE_SYSTEM_CACHESIZE_H_
    constexpr size_t cacheSize = 6291456UL;
    
    // Overriding the settings from the <config/Optimizations.h> header
    #define _BLAZE_SYSTEM_OPTIMIZATIONS_H_
    constexpr bool usePadding = false;
    constexpr bool useStreaming = true;
    constexpr bool useOptimizedKernels = true;
    
    // Overriding the settings from the <config/BLAS.h> header
    #define _BLAZE_SYSTEM_BLAS_H_
    #define BLAZE_BLAS_MODE 1
    #define BLAZE_USE_BLAS_MATRIX_VECTOR_MULTIPLICATION 0
    #define BLAZE_USE_BLAS_MATRIX_MATRIX_MULTIPLICATION 1
    #define BLAZE_BLAS_IS_PARALLEL 0
    #define BLAZE_BLAS_INCLUDE_FILE <cblas_openblas.h>
    
    // Overriding the settings from the <config/Vectorization.h> header
    #define _BLAZE_SYSTEM_VECTORIZATION_H_
    #define BLAZE_USE_VECTORIZATION 1
    
    // Overriding the settings from the <config/SMP.h> header
    #define _BLAZE_SYSTEM_SMP_H_
    #define BLAZE_USE_SHARED_MEMORY_PARALLELIZATION 0
    
    // Overriding the settings from the <config/MPI.h> header
    #define _BLAZE_SYSTEM_MPI_H_
    #define BLAZE_MPI_PARALLEL_MODE 0
    
    // Include the Blaze header
    #include <blaze/Blaze.h>
    

    We hope that for now this is sufficient for your purposes.
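Why the include-guard override works can be shown in a single file. The sketch below assumes a header whose body is wrapped in an #ifndef guard; the guard name SKETCH_SYSTEM_CACHESIZE_H, the namespace sketch, and the default value are hypothetical stand-ins, not Blaze's real header contents.

```cpp
#include <cstddef>

// User side: claim the guard first and supply the replacement setting.
#define SKETCH_SYSTEM_CACHESIZE_H
namespace sketch {
constexpr std::size_t cacheSize = 6291456UL;  // value taken from the example above
}  // namespace sketch

// --- stand-in for the body of the library header (hypothetical) ---
#ifndef SKETCH_SYSTEM_CACHESIZE_H
#define SKETCH_SYSTEM_CACHESIZE_H
namespace sketch {
constexpr std::size_t cacheSize = 3145728UL;  // illustrative default, skipped here
}  // namespace sketch
#endif
```

Because the guard is already defined, the preprocessor skips the whole header body, which is also why every setting from an overridden header has to be supplied by the user.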

    Best regards,

    Klaus!

  6. Nils Deppe reporter

    Hi Klaus,

    This works wonderfully! Thanks! Did I miss it in the documentation or does it need to be added? I was expecting to find this info in the config files section but I can't find it. Maybe it could be added there, or a link to where it currently lives could be added to the config files section? A word of caution that overriding any sysadmin-chosen configuration could adversely affect performance is probably good :)

    Thank you very much for the help!

    Best,

    Nils

  7. Klaus Iglberger

    Hi Nils!

    We are glad this solution fits your needs :-)

    We have to thank you for pointing out a major problem of Blaze on HPC systems. We will definitely update the documentation in this regard (don't worry, you didn't miss anything), but we will also try to provide a more advanced configuration approach (see issue #104).

    In case you encounter other problems with Blaze in the future, please get in touch with us again.

    Best regards,

    Klaus!

  8. Nils Deppe reporter

    Hi Klaus,

    You're very welcome and issue #104 states what I had in mind in a much clearer manner. I'll be sure to ask any questions and make any feature requests as I integrate Blaze into our code :)

    Thanks again!

    Best,

    Nils
