Adding runtime configuration to blaze

Issue #273 wontfix
Thorsten Schmitz created an issue

Hi

At the moment blaze can, as far as I know, only be configured at compile time, using #defines to set options such as padding or cache size.

However, the machine used for compilation isn't always the same as the machine running the application. The application could be moved to a server or deployed to many people.

Therefore the configuration might be not only suboptimal, but potentially even counterproductive for performance.

Thus, I suggest making these values configurable at runtime using functions (like it's already possible for the number of threads). I don't know enough about blaze to say whether this is possible or what impact it might have on the performance. But a small impact would likely be compensated by more optimal configurations on the actual target machine.

Also, it would allow an autotuning function to be implemented that tries different configurations on the target machine rather than the machine used for compilation.

Best regards

Thorsten

Comments (1)

  1. Klaus Iglberger

    Hi Thorsten!

    Thanks for the suggestion. We understand the rationale behind the request. However, we will not implement it for two reasons. First, some of the values have to be compile time constants since they are used in contexts where only compile time constant expressions can be used (e.g. BLAZE_USE_PADDING in the <blaze/config/Optimizations.h> header). Second, some other values would incur a significant runtime overhead if they were changed to runtime values, especially for small vectors and matrices (e.g. BLAZE_CACHE_SIZE in the <blaze/config/CacheSize.h> header).
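To illustrate the first reason, here is a minimal, self-contained sketch of why a padding switch typically has to be a compile time constant. It does not use Blaze and is not Blaze's implementation: `usePadding`, `simdWidth`, `paddedSize` and `StaticVec` are hypothetical names chosen for this example, and the SIMD width of 4 doubles is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a compile time switch such as BLAZE_USE_PADDING.
constexpr bool usePadding = true;

// Assumed SIMD width in elements (4 doubles); an assumption for this sketch.
constexpr std::size_t simdWidth = 4;

// Rounds n up to the next multiple of simdWidth when padding is enabled.
constexpr std::size_t paddedSize(std::size_t n) {
    return usePadding ? ((n + simdWidth - 1) / simdWidth) * simdWidth : n;
}

// Fixed-size vector whose storage capacity is part of its type.
template <std::size_t N>
struct StaticVec {
    // The array size is a template argument, so it must be a constant
    // expression. A value that is only known at runtime cannot appear here.
    std::array<double, paddedSize(N)> data{};

    static constexpr std::size_t capacity() { return paddedSize(N); }
};

// The padded capacity is checked by the compiler, not at runtime.
static_assert(StaticVec<3>::capacity() == 4, "3 elements are padded to 4");
static_assert(StaticVec<8>::capacity() == 8, "8 elements need no padding");
```

If `usePadding` were a runtime variable, `paddedSize(N)` would no longer be a constant expression and the `std::array` member would not compile. The storage would have to become dynamically sized, which changes the type itself rather than just a setting.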

    Please note that the cache size has only an impact on performance if the operations you use need almost exactly the amount of memory that fits in the largest cache available. As Blaze is essentially a code generator, which is supposed to generate compute kernels that are maximally efficient, this setting provides the means to maximize the performance for a very small size range.

    If you have to compile for different target architectures and you are indeed using operations that need almost exactly the cache size, then please use the largest cache size of all the architectures you are working with. This will guarantee that the code will work very efficiently on both the machines with less and the machines with more memory.
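The advice above can be sketched as follows. This is a self-contained illustration, not Blaze code: the two cache sizes are made-up values for two hypothetical machines, and `configuredCacheSize` and `exceedsCache` are names invented for this example. In a real build, the resulting value would be the one to set for BLAZE_CACHE_SIZE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical largest-cache sizes in bytes of the machines to be supported.
constexpr std::size_t cacheSizeMachineA = 3UL * 1024UL * 1024UL;  // 3 MiB
constexpr std::size_t cacheSizeMachineB = 8UL * 1024UL * 1024UL;  // 8 MiB

// Pick the largest cache size of all target machines at compile time.
constexpr std::size_t configuredCacheSize =
    std::max(cacheSizeMachineA, cacheSizeMachineB);

// Hypothetical size check: does a vector of N doubles exceed the configured
// cache size? For a fixed N this is evaluated entirely by the compiler,
// so small vectors pay no runtime cost for the decision.
template <std::size_t N>
constexpr bool exceedsCache() {
    return N * sizeof(double) > configuredCacheSize;
}

static_assert(configuredCacheSize == 8UL * 1024UL * 1024UL,
              "the larger of the two cache sizes is selected");
static_assert(!exceedsCache<1000>(), "a small vector fits into the cache");
```

Because the threshold is a constant, a check like `exceedsCache<N>()` disappears from the generated code for fixed-size operands. With a runtime threshold the same check would have to be executed on every operation, which is the kind of overhead the comment refers to for small vectors and matrices.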

    Thanks again for taking the time to create this issue,

    Best regards,

    Klaus!
