blaze vs. eigen

Issue #266 resolved
Thorsten Schmitz created an issue

Hi,

I’m trying to decide which library I’m going to use for future projects, but it’s difficult to find information to compare blaze and eigen3.

The benchmark here is the only one I could find so far. Its documentation says “All libraries are benchmarked as given, but configured such that maximum performance can be achieved.” What exactly does that mean? Is eigen measured with a BLAS backend? If so, did you use the same backend with blaze?

Aside from the benchmark, do you have other information about the differences between blaze and eigen?

Regards

Thorsten

Comments (13)

  1. Mikhail Katliar

    Hi Thorsten,

    I have been a user of Blaze since 2016. I had the same question a while ago and would like to share my experience.

    For me it was not an obvious decision. I even supported two back-ends (Eigen3 and Blaze) in my software so that I could switch between the two. Although Eigen3 and Blaze are similar in terms of what they can do, after about 2 years I finally made my choice in favor of Blaze. My reasons were:

    1. I hate the way matrices are initialized in Eigen3 (using operator<<). It looks very unnatural. In Blaze, you can do it nicely with initializer lists.
    2. In general, I like the Blaze syntax more than the Eigen3 syntax (free functions vs. member functions). Also, Blaze uses the latest C++ standard and makes an impression of being more “modern”.
    3. The Blaze developer(s) react(s) super fast to issues/questions. Bug fixes are quick. I don’t know how it is with Eigen3; I never reported bugs or requested features there.
    4. For my purposes (embedded real-time control), Blaze has better performance compared to Eigen3. However, this can be highly platform- and task-specific.

    For the last year or so, I did not think about going back to Eigen3. Until I realized Blaze doesn’t have matrix exponentiation 🙂 #74

    Hope it helps.

  2. Klaus Iglberger

    Hi Thorsten!

    Thanks a lot for your interest in Blaze.

    @Mikhail Katliar : Thanks a lot for sharing your experiences!

    Unfortunately there is no direct comparison between Eigen and Blaze, but I can fill in details about the performance comparison. Eigen is benchmarked with its own matrix multiplication kernel(s) (its preferred mode); Blaze is used in combination with a BLAS library (also its preferred mode). However, Blaze switches from its own custom kernels, used for small matrices, to the provided BLAS library for large matrices. You can freely configure the threshold for this switch.
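    The small-size/large-size dispatch described here can be sketched conceptually. All names and the threshold value below are invented for illustration and are not Blaze’s actual internals; Blaze’s real thresholds are compile-time configuration constants.

```cpp
#include <cstddef>
#include <string>

// Illustrative stand-ins for a hand-tuned small-matrix kernel and a BLAS call.
std::string customKernelGemm() { return "custom"; }
std::string blasGemm()         { return "blas"; }

// Hypothetical switch-over point; in Blaze the comparable thresholds are
// freely configurable at compile time.
constexpr std::size_t kGemmThreshold = 64;

// Dispatch: small problems go to the custom kernel, large ones to BLAS.
std::string multiply(std::size_t n) {
    return (n < kGemmThreshold) ? customKernelGemm() : blasGemm();
}
```

    The practical upshot: tuning the threshold lets you pick, per platform, the point where the BLAS call’s overhead starts paying off.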

    Blaze is focused on performance, ease of use, and extensibility. That means that it should be easy for you to integrate your own functionality in a consistent and natural way. Additionally, as @Mikhail Katliar remarked, we try to provide help and fixes as quickly as possible.

    In the end you’ll have to decide based on the available feature set, the performance provided for the operations you require, and your preference in syntax (which is an entirely subjective matter).

    I hope this helps. We would be very happy if you decide to use Blaze,

    Best regards,

    Klaus!

  3. Thorsten Schmitz reporter

    Thanks for the information. It helps a lot. I haven’t decided yet, though, and came up with a few more questions.

    1. I would like to use the library on Android. Would there be any limitations with blaze I need to consider on systems other than Win/Mac/Linux (other than needing a compatible BLAS/LAPACK backend)?
    2. From what I have seen, blaze does not support GPUs, and there are no plans to add this in the near future. Is that correct?
    3. Is there a field (not just a single method/problem) where you would say that blaze is not a good choice but eigen might be, and vice versa?

    At the moment I tend slightly towards blaze; I will probably try it out myself.

    Best regards

    Thorsten

    ps: You should consider adding this information to the FAQ, for example. I once asked on StackOverflow and it was immediately deleted as “too broad”, and there isn’t much information on this topic anywhere else.

    pps: Looking through the wiki I saw that there are four ways to use parallelization. Is one way preferable to the others or are they all roughly equal in performance?

  4. Klaus Iglberger

    Hi Thorsten!

    1. Blaze should also work on Android. Since it is not actively tested on Android, though, it would be great to get feedback about any problems so that we can resolve them.
    2. At this point Blaze does not support GPUs, but there is a Blaze project in the making that provides GPU support (see https://github.com/STEllAR-GROUP/blaze_cuda/)
    3. It probably entirely depends on the available features and what you need for your application.
    4. I will consider adding some additional information about the benchmarks to the FAQ, but I will likely not add an entire comparison about both libraries.
    5. They are roughly similar and you can choose the one that is available to you. The fact that there are four different kinds of parallelization shows that Blaze has a sound architecture that allows extensions of any kind.
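    For intuition, here is a generic sketch (not Blaze code) of what the C++11-thread flavour of parallelization does conceptually: the element-wise work is split across threads.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Generic sketch: split an element-wise addition across two C++11 threads,
// the same idea a thread-based backend applies to whole expressions.
void parallelAdd(const std::vector<double>& a, const std::vector<double>& b,
                 std::vector<double>& out) {
    const std::size_t mid = a.size() / 2;
    auto work = [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) out[i] = a[i] + b[i];
    };
    std::thread t(work, 0, mid);  // first half on a worker thread
    work(mid, a.size());          // second half on the calling thread
    t.join();
}
```

    Whichever backend you pick (C++11 threads, Boost threads, OpenMP, HPX), the expression-level splitting is the same; only the thread machinery underneath changes.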

    Best regards,

    Klaus!

  5. Johannes Czech

    Hi @Thorsten Schmitz ,

    For my use case I was looking for a fast numerical C++ library for porting our chess engine (https://github.com/queensgambit/crazyara), previously developed in pure Python, to C++. So I was looking for a library which provides functionality similar to NumPy. (The new engine, including its source code, will be released soon.)

    In the application, only vector operations with variable lengths, mostly between 30 and 130, are used.

    I created several benchmarks to compare different libraries: numpy 1.14.6, Xtensor 0.20.8, Eigen library 3.3.7 and the current Blaze-Dev-Master.

    Here’s an example of running vector additions of size 80 for 1e7 iterations.

    Code was built in release mode and run on an Intel® Core™ i5-8250U CPU @ 1.60GHz × 8, Ubuntu 18.04.2 LTS

    #include <blaze/Math.h>
    #include <chrono>
    #include <iostream>
    
    typedef float real;
    
    size_t size = 80;
    size_t it = 1e7;
    
    blaze::DynamicVector<real> blaze_vec(size, real(1));  // vector added each iteration
    blaze::DynamicVector<real> res_blaze(size);
    
    std::chrono::steady_clock::time_point start_blaze = std::chrono::steady_clock::now();
    for (size_t i = 0; i < it; ++i) {
        res_blaze += blaze_vec;
    }
    
    std::chrono::steady_clock::time_point end_blaze = std::chrono::steady_clock::now();
    std::cout << "Elapsed time blaze: " << std::chrono::duration_cast<std::chrono::milliseconds>(end_blaze - start_blaze).count() << "ms" << std::endl;
    

    The benchmark results are clearly in favour of the blaze library on all operations I tested so far.

    Results:

    Variable Type Floating:
    
    Addition:
    Elapsed time blaze:     88ms
    Elapsed time eigen:     123ms
    Elapsed time xtensor:   828ms
    Elapsed time numpy:     7095ms
    
    Multiplication
    Elapsed time blaze:     88ms
    Elapsed time eigen:     248ms
    Elapsed time xtensor:   809ms
    Elapsed time numpy:     7073ms
    
    
    Variable Type Double:
    
    Addition:
    Elapsed time blaze:     172ms
    Elapsed time eigen:     239ms
    Elapsed time xtensor:   787ms
    Elapsed time numpy:     7095ms
    
    Multiplication
    Elapsed time blaze:     172ms
    Elapsed time eigen:     365ms
    Elapsed time xtensor:   808ms
    Elapsed time numpy:     7073ms
    

    (If you’re interested, I can provide the full source code for this benchmark.)

    Besides that, I was missing the argmax() and argmin() operators in Blaze, so I implemented a naive function for them and was shocked that my simple implementation outperformed Eigen 3.3.7.

    https://bitbucket.org/blaze-lib/blaze/issues/256/is-argmax-and-argmin-missing-for-vector

    Now, argmax() and argmin() have been integrated into the Blaze library using an even better implementation.
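    A naive argmax of the kind mentioned above can be written in a few lines of plain C++ (this is an illustrative sketch, not the implementation that ended up in Blaze):

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Naive argmax over any random-access container: index of the first
// occurrence of the largest element.
template <typename Container>
std::size_t naiveArgmax(const Container& v) {
    return static_cast<std::size_t>(
        std::distance(std::begin(v),
                      std::max_element(std::begin(v), std::end(v))));
}
```

    For example, `naiveArgmax(std::vector<double>{0.5, 3.0, -1.0, 3.0})` yields 1, the index of the first maximum.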

    All in all, I can highly recommend Blaze so far. Having a softmax() operator and the flexibility to choose between DynamicVector, HybridVector, StaticVector is also quite useful.

    Regarding GPU usage, you should consider that it’s normally only more efficient to run calculations there once you reach a certain matrix size or vector length, even if you’re writing pure CUDA C code. This is because for every GPU operation, you must initially transfer the input data to the GPU and later transfer the result back to the CPU, resulting in a more or less constant overhead.
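    That break-even point can be estimated with a simple cost model: the GPU pays a roughly constant transfer overhead plus a (smaller) per-element cost, so it only wins above a certain problem size. All the numbers below are invented for illustration, not measurements.

```cpp
#include <cstddef>

// Toy cost model (illustrative numbers only): the CPU processes the whole
// job directly; the GPU first pays a fixed transfer overhead.
double cpuTimeUs(std::size_t n) { return 0.01 * n; }          // 0.01 us/element
double gpuTimeUs(std::size_t n) { return 50.0 + 0.001 * n; }  // 50 us overhead

// The GPU only pays off above the break-even problem size.
bool gpuFaster(std::size_t n) { return gpuTimeUs(n) < cpuTimeUs(n); }
```

    With these made-up constants the crossover sits near n ≈ 5,600 elements; below that, the transfer overhead dominates and the CPU wins.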

    I suggest trying out some simple dummy calculations which are likely to occur in your use case and then making your decision based on those results.

    Best regards,

    Johannes Czech

  6. Thorsten Schmitz reporter

    Thanks to both of you for your feedback.

    Interestingly, I looked at the blaze_cuda project when it was first posted here, and at that point it had already been several months since the last commit. Looking right now, a lot of new features have been added, so it looks promising.

    I intend to use eigen/blaze, amongst other things, for finite element methods and neural networks (both just for my own interest; otherwise I would use something like TensorFlow). I would like to be able to use a GPU at some point because I expect there will be a time when my matrices become very big, but at the moment it’s not my number one concern. It will most likely be more than a year before I actually encounter these cases.

    One thing that bothers me a little is that blaze needs a LAPACK backend to work, unlike eigen, which provides a fallback. I’m a little worried because I don’t know if e.g. OpenBLAS will compile for Android without any issues. On PC I intend to use Intel MKL as backend, and they don’t have a version for Android.

    The benchmark is very interesting. Though, like I said, I intend to use a BLAS backend. As far as I know, both eigen and blaze use BLAS for matrix-matrix multiplication. Seeing the big difference there, I assume you used eigen either without a backend or with a different one?

    Like you suggested, I’m currently trying to set up eigen and blaze with MKL as backend to test both. Though CMake is really annoying, the FindBLAS module just won’t find MKL… 😠

  7. Johannes Czech

    You are correct that I forgot to link Eigen to the MKL backend.

    However, when enabling EIGEN_USE_MKL_ALL, its performance improved for “Addition” but actually got slightly worse for “Multiplication”:

    Variable Type Double:
    
    Addition:
    Elapsed time blaze:     172ms
    Elapsed time eigen:     239ms -> 192ms (with EIGEN_USE_MKL_ALL)
    Elapsed time xtensor:   787ms
    Elapsed time numpy:     7095ms
    
    Multiplication
    Elapsed time blaze:     172ms
    Elapsed time eigen:     365ms -> 381ms (with EIGEN_USE_MKL_ALL)
    Elapsed time xtensor:   808ms
    Elapsed time numpy:     7073ms
    

    You should take into consideration that I’m dealing with rather small vectors here, and Blaze allocates additional memory (padding) in some cases to improve performance, which is likely the reason why it’s faster.

    “In order to achieve the maximum possible performance the Blaze library tries to enable SIMD vectorization even for small vectors. For that reason Blaze by default uses padding elements for all dense vectors and matrices to guarantee that at least a single SIMD vector can be loaded. Depending on the used SIMD technology that can significantly increase the size of a StaticVector, StaticMatrix, HybridVector or HybridMatrix.” – https://bitbucket.org/blaze-lib/blaze/wiki/FAQ
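    The padding rule quoted above amounts to rounding the length up to the next multiple of the SIMD width. A small helper illustrates the rounding; this is an illustration of the rule, not Blaze’s actual allocator, and the AVX figure in the comment assumes 8 floats per SIMD register.

```cpp
#include <cstddef>

// Round a vector length up to the next multiple of the SIMD width,
// mirroring the padding rule described in the Blaze FAQ. With AVX and
// single-precision floats the width is 8 elements, so a length-5 vector
// would occupy 8 floats of storage.
std::size_t paddedSize(std::size_t n, std::size_t simdWidth) {
    return ((n + simdWidth - 1) / simdWidth) * simdWidth;
}
```

    For small vectors this padding is a non-trivial fraction of the total storage, which is exactly the regime of the 30–130-element vectors benchmarked above.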

  8. Johannes Czech

    An easy way to directly include the MKL library into your project is to set it in the CMakeLists.txt file:

    include_directories("/home/user/libs/intelMKL/mklml_lnx_2019.0.5.20190502/include")
    link_directories("/home/user/libs/intelMKL/mklml_lnx_2019.0.5.20190502/lib")
    

    Alternatively, you can set the CPATH environment variable or LD_LIBRARY_PATH accordingly.

  9. Thorsten Schmitz reporter

    Thanks for your tips. I did not expect such differences between Eigen with and without MKL.

    I managed to write a script to import MKL (to make it reusable). I also had to rewrite the CMakeLists.txt from Blaze because the recommended way in the docs didn’t work for me.

    Right now I can use Blaze and Eigen. I don’t know yet if Blaze uses MKL correctly; I haven’t tested it. But when I set the define for Eigen, it complains that mkl.h is missing. When I add the include directory in CMake, I get collisions between mkl_lapack.h and some blaze headers in math\lapack\clapack.

    I also get a weird error from blaze; I will open a new issue about it. I have to use Clang to compile (instead of MSVC) because of this, and now Clang doesn’t find OpenMP…

  10. Thorsten Schmitz reporter

    For now I will use Blaze. Eigen only supports OpenMP. With the conflicts I get it’s easier to just use Blaze with C++11 or Boost Threads.

    Two questions:

    1: When using MKL or OpenBLAS as backend, are there any configurations that need to or should be made, like Eigen’s #define EIGEN_USE_MKL_ALL or something similar?

    2: For suggestions, do you prefer a separate issue to be opened (together or separately), or should I just put them here?

  11. Klaus Iglberger

    Hi Thorsten!

    1. In order to activate C++11 threads you will have to specify BLAZE_USE_CPP_THREADS on the command line. In order to activate Boost threads you will have to specify BLAZE_USE_BOOST_THREADS. In case you activate both, the C++11 threads are given priority.
    2. If you find any (apparent) bug, please open a new bug issue. If you have a suggestion for a new feature or want to extend an existing feature, please create a new proposal or enhancement issue. You can use our Issue Creation Guidelines to decide which kind of issue it should be. Additionally, you are welcome to provide pull requests yourself.

    I hope that Blaze works very well for you. Since this issue apparently helped to answer your questions, I consider it as Resolved.

    Best regards,

    Klaus!
