Provide SIMD support for the ARM architecture

Issue #49 new
Klaus Iglberger created an issue

Description

In recent years, the ARM architecture has become one of the most prevalent processor architectures. However, despite its widespread use, the Blaze library currently only provides SIMD support for current and upcoming x86 architectures. Blaze should also provide SIMD vectorization for the ARM architecture.

Tasks

  • extend the existing SIMD module with support for the ARM architecture
  • verify that the expected performance is achieved
  • add new and extend existing test cases as necessary

Comments (14)

  1. Emil Fresk

    Are there any updates on this? I saw your talk at CppCon 2016 where you said ARM support was in the works, and it would be awesome to have!

  2. Klaus Iglberger reporter

    Hi Emil!

    Unfortunately this feature is still not available. However, since adding a different kind of vectorization is pretty straightforward, this could be your chance to create a pull request. Pull request #11 gives you an impression of what needs to be done to provide SIMD support for ARM architectures. If you have questions, you are always welcome to ask.

    Best regards,

    Klaus!

  3. Emil Fresk

    Hi Klaus!

    Thanks for the hint! I took a look at the SIMD implementation, and it was quite straightforward, so it should be no real problem to add the intrinsics. It can be a fun evening project!

    One question, as I do not have an overview of the testing framework, is there a way to test the SIMD of one feature at a time to see that each is working (Abs ...)?

    Thanks! BR Emil

  4. Klaus Iglberger reporter

    Hi Emil!

    Take a look at the blazetest directory. It contains the entire test suite of the Blaze library. In order to test the SIMD functionality, you can go to blazetest/src/mathtest/simd, which contains the SIMD tests for all relevant data types (int, float, double, ...).
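A per-operation check of the kind asked about above could look roughly like the following. This is a minimal sketch, not code from the Blaze test suite: `abs4` and `testAbs4` are hypothetical names, and the scalar fallback exists only so the example also builds on non-ARM hosts. The NEON path uses the standard `arm_neon.h` intrinsics `vld1q_f32`, `vabsq_f32`, and `vst1q_f32`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__ARM_NEON)
#  include <arm_neon.h>
#endif

// Hypothetical helper (not part of Blaze): absolute value of four packed floats.
inline std::array<float,4> abs4( const std::array<float,4>& in )
{
   std::array<float,4> out{};
#if defined(__ARM_NEON)
   // NEON path: load four floats, take the lane-wise absolute value, store.
   const float32x4_t v = vld1q_f32( in.data() );
   vst1q_f32( out.data(), vabsq_f32( v ) );
#else
   // Scalar fallback so the sketch also compiles where NEON is unavailable.
   for( std::size_t i=0; i<4; ++i ) out[i] = std::fabs( in[i] );
#endif
   return out;
}

// Hypothetical single-feature test: compare each lane against a scalar reference.
inline void testAbs4()
{
   const std::array<float,4> in{ -1.5f, 2.0f, -0.0f, -3.25f };
   const std::array<float,4> out = abs4( in );
   for( std::size_t i=0; i<4; ++i ) {
      assert( out[i] == std::fabs( in[i] ) );
   }
}
```

Calling `testAbs4()` exercises exactly one operation, so a failure points directly at the intrinsic wrapper for that operation.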

    The test suite is primarily written for Linux and MacOS. The first step is to fill out the Configfile in the blazetest directory. Then run the configure script:

    ./configure
    

    It will create the necessary files and will enable you to use make in the blazetest/src/mathtest/simd directory. If you happen to use Windows, I can send you a corresponding CMakeLists.txt file.

    Please use BLAZE_NEON_MODE as the compilation switch for the neon vectorization. Also, please try to adhere to the existing formatting rules. Then I don't see any problems for a pull request.
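To illustrate how a compilation switch such as BLAZE_NEON_MODE might gate NEON code, here is a small sketch. It assumes the switch is a 0/1 macro derived from the compiler's predefined `__ARM_NEON` macro; how Blaze would actually define it is not stated in this thread. `addArrays` and `testAddArrays` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: derive a 0/1 switch from the compiler's predefined NEON macro.
#if !defined(BLAZE_NEON_MODE)
#  if defined(__ARM_NEON) || defined(__ARM_NEON__)
#    define BLAZE_NEON_MODE 1
#  else
#    define BLAZE_NEON_MODE 0
#  endif
#endif

#if BLAZE_NEON_MODE
#  include <arm_neon.h>
#endif

// Hypothetical kernel: c[i] = a[i] + b[i] for i in [0, n).
inline void addArrays( const float* a, const float* b, float* c, std::size_t n )
{
   std::size_t i = 0;
#if BLAZE_NEON_MODE
   // Vectorized part: process four floats per iteration.
   for( ; i+4 <= n; i+=4 ) {
      vst1q_f32( c+i, vaddq_f32( vld1q_f32( a+i ), vld1q_f32( b+i ) ) );
   }
#endif
   // Scalar remainder loop (handles all elements when NEON is disabled).
   for( ; i<n; ++i ) c[i] = a[i] + b[i];
}

// Hypothetical check: the result must match the scalar sum in either mode.
inline void testAddArrays()
{
   const float a[6] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f };
   const float b[6] = { 6.0f, 5.0f, 4.0f, 3.0f, 2.0f, 1.0f };
   float c[6] = {};
   addArrays( a, b, c, 6 );
   for( std::size_t i=0; i<6; ++i ) assert( c[i] == 7.0f );
}
```

Keeping the scalar remainder loop outside the `#if` means the same function produces identical results whether or not the switch is active, which makes the two modes easy to compare in a test.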

    One more request: Please wait until the next push, as it will touch a lot of the functionality that you will have to modify. Merging this later might prove to be difficult. The push should happen today or tomorrow at the latest.

    Best regards,

    Klaus!

  5. Emil Fresk

    Thanks for the instructions! I have the testing running, just waiting for all the commits until I start testing a little.

    BR Emil

  6. Klaus Iglberger reporter

    Hi Emil!

    The last two pushes have introduced the major changes of our 3.4 refactoring. Please feel free to fork now. Thanks for your patience,

    Best regards,

    Klaus!

  7. Klaus Iglberger reporter

    Hi Emil!

    With the last push we have added SIMD equality comparison functions (i.e. operator==() and equal(); see the commits 522cfc4 and ac45faf). We would be very grateful if you would also consider these functions when adding ARM support. Thanks a lot,
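For orientation, a NEON counterpart of a SIMD equality comparison could be sketched as follows. This is not the implementation from the commits mentioned above: `allEqual4` is a hypothetical name, and the sketch performs an exact lane-wise comparison without modeling any tolerance-based semantics that Blaze's `equal()` may have. It relies on the standard intrinsics `vceqq_f32`, `vget_low_u32`, `vget_high_u32`, `vand_u32`, and `vget_lane_u32`, with a scalar fallback for non-ARM hosts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON)
#  include <arm_neon.h>
#endif

// Hypothetical helper (not part of Blaze): true if all four lanes compare equal.
inline bool allEqual4( const std::array<float,4>& a, const std::array<float,4>& b )
{
#if defined(__ARM_NEON)
   // vceqq_f32 yields all-ones in each lane where a == b, zero otherwise.
   const uint32x4_t mask = vceqq_f32( vld1q_f32( a.data() ), vld1q_f32( b.data() ) );
   // AND the two halves, then the two remaining lanes, to reduce to one value.
   const uint32x2_t half = vand_u32( vget_low_u32( mask ), vget_high_u32( mask ) );
   return ( vget_lane_u32( half, 0 ) & vget_lane_u32( half, 1 ) ) == 0xFFFFFFFFu;
#else
   // Scalar fallback so the sketch also compiles where NEON is unavailable.
   for( std::size_t i=0; i<4; ++i ) {
      if( !( a[i] == b[i] ) ) return false;
   }
   return true;
#endif
}

// Hypothetical check covering the equal case and a single differing lane.
inline void testAllEqual4()
{
   const std::array<float,4> x{ 1.0f, 2.0f, 3.0f, 4.0f };
   const std::array<float,4> y{ 1.0f, 2.0f, 3.5f, 4.0f };
   assert(  allEqual4( x, x ) );
   assert( !allEqual4( x, y ) );
}
```

The reduction avoids AArch64-only horizontal intrinsics, so the same code should also be usable on 32-bit ARM targets with NEON.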

    Best regards,

    Klaus!

  8. Emil Fresk

    Hi Klaus!

    Sorry for the delay, I am currently finalizing my PhD thesis and will be away a bit. I have started the implementation but sadly have to pause until my thesis is done.

    I will be following this issue meanwhile if there are any comments/questions. Also, I got myself an nVidia TX1 for testing on, is there any other hardware you'd like to test on? It has NEON, but perhaps we need to test on something more as well.

    BR Emil

  9. Amin Yahyaabadi

    Any progress on this? I am looking for a mathematics library for the ARM architecture. I've seen that Blaze has great performance. Is this applicable to ARM? Can I use Blaze on ARM?

  10. Klaus Iglberger reporter

    Hi Amin!

    Unfortunately there hasn't been any progress yet. But we are willing to accept a contribution from a volunteer with ARM experience. Our expectation is that it would take approx. one day of work to introduce ARM support if you are familiar with the ARM intrinsics.

    Best regards,

    Klaus!

  11. Nils Deppe

    Hi Klaus!

    ARM support isn’t something we need right now, but while looking at using Sleef+Blaze for SIMD I remembered this issue. It looks like Sleef has support for ARM (and POWER9 too), and I thought that might be the easiest way to ultimately add ARM support to Blaze, either by you or somebody else 🙂 Anyway, just me “thinking out loud”.

    Best wishes,

    Nils

  12. Klaus Iglberger reporter

    Hi Nils!

    Indeed, Sleef would be perfectly suited for that job. The disadvantage would be that in order to use ARM vectorization a user would have to install Sleef. But I believe that the advantage of having ARM support available definitely outweighs this disadvantage. Unfortunately, my problem at the moment is that I couldn’t test the implementation due to the lack of a suitable ARM CPU. But that will change sooner rather than later when I upgrade my MacBook to one of the new releases with an M1 processor.

    Best regards,

    Klaus!

  13. geoffrey4444

    Hi Klaus! Just wondering if you ever got that upgraded MacBook and, if so, whether adding Sleef+ARM might be in the cards at some point? I’m still just a beginner with ARM assembly, so I don’t think I have the skillset to add this myself yet (but would certainly be happy to help test!)

    I also wonder whether Apple’s Accelerate framework might ever be viable for Blaze? Also just thinking out loud… I know that on ARM Macs this framework makes use of Apple’s secret AMX coprocessor for about a 2x performance boost over NEON, but I don’t know enough about Blaze’s internals to know whether the Accelerate framework (the only supported way to make use of the Apple AMX coprocessor) has what you would need.
