Improve OpenMP parallelisation of SummationByParts
Issue #1023
resolved
The Intel compiler does not handle OpenMP workshare constructs well. The attached patch replaces them with explicit loops, which execute faster. This makes a measurable difference on Hopper with 24 OpenMP threads.
The patch modifies only one operator; the other operators could be treated in the same way.
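For readers without the patch at hand, the general shape of the change can be sketched as follows. This is only an illustration and not the patched code: the thorn is written in Fortran, where workshare applies to array-syntax assignments, and the C function below (the name diff_centered and the simple 3-point stencil are assumptions made for this example) shows just the "after" form, an explicit loop handed to OpenMP as a parallel loop.

```c
/* Hypothetical first-derivative operator on a 1D array; NOT the actual
 * SummationByParts stencil. Requires n >= 2 and h != 0.
 *
 * In Fortran, the "before" form would be an array assignment such as
 *
 *   !$omp parallel workshare
 *   du(2:n-1) = (u(3:n) - u(1:n-2)) * inv_2h
 *   !$omp end parallel workshare
 *
 * which leaves the division of work to the compiler. The "after" form
 * is the explicit loop below. */
void diff_centered(const double *u, double *du, int n, double h)
{
    const double inv_h = 1.0 / h;
    const double inv_2h = 0.5 * inv_h;

    /* One-sided stencils at the two boundary points, done serially. */
    du[0] = (u[1] - u[0]) * inv_h;
    du[n - 1] = (u[n - 1] - u[n - 2]) * inv_h;

    /* Centred stencil in the interior. Each iteration writes only du[i],
     * so the iterations are independent and can be split statically
     * across threads. */
    #pragma omp parallel for schedule(static)
    for (int i = 1; i < n - 1; ++i) {
        du[i] = (u[i + 1] - u[i - 1]) * inv_2h;
    }
}
```

If the file is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result, which gives a simple way to compare single-threaded and multi-threaded output.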
Comments (3)
-
reporter -
- changed status to open
- removed comment
The patch looks OK. I didn't check all the indices really carefully (due to the length of the patch) and didn't run the test suites. Assuming tests show no difference between the two versions when using multiple threads, I think it is OK to commit this. I'll leave testing to Erik. :)
-
reporter - changed status to resolved
- removed comment
Applied.