Carpet: Disable OpenMP parallelization of transport operators
This disables the OpenMP parallelization of Carpet's transport operators. I have observed that this leads to a significant speedup when many threads are used.
The likely reason is that the regions which are parallelized are typically small. A typical region would be, e.g., the lower x-boundary of one component of one grid variable. The OpenMP thread startup overhead and the cache misses caused by parallelizing such a region are then larger than any benefit.
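To illustrate the trade-off, here is a minimal sketch (not Carpet code) of a copy loop that only runs in parallel when the region is large enough. The function name `copy_region` and the cutoff `kMinParallelPoints` are hypothetical, and the cutoff value is a guess that would need to be tuned by measurement. The pull request itself simply disables the parallelization; the OpenMP `if` clause shown here is an alternative way to avoid the overhead for small regions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cutoff: regions with fewer points than this are copied by a
// single thread. The value is an assumption, not a measured optimum.
constexpr std::ptrdiff_t kMinParallelPoints = 100000;

// Copy all points of src into dst. When the `if` clause evaluates to false,
// OpenMP executes the loop with a team of one thread, so small regions such
// as a single boundary face do not pay for distributing work across threads.
void copy_region(const std::vector<double>& src, std::vector<double>& dst) {
  assert(src.size() == dst.size());
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(src.size());
#pragma omp parallel for if (n >= kMinParallelPoints)
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    dst[i] = src[i];
  }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop is simply serial, so the result is the same either way.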
See https://bitbucket.org/eschnett/carpet/pull-requests/18/carpet-disable-openmp-parallelization-of/diff.
In a next step (not proposed here), we can parallelize transport operators again, but at a much higher level, e.g. at the level of the loop over all variables that need to be prolongated. However, Carpet currently (and quite unfortunately) uses static variables to hold pointers to timers, and these are not thread-safe. (Neither the static variables nor the timer implementation itself are.) This needs to be either corrected or disabled, which will be the topic of a further pull request.
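A rough sketch of what such a higher-level parallelization could look like, under stated assumptions: the names `CallCounter`, `prolongation_counter`, `transport_one_variable` and `transport_all_variables` are all hypothetical and are not Carpet's API, and the counter is only a stand-in for a real timer. The point is that each variable is processed serially by one thread, and that the shared bookkeeping object is created via a function-local static (whose initialization is thread-safe since C++11) and updated atomically, rather than via a lazily assigned static pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a timer: it only counts calls, atomically.
struct CallCounter {
  std::atomic<long> calls{0};
  void tick() { calls.fetch_add(1, std::memory_order_relaxed); }
};

// A function-local static is initialized exactly once even when several
// threads reach it concurrently (guaranteed since C++11). A lazily assigned
// static pointer (`static T *p = nullptr; if (!p) p = new T;`) has no such
// guarantee and is a data race.
CallCounter& prolongation_counter() {
  static CallCounter counter;
  return counter;
}

// Serial placeholder for a transport operator acting on one variable;
// doubling the values stands in for the real prolongation work.
void transport_one_variable(std::vector<double>& var) {
  prolongation_counter().tick();
  for (double& x : var) {
    x *= 2.0;
  }
}

// Parallelize over variables instead of over the points of one small region.
void transport_all_variables(std::vector<std::vector<double>>& vars) {
  const std::ptrdiff_t nvars = static_cast<std::ptrdiff_t>(vars.size());
  const long before = prolongation_counter().calls.load();
#pragma omp parallel for schedule(dynamic)
  for (std::ptrdiff_t v = 0; v < nvars; ++v) {
    transport_one_variable(vars[v]);
  }
  // Sanity check; only valid if no other caller is running concurrently.
  assert(prolongation_counter().calls.load() == before + nvars);
}
```

A real timer would also need its start/stop state to be per-thread or protected by a lock, which this sketch does not attempt.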
At this time, I ask those who are interested in performance to try this pull request on a few iterations of a production simulation that uses many OpenMP threads, and to report back here.
Comments (11)
- changed status to open
-
@sbrandt: I believe this is the ticket Erik mentioned in the call on Monday 2017-08-28.
-
Ian: did you have time to run a simulation?
-
No. I will try to do so today.
-
ping
-
I am unable to compile this branch. I am using master of the ET, and the eschnett/no-openmp branch of Carpet. I am getting the following errors:
COMPILING arrangements/CactusBase/Boundary/src/Check.c
COMPILING arrangements/CactusBase/Fortran/src/cctk_Timers.F90
COMPILING configs/noopenmp/bindings/build/AEILocalInterp/cctk_ThornBindings.c
COMPILING arrangements/CactusBase/Fortran/src/cctk_Types.F90
Creating /home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_IOUtil.a
COMPILING arrangements/CactusBase/Fortran/src/cctk_Version.F90
/bin/sh: /home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_IOUtil.a.objectlist: No such file or directory
make[3]: *** [/home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_IOUtil.a.objectlist] Error 1
make[2]: *** [/home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_IOUtil.a] Error 2
make[1]: *** [/home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_IOUtil.a] Error 2
COMPILING arrangements/CactusBase/Fortran/src/cctk_WarnLevel.F90
COMPILING arrangements/CactusBase/Fortran/src/util_Table.F90
COMPILING arrangements/CactusBase/Fortran/src/paramcheck.F90
Creating /home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_TensorTypes.a
/bin/sh: /home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_TensorTypes.a.objectlist: No such file or directory
make[3]: *** [/home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_TensorTypes.a.objectlist] Error 1
make[2]: *** [/home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_TensorTypes.a] Error 2
make[1]: *** [/home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_TensorTypes.a] Error 2
Creating /home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_CycleClock.a
/bin/sh: /home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_CycleClock.a.objectlist: No such file or directory
make[3]: *** [/home/ianhin/Cactus/Optimisation/configs/noopenmp/lib/libthorn_CycleClock.a.objectlist] Error 1
Do I need a different version of the flesh/build system? The same Cactus tree builds fine on the master branch of Carpet.
-
This error went away when I deleted the configuration and rebuilt it. There must be a bug in the build system, because an interrupted build (I may have cancelled it at one point) should not cause this sort of problem.
Anyway, I have timing results for the BBH runs with and without the patch. There is no appreciable difference, either in the total evolution time or in the prolongation timer.
-
According to Erik (private communication), the branch does not do anything unless certain parameters are set, which explains why the performance is unaffected. However, the parameters he mentioned are not in this version of the code, suggesting that this is not really the branch that should be tested. I suggest taking this ticket out of "review" state, since it looks like the code is not quite ready yet. (Trying to pull discussion back into this ticket and out of private email.)
-
- changed status to resolved
The code as is should not be committed, due to the comments in the pull request. The same functionality will be included in a more comprehensive, future pull request.
-
- edited description
- changed status to closed
I will perform such a test on a BBH simulation.