NaNChecker should not use integer division
I profiled the NaNChecker, and it appears to spend about two thirds of its time performing integer division. I assume these are the divisions where the code recalculates the (i,j,k)
triple from a linear index. This part of the code could easily be rewritten.
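For illustration, here is a minimal sketch of the kind of index decomposition meant above. It is not taken from the NaNChecker source: the helper name `linear_to_ijk`, the alias `Index3`, and the assumption of a 3D array with i varying fastest are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Index3 = std::array<std::ptrdiff_t, 3>;

// Hypothetical helper (not the NaNChecker code): split a linear index into
// (i,j,k) for a 3D array of extent `shape`, with i varying fastest.
Index3 linear_to_ijk(std::ptrdiff_t linear, const Index3& shape) {
  assert(linear >= 0 && linear < shape[0] * shape[1] * shape[2]);
  Index3 ijk{};
  ijk[0] = linear % shape[0];  // remainder by the i-extent
  linear /= shape[0];          // quotient by the i-extent
  ijk[1] = linear % shape[1];  // remainder by the j-extent
  linear /= shape[1];          // quotient by the j-extent
  ijk[2] = linear;             // what is left is k
  return ijk;
}
```

Calling something like this for every grid point costs two divide/remainder pairs per point, which is consistent with the division-heavy profile described above.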
Comments (7)
-
-
reporter - removed comment
Here is an excerpt of the most expensive routines of a unigrid Cowling benchmark run:
+ 18.10% 17.83% cactus_sim cactus_sim [.] ML_ADMConstraints::ML_ADMConstraints_evaluate_Body
+ 13.39% 0.35% cactus_sim cactus_sim [.] void HydroToyOpenMP::tiled_task_loop
+ 6.40% 6.21% cactus_sim cactus_sim [.] NaNChecker::CHECK_DATA<double>
+ 6.34% 2.08% cactus_sim libc-2.17.so [.] __memset_sse2
- ADMConstraints is much more expensive than it should be. I don't know why yet, but I do see that it is not being vectorized.
- The call to memset comes mostly from within Carpet, and is likely due to the poisoning that I activated.
- I don't show I/O here, which also takes significant time, but that is fine: the benchmark run lasts only ten iterations, so output is relatively more expensive. Ditto for setting up initial conditions.
- The second column shows how much time is spent in the particular routine itself, excluding the routines it calls. Since the hydro implementation calls subroutines, that time is very small.
-
ADMConstraints is likely slow because we reduce its optimization setting to avoid Intel compiler failures. See https://trac.einsteintoolkit.org/ticket/1995
-
This pull request https://bitbucket.org/cactuscode/cactusutils/pull-requests/22/nanchecker-reduce-cost-of-integer-division/diff improves on the current situation: it computes the divisions only if a NaN is found (which should be rare).
Note that because of the omp parallel (and the fact that the dimensionality of the grid array is unknown) one cannot keep a running ijk[] array to trade divisions for a couple of if statements.
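The idea can be sketched roughly as follows. This is not the code from the pull request: the names `NaNHit` and `find_nans` are hypothetical, the sketch assumes a 3D array with i varying fastest, and it is written serially for brevity. Each iteration depends only on its own linear index, which is the property a parallel loop needs.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for this sketch: where a NaN was found.
struct NaNHit {
  std::ptrdiff_t linear;
  std::array<std::ptrdiff_t, 3> ijk;
};

// Hypothetical scan (not the NaNChecker code): walk the data by linear index
// and decompose the index into (i,j,k) only when a NaN is actually found.
std::vector<NaNHit> find_nans(const std::vector<double>& data,
                              const std::array<std::ptrdiff_t, 3>& shape) {
  const std::ptrdiff_t npoints = shape[0] * shape[1] * shape[2];
  assert(static_cast<std::ptrdiff_t>(data.size()) == npoints);
  std::vector<NaNHit> hits;
  for (std::ptrdiff_t linear = 0; linear < npoints; ++linear) {
    // Common case: the value is fine, so no division is performed.
    if (!std::isnan(data[static_cast<std::size_t>(linear)])) continue;
    // Rare case: pay for the divisions to report the grid location.
    std::ptrdiff_t rest = linear;
    NaNHit hit{linear, {}};
    hit.ijk[0] = rest % shape[0];
    rest /= shape[0];
    hit.ijk[1] = rest % shape[1];
    rest /= shape[1];
    hit.ijk[2] = rest;
    hits.push_back(hit);
  }
  return hits;
}
```

In a parallel version the shared `hits` vector would need protection (for example a critical section or per-thread buffers); that part is left out here.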
Please review.
-
- changed status to open
- edited description
-
Unless there are objections, I will apply this after 2019-09-26.
-
- changed status to resolved
Applied as git hash 97aa6a0 "NaNChecker: reduce cost of integer division during check" of cactusutils
Do you have numbers showing how much time a typical run spends in NaNChecker (compared to, e.g., the McLachlan RHS)? Not that spending 2/3 of the time doing integer division is a good thing, of course.