This disables the OpenMP parallelization of Carpet's transport operators. I have observed that this leads to a significant speedup when many threads are used.
The likely reason is that the regions which are parallelized are typically small. A typical region would be, e.g., the lower x-boundary of one component of one grid variable. The OpenMP thread startup overhead and the cache misses caused by parallelizing such a region are then larger than any benefit.
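To make the size argument concrete, here is a minimal sketch of the kind of fine-grained loop in question. It is not Carpet code: the function `copy_lower_x_boundary`, its parameters, and the memory layout (x fastest) are assumptions chosen for illustration only. For a 4 x 3 x 2 block with one ghost plane, the loop touches just 6 points, which is very little work to spread over many threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT Carpet's implementation: copy the lower
// x-boundary (the first `ghost` planes in x) of one nx*ny*nz grid
// variable from src to dst. Layout assumption: x is the fastest index.
// Returns the number of points copied.
std::size_t copy_lower_x_boundary(const std::vector<double>& src,
                                  std::vector<double>& dst,
                                  int nx, int ny, int nz, int ghost) {
  assert(ghost <= nx);
  assert(src.size() == std::size_t(nx) * ny * nz);
  assert(dst.size() == src.size());

  // Fine-grained parallelization of the kind discussed above: the
  // region holds only ghost*ny*nz points, so the cost of starting the
  // parallel region can easily exceed the cost of the copy itself.
  // (The pragma is ignored when compiling without OpenMP.)
#pragma omp parallel for collapse(2)
  for (int k = 0; k < nz; ++k) {
    for (int j = 0; j < ny; ++j) {
      for (int i = 0; i < ghost; ++i) {
        const std::size_t idx =
            i + std::size_t(nx) * (j + std::size_t(ny) * k);
        dst[idx] = src[idx];
      }
    }
  }
  return std::size_t(ghost) * ny * nz;
}
```

Removing the pragma leaves the result unchanged; only the threading overhead goes away. Whether that is a net win depends on region size and thread count, which is why measurements on production runs are needed.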
As a next step (not proposed here), we can parallelize the transport operators again, but at a much higher level, e.g. at the level of the loop over all variables that need to be prolongated. However, Carpet currently (and quite unfortunately) uses static variables to hold pointers to timers, and these are not thread-safe. (Neither the static variables nor the timer implementation itself is.) This needs to be either corrected or disabled, which will be the topic of a further pull request.
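A rough sketch of what parallelizing at the level of the loop over variables could look like is given below. All names (`Field`, `prolongate_1d`, `prolongate_all`) are hypothetical, and the 1D linear interpolation merely stands in for a real transport operator; this is not a proposal for the actual implementation. The per-variable operator is serial and touches no shared state, which is exactly the property that static timer pointers inside the operator would violate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Field = std::vector<double>;  // hypothetical: one grid variable

// Hypothetical stand-in for a transport operator: 1D linear
// prolongation by a factor of 2. Deliberately serial (no OpenMP
// inside) and free of static or global state, so that concurrent
// calls on different variables are safe.
Field prolongate_1d(const Field& coarse) {
  assert(!coarse.empty());
  Field fine(2 * coarse.size() - 1);
  for (std::size_t i = 0; i < coarse.size(); ++i) {
    fine[2 * i] = coarse[i];  // coincident points are copied
    if (i + 1 < coarse.size())
      fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1]);  // midpoints
  }
  return fine;
}

// Coarse-grained parallelization: one parallel region for the whole
// loop over variables, instead of one per small region. Each iteration
// writes only its own element of fine_vars, so there is no data race.
std::vector<Field> prolongate_all(const std::vector<Field>& coarse_vars) {
  std::vector<Field> fine_vars(coarse_vars.size());
  const int nvars = static_cast<int>(coarse_vars.size());
#pragma omp parallel for schedule(dynamic)
  for (int v = 0; v < nvars; ++v) {
    fine_vars[v] = prolongate_1d(coarse_vars[v]);
  }
  return fine_vars;
}
```

With this structure the thread startup cost is paid once per loop over variables rather than once per boundary region. It only works if everything called inside the loop is thread-safe, hence the need to deal with the timers first.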
At this time, I ask those who are interested in performance to try this pull request on a few iterations of a production simulation that uses many OpenMP threads, and to report back here.