When running with OpenMP and enough threads that hyperthreading threads are used, many tests using LoopControl (Cartoon, RotatingSymmetry180, RotatingSymmetry90) fail.
This can be traced to the hyperthreading support in LoopControl (i.e. turning off hyperthreading support makes things work).
In particular, on bethe with SMT and 8 physical cores:
The Cartoon/test_cartoon_2.par test shows differences from the recorded results when run with 16 threads (but not with 8 threads). If I then disable OMP in all ML source files but ML_BSSN_enforce and comment out the #include "loopcontrol.h", the difference goes away. Adding back #include "loopcontrol.h" brings back the error. Some further experimenting with LoopControl's options shows that the use_smt_threads option is indeed what causes the problem. If I turn it off, things work fine even with a vanilla source tree. With it on, relative differences are on the order of 1e-7 and absolute differences on the order of 1e-11 (in momx_z_.xg). Without SMT the results are identical to the stored values.
The issue occurs only with the combination of OpenMP, vectorization and hyperthreading. It is independent of the compiler (both Intel 13 and GCC 4.4 show the same behaviour), and vectorization (SSE2) with many threads (up to 4 times the number of physical cores) works fine on non-SMT machines.
I propose to disable LoopControl::use_smt_threads by default. Note that we cannot remove it completely, since apparently on Vesta (a Blue Gene/Q) SMT is required to get any multi-threading at all.
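
Until the default changes, the workaround described above can be applied per run in the parameter file. This is only a sketch: the parameter name is the one given in this report, and the value "no" assumes the usual Cactus boolean parameter syntax.

```
# Workaround sketch: switch off LoopControl's SMT thread handling for this run.
# Assumes use_smt_threads is a boolean parameter accepting yes/no.
LoopControl::use_smt_threads = no
```

If the default is changed as proposed, machines that rely on SMT for multi-threading would presumably set the same parameter to "yes" instead.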