With the ET_2015_11 release, there is a severe performance problem on Stampede when the hwloc and SystemTopology thorns are not activated. Activating these thorns makes simulations run 8 times faster. This suggests that the affinity settings in simfactory for Stampede are wrong. stampede-mvapich2.run has:
export KMP_AFFINITY=norespect,compact # verbose
Is this correct? Looking at the output of "top", we see the expected 16 threads, but each is running at only 50% CPU. As far as we can tell, threads do not migrate between cores. Each thread should be at 100%. The 50% figure does not by itself explain the factor of 8 slowdown, but it shows that something is wrong.
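
One way to check the binding more directly than with "top" is to read the allowed-CPU list of each thread from /proc on a compute node. The following is only a sketch, assuming a Linux node with /proc mounted. The function name thread_affinity is made up for illustration and is not part of simfactory or Cactus.

```shell
# Print the allowed-CPU list for every thread of a process (Linux only).
# thread_affinity is a hypothetical helper, not a simfactory command.
# Usage: thread_affinity PID
thread_affinity() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        # Skip the unexpanded glob if the process does not exist.
        [ -d "$task" ] || continue
        tid=${task##*/}
        # Cpus_allowed_list holds the cores the kernel may schedule this thread on.
        cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' "$task/status" 2>/dev/null)
        echo "tid=$tid cpus=$cpus"
    done
}
```

For example, thread_affinity "$(pgrep -n cactus_sim)" would list the threads of the newest matching process, where cactus_sim is a placeholder for the actual executable name. If two threads report the same single core, they are sharing it, which would be consistent with each running at about 50%. If I remember the Intel runtime correctly, adding the verbose modifier to KMP_AFFINITY should also print the bindings at startup.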