comet files in simfactory use one MPI rank per node
The current comet.ini (https://bitbucket.org/simfactory/simfactory2/src/master/mdb/machines/comet.ini) uses 1 MPI rank per node:
max-num-threads = 24
num-threads = 24
This is usually not the best way to set things up. I would, e.g., have expected the default choice to be something like 1 MPI rank per NUMA domain. Unless limited by communication overhead, we seem to obtain the fastest per-node performance when using only MPI and no OpenMP (about a 50% speedup on my 12-core workstation with 2 NUMA domains). Given that, if anyone is using Comet for production work and wants to contribute their machine description file, that would be great.
Keyword: None
Comments (6)
-
reporter Private conversations with users who run production simulations on Comet (in 2018, so I am a bit tardy reporting this) indicate that the best performance was achieved with 4 threads per MPI rank and 4 MPI ranks per node, i.e. leaving 8 cores per node empty (Comet has 24 cores per node).
-
reporter Changed to use 6 threads per rank in git hash 5ea0f7b "comet.ini: use 6 threads by default" of simfactory2 as a stopgap measure. This needs to be properly measured with a couple of test runs to find a good setting for runs typical for Comet.
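The three layouts mentioned in this ticket (1 rank with 24 threads, 4 ranks with 4 threads each, and ranks with 6 threads each) can be compared with a little arithmetic. The following is a minimal Python sketch, not simfactory code: `NodeLayout` and `node_layout` are hypothetical names introduced here, and the only input taken from the ticket is the figure of 24 cores per node. It only counts cores; it says nothing about which layout is actually fastest, which still needs to be measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLayout:
    """How one node's cores are divided (hypothetical helper type)."""
    ranks_per_node: int
    threads_per_rank: int
    cores_used: int
    cores_idle: int


def node_layout(cores_per_node, threads_per_rank, ranks_per_node=None):
    """Split a node's cores between MPI ranks and OpenMP threads.

    If ranks_per_node is omitted, as many ranks as fit on the node are used.
    """
    if cores_per_node < 1 or threads_per_rank < 1:
        raise ValueError("core and thread counts must be positive")
    if ranks_per_node is None:
        ranks_per_node = cores_per_node // threads_per_rank
    if ranks_per_node < 1:
        raise ValueError("at least one rank per node is required")
    cores_used = ranks_per_node * threads_per_rank
    if cores_used > cores_per_node:
        raise ValueError("layout oversubscribes the node")
    return NodeLayout(
        ranks_per_node=ranks_per_node,
        threads_per_rank=threads_per_rank,
        cores_used=cores_used,
        cores_idle=cores_per_node - cores_used,
    )


if __name__ == "__main__":
    CORES_PER_NODE = 24  # Comet, as stated in this ticket
    layouts = [
        ("original comet.ini", 24, None),
        ("reported by users", 4, 4),
        ("stopgap default", 6, None),
    ]
    for label, threads, ranks in layouts:
        print(label, node_layout(CORES_PER_NODE, threads, ranks))
```

With 6 threads per rank the node is fully populated by 4 ranks, whereas the 4x4 layout reported by users leaves 8 cores idle.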
-
reporter - edited description
- changed status to open
-
reporter Related / superseded by #2436
-
reporter - changed status to resolved
Resolved in #24356 in git hash 2c7baa92 "comet: manually choose sensible thread binding in ibrun" of simfactory2