Simfactory job parameters are not consistent
According to simlib.py:
NUM_THREADS = threads per mpi proc (thread/mpi_proc)
PPN is supposed to be the number of processors, or cores requested from the scheduler per node. (core/node)
PPN_USED is supposed to be the number of cores actually used per node. (core/node)
NUM_SMT is supposed to be threads per core, and has a value of either 1 or 2 on all machines. (thread/core)
Thus
NODE_PROCS := PPN_USED * NUM_SMT / NUM_THREADS
This follows since: NODE_PROCS = (cores/node)*(threads/core)/(threads/mpi proc) = mpi procs/node
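The unit bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the formula, not the code in simlib.py; the function name `node_procs` and the divisibility check are assumptions added for the example.

```python
def node_procs(ppn_used: int, num_smt: int, num_threads: int) -> int:
    """MPI processes per node.

    Units: (cores/node) * (threads/core) / (threads/mpi proc) = mpi procs/node.
    Hypothetical helper, not taken from simlib.py.
    """
    hw_threads_per_node = ppn_used * num_smt  # threads/node
    if hw_threads_per_node % num_threads != 0:
        raise ValueError("threads per node is not a multiple of NUM_THREADS")
    return hw_threads_per_node // num_threads


# 16 cores used per node, 2 hardware threads per core, 4 threads per MPI process
print(node_procs(ppn_used=16, num_smt=2, num_threads=4))  # 8
```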
Now here’s the problem.
NUM_PROCS = PROCS / NUM_THREADS
Now, --procs and --cores are two names for the same option in simfactory. Thus “procs” means “processors” while “num_procs” means “number of processes.” That’s confusing, but that’s not the problem this ticket is about.
NUM_PROCS is supposed to be the number of mpi processes. However, since --procs and --cores are the same thing:
NUM_PROCS = CORES / NUM_THREADS
= cores / (threads / mpi proc)
This is inconsistent. One would expect:
NUM_PROCS = NUM_SMT*CORES/NUM_THREADS
= (threads/core)*cores/(threads/mpi proc).
What if we define NUM_THREADS as cores/mpi proc? Well, apart from being confusing, that makes the NODE_PROCS calculation wrong.
So, unless I’m missing something, these parameters are not consistent, regardless of how you define them. They only work if NUM_SMT is one and cores and threads are interchangeable.
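To make the mismatch concrete, here is a small sketch comparing the formulas for a single-node job in which CORES equals PPN_USED. The function names are hypothetical and the numbers are made up for illustration; only the formulas come from the discussion above.

```python
def num_procs_current(cores: int, num_threads: int) -> int:
    # NUM_PROCS = CORES / NUM_THREADS (the current definition)
    return cores // num_threads


def num_procs_with_smt(cores: int, num_smt: int, num_threads: int) -> int:
    # NUM_PROCS = NUM_SMT * CORES / NUM_THREADS (what the units suggest)
    return cores * num_smt // num_threads


def node_procs(ppn_used: int, num_smt: int, num_threads: int) -> int:
    # NODE_PROCS = PPN_USED * NUM_SMT / NUM_THREADS
    return ppn_used * num_smt // num_threads


cores = ppn_used = 16  # one node, all cores used
num_threads = 4

for num_smt in (1, 2):
    print(
        num_smt,
        num_procs_current(cores, num_threads),
        num_procs_with_smt(cores, num_smt, num_threads),
        node_procs(ppn_used, num_smt, num_threads),
    )
# prints:
# 1 4 4 4
# 2 4 8 8
```

With NUM_SMT = 1 all three values agree. With NUM_SMT = 2 the current NUM_PROCS says 4 processes in total while NODE_PROCS says 8 processes on the one node.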
Is that always true?
The machines that set max-num-smt = 2 are bethe, cori, philip, and supermucng. Looking at simlib.py, this parameter is never accessed! Instead, simlib.py only attempts to get ‘num-smt’, a parameter no ini file ever sets. Thus NUM_SMT is, essentially, always 1.
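The effect of looking up a key that no ini file sets can be sketched with Python's standard configparser. This is not simfactory's actual lookup code: the section name `examplemachine`, the helper `lookup_num_smt`, and the default of 1 are assumptions made for the illustration.

```python
import configparser

# Hypothetical machine entry that only sets max-num-smt, as described above.
INI_TEXT = """
[examplemachine]
ppn = 32
max-num-smt = 2
"""


def lookup_num_smt(ini_text: str, machine: str, default: int = 1) -> int:
    """Read 'num-smt' for a machine, falling back to a default if it is unset."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint(machine, "num-smt", fallback=default)


# 'max-num-smt' is present but never read, so the default is used.
print(lookup_num_smt(INI_TEXT, "examplemachine"))  # 1

# Renaming the key to 'num-smt' makes the value visible to the lookup.
renamed = INI_TEXT.replace("max-num-smt", "num-smt")
print(lookup_num_smt(renamed, "examplemachine"))  # 2
```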
What to do?
My suggestion is that the definition of NUM_PROCS be amended to be
NUM_PROCS = CORES * NUM_SMT / NUM_THREADS
so that the units work out as cores*(threads/core)/(threads/mpi_proc) = mpi procs.
And then I suggest that the feature be tried out on one of the above 4 machines by changing max-num-smt to num-smt (note, however, that philip no longer exists).
Comments (6)
-
Not sure if --total-threads is more useful, since that is not a quantity people normally think about. Take a look at what the mpirun/srun/aprun etc. commands actually use: “processes” (= MPI ranks) and “allocated CPUs per process” (logical CPUs, i.e. the thing the OS counts in /proc/cpuinfo). That is, they start n processes, which are then free to spread out on t threads each (or not, if they only need more memory).
So a more useful new option might be --ranks or even --mpi-ranks.
“Cores” or “nodes” may be what the queuing system usually allocates, though the mpirun-like options are more directly understandable to users, and the queuing system options are derived from them (since each queuing system is set up slightly differently, even if they use e.g. SLURM).
-
reporter I chose the term “total threads” because that’s what all of the documentation uses. You have stated previously that you think you don’t want to support the terminology in the documentation. There is a PR (the `fixsub` branch of simfactory) which implements --total-threads. It also introduces --nodes and --node-procs which (I think) is the functionality you are suggesting with --ranks or --mpi-ranks. It also introduces --test to see whether the simfactory options you provide do what you think they do.
I would be perfectly happy to name all these options something different, but I would like to see this PR accepted in some form, as both --procs and --cores are not the things that they are named after. I have frequently been confused by the behavior of these parameters.
An alternative to --total-threads would be to simply get rid of the smt stuff, which was the source of much confusion for me.
-
I suspect that --total-threads will still require smt information, since --total-threads = --procs * --num-threads, so there is no extra information available. The smt stuff is really used for the queuing system, since those sometimes need to know whether the code uses smt or not (since they want a ppn-like value, and that differs between smt and non-smt).
I am not trying to suggest nodes and node procs, no. I am suggesting to use the names of the items used by mpirun, which are “number of MPI ranks” and OS threads (OMP_NUM_THREADS). smt will still be required (same reason as for --total-threads). I.e. re-use names and nomenclature that people who are using a cluster are already familiar with.
-
reporter “--total-threads” is just a new name for the number of parallel thingies, which aren’t really “cores” and aren’t really “procs.”
As for “--nodes” and “--node-procs”, they offer a way to directly specify what you want mpi to do, rather than doing a calculation based on cores/procs/total-threads. I suspect I’m not the only one who would find that more convenient.
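As a rough sketch of what specifying the layout directly could look like: the semantics below (total ranks = nodes × processes per node, total threads = ranks × threads per rank) are an assumption made for illustration and are not taken from the `fixsub` branch; the function name `layout` is hypothetical.

```python
def layout(nodes: int, node_procs: int, num_threads: int) -> tuple[int, int]:
    """Return (total MPI ranks, total threads) for a directly specified layout.

    Assumed semantics, not confirmed against the PR:
      ranks         = nodes * node_procs
      total threads = ranks * num_threads
    """
    ranks = nodes * node_procs
    total_threads = ranks * num_threads
    return ranks, total_threads


# 2 nodes, 8 MPI processes per node, 4 threads per process
print(layout(nodes=2, node_procs=8, num_threads=4))  # (16, 64)
```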
I guess this also “works” if --cores and --procs both really mean total threads. In that case, I’m not sure which name is more misleading. We could mark both as deprecated and allow a --total-threads option. That would be the least invasive change.