Simfactory job parameters are not consistent

Issue #2483 new
Steven R. Brandt created an issue

According to simlib.py:

NUM_THREADS = threads per mpi proc (thread/mpi_proc)

PPN is supposed to be the number of processors, or cores requested from the scheduler per node. (core/node)

PPN_USED is supposed to be the number of cores actually used per node. (core/node)

NUM_SMT is supposed to be threads per core, and has a value of either 1 or 2 on all machines. (thread/core)

Thus

NODE_PROCS := PPN_USED * NUM_SMT / NUM_THREADS

This follows since: NODE_PROCS = (cores/node)*(threads/core)/(threads/mpi proc) = mpi procs/node
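The unit bookkeeping above can be written out as a short Python sketch. The function name `node_procs` and the sample numbers are illustrative assumptions, not code taken from simlib.py.

```python
def node_procs(ppn_used: int, num_smt: int, num_threads: int) -> int:
    """MPI processes per node.

    (cores/node) * (threads/core) / (threads/mpi proc) = mpi procs/node
    """
    hw_threads_per_node = ppn_used * num_smt  # threads/node
    if hw_threads_per_node % num_threads != 0:
        raise ValueError("threads per node is not a multiple of NUM_THREADS")
    return hw_threads_per_node // num_threads


# Hypothetical node: 16 cores used, 2 hardware threads per core,
# 8 threads per MPI process.
print(node_procs(ppn_used=16, num_smt=2, num_threads=8))
```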

Now here’s the problem.

NUM_PROCS = PROCS / NUM_THREADS

Now both --procs and --cores are two options for the same thing in simfactory. Thus “procs” is “processors” and “num_procs” is “number of processes.” That’s confusing, but that’s not the problem this ticket is about.

NUM_PROCS is supposed to be the number of MPI processes. However, since --procs and --cores are the same thing:

NUM_PROCS = CORES / NUM_THREADS

= cores / (threads / mpi proc)

This is inconsistent. One would expect:

NUM_PROCS = NUM_SMT*CORES/NUM_THREADS

= (threads/core)*cores/(threads/mpi proc).

What if we define NUM_THREADS as cores/mpi proc? Well, apart from being confusing, that makes the NODE_PROCS calculation wrong.

So, unless I’m missing something, these parameters are not consistent, regardless of how you define them. They only work if NUM_SMT is one and cores and threads are interchangeable.
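A small numeric check makes the mismatch concrete. This is only a sketch under assumed numbers (2 nodes, 16 cores used per node, 8 threads per MPI process). The helper names are hypothetical and merely restate the formulas quoted in this ticket.

```python
def num_procs_current(cores: int, num_threads: int) -> int:
    # NUM_PROCS as quoted in this ticket: PROCS (= CORES) / NUM_THREADS
    return cores // num_threads


def node_procs(ppn_used: int, num_smt: int, num_threads: int) -> int:
    # NODE_PROCS = PPN_USED * NUM_SMT / NUM_THREADS
    return ppn_used * num_smt // num_threads


def consistent(nodes: int, ppn_used: int, num_threads: int, num_smt: int) -> bool:
    """True if the total MPI process count equals nodes * (processes per node)."""
    cores = nodes * ppn_used
    total = num_procs_current(cores, num_threads)
    per_node = node_procs(ppn_used, num_smt, num_threads)
    return total == nodes * per_node


# The two formulas agree for NUM_SMT = 1 and disagree for NUM_SMT = 2.
for num_smt in (1, 2):
    print(num_smt, consistent(nodes=2, ppn_used=16, num_threads=8, num_smt=num_smt))
```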

Is that always true?

The machines that set max-num-smt = 2 are bethe, cori, philip, and supermucng. However, simlib.py never reads this parameter. It only attempts to get ‘num-smt’, a parameter that no ini file ever sets. Thus NUM_SMT is, essentially, always 1.
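The effect of looking up the wrong key can be reproduced in a few lines. This sketch does not use simfactory's own lookup code, which is not shown in this ticket. It uses Python's standard `configparser` as a stand-in, and the machine name `examplemachine`, the ini text, and the fallback of 1 are assumptions made for illustration.

```python
import configparser

# Hypothetical machine entry that sets only 'max-num-smt'.
MACHINE_INI = """
[examplemachine]
ppn = 16
max-num-smt = 2
"""


def get_num_smt(ini_text: str, machine: str) -> int:
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Only 'num-smt' is looked up; 'max-num-smt' is never consulted,
    # so the assumed fallback of 1 is returned when 'num-smt' is absent.
    return parser.getint(machine, "num-smt", fallback=1)


print(get_num_smt(MACHINE_INI, "examplemachine"))
```

Renaming the key in the ini text from max-num-smt to num-smt makes the same lookup return 2.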

What to do?

My suggestion is that the definition of NUM_PROCS be amended to be

NUM_PROCS = CORES * NUM_SMT / NUM_THREADS

so that NUM_PROCS = cores*(threads/core)/(threads/mpi proc) = mpi procs.

I also suggest that the feature be tried out on one of the four machines above by changing max-num-smt to num-smt (note, however, that philip no longer exists).
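As a rough sketch of the suggested change, the amended formula can be compared with the current one. The function names are hypothetical and the numbers are made up. The point is only that the amended definition reduces to the current one when NUM_SMT is 1, so existing setups would be unaffected.

```python
def num_procs_current(cores: int, num_threads: int) -> int:
    # Current definition quoted in this ticket: CORES / NUM_THREADS
    return cores // num_threads


def num_procs_proposed(cores: int, num_smt: int, num_threads: int) -> int:
    # cores * (threads/core) / (threads/mpi proc) = mpi procs
    return cores * num_smt // num_threads


# Hypothetical job: 32 cores, 8 threads per MPI process.
print(num_procs_current(32, 8))      # current definition
print(num_procs_proposed(32, 1, 8))  # amended, NUM_SMT = 1
print(num_procs_proposed(32, 2, 8))  # amended, NUM_SMT = 2
```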

Comments (6)

  1. Steven R. Brandt reporter

    I guess this also “works” if --cores and --procs both really mean total threads. In that case, I’m not sure which name is more misleading. We could mark both as deprecated and allow a --total-threads option. That would be the least invasive change.

  2. Roland Haas

    I am not sure that --total-threads is more useful, since it is not a quantity people normally think about. Look at what the mpirun / srun / aprun etc. commands actually use: “processes” (= MPI ranks) and “allocated CPUs per process” (logical CPUs, i.e. what the OS lists in /proc/cpuinfo). That is, they start n processes, each of which is then free to spread out over t threads (or not, if it only needs more memory).
    So a more useful new option might be --ranks or even --mpi-ranks.

    “cores” or “nodes” may be what the queuing system usually allocates, but the mpirun-like options are more directly understandable to users, and the queuing system options are derived from them (each queuing system is set up slightly differently, even when they use e.g. SLURM).

  3. Steven R. Brandt reporter

    I chose the term “total threads” because that’s what all of the documentation uses. You have stated previously that you think you don’t want to support the terminology in the documentation. There is a PR (the `fixsub` branch of simfactory) which implements --total-threads. It also introduces --nodes and --node-procs which (I think) is the functionality you are suggesting with --ranks or --mpi-ranks. It also introduces --test to see whether the simfactory options you provide do what you think they do.

    I would be perfectly happy to name all these options something different, but I would like to see this PR accepted in some form, as both --procs and --cores are not the things that they are named after. I have frequently been confused by the behavior of these parameters.

    An alternative to the --total-threads would be to simply get rid of the smt stuff, which was the source of much confusion for me.

  4. Roland Haas

    I suspect that --total-threads will still require smt information, since --total-threads = --procs * --num-threads, so there is no extra information available. The smt stuff is really used for the queuing system, since those sometimes need to know whether the code uses smt or not (they want a ppn-like value, and that differs between smt and non-smt).

    I am not trying to suggest --nodes and --node-procs, no. I am suggesting we use the names used by mpirun, which are “number of MPI ranks” and OS threads (OMP_NUM_THREADS). smt will still be required (same reason as for --total-threads). That is, re-use names and nomenclature that people who are using a cluster are already familiar with.

  5. Steven R. Brandt reporter

    “--total-threads” is just a new name for the number of parallel thingies, which aren’t really “cores” and aren’t really “procs.”

    As for “--nodes” and “--node-procs”, they offer a way to directly specify what you want mpi to do, rather than doing a calculation based on cores/procs/total-threads. I suspect I’m not the only one who would find that more convenient.
