Need to update the submit script template for Cori
Hello, I use the NERSC machine Cori to run ET simulations. I am wondering how to change the number of nodes. I use a command like:
simfactory/bin/sim submit mysimulation --parfile par/file.par --procs 64 --walltime 00:30:00
When I check the status of my job with "sqs", it shows that only 1 node is used, not 2. Is there some way to change the number of nodes? Or should the submit script template for Cori be updated?
Thanks. Best regards, Chia-Hui
Comments (14)
-
-
- attached illinois11_wB_540_test.out
-
- attached SubmitScript
-
- changed status to open
This has been confirmed. @eschnett and @rhaas80 have access to Cori. My own guess is that this is fallout from NERSC switching to SLURM (https://www.nersc.gov/assets/Uploads/SLURM-NUG-Nov2015.pdf), which counts each hyperthread as a CPU, so the
--procs 64
being translated into
--mppwidth 64
is translated into the wrong number of nodes by the PBS -> SLURM translation at NERSC. See https://docs.nersc.gov/jobs/examples/#hybrid-mpiopenmp-jobs
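The suspected miscount can be illustrated with a small sketch. The per-node counts (Cori's Haswell partition with 32 physical cores and 2 hyperthreads each) are assumptions for illustration, not taken from the translation layer itself:

```python
# Sketch of the suspected PBS -> SLURM node-count mismatch on Cori.
# Assumption (illustrative): 32 physical cores per node, 2 hyperthreads
# per core, i.e. 64 logical CPUs per node.
import math

CORES_PER_NODE = 32
SMT_PER_CORE = 2
LOGICAL_CPUS_PER_NODE = CORES_PER_NODE * SMT_PER_CORE  # 64

def nodes_pbs_intent(mppwidth):
    """PBS-era intent: mppwidth counts physical cores."""
    return math.ceil(mppwidth / CORES_PER_NODE)

def nodes_slurm_count(mppwidth):
    """SLURM counting each hyperthread as a CPU."""
    return math.ceil(mppwidth / LOGICAL_CPUS_PER_NODE)

print(nodes_pbs_intent(64))   # intended: 2 nodes
print(nodes_slurm_count(64))  # observed: 1 node
```

This reproduces exactly the symptom reported: a 64-proc submission that should span 2 nodes fits on 1 node once hyperthreads are counted as CPUs.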
-
assigned issue to
Assigning to @eschnett due to an email conversation on the mailing list.
-
assigned issue to
-
- changed component to SimFactory
-
- attached cori.sub
The revised script for Cori, based on edison.sub.
Best,
Chia-Hui
-
Thank you. I have pushed the changes you provided into the master branch of simfactory. Do you know if one also needs the option
--threads-per-core @NUM_SMT@
in cori.run? This only matters if core affinity is chosen badly by srun (though chances are that Carpet, using hwloc, fixes this at runtime). I will push the same fix to the release branch (ET_2019_03) after that.
-
Sorry, I am not quite sure, but I did not add that option in cori.run and it works fine.
Thanks again for your help. Best regards,
Chia-Hui
-
Ok. I will try out what happens and merge the current changes from master into the release branch for now. Erik's and my Cori accounts lapsed just before the release, and we are still in the process of getting them back (still waiting for Erik's), so I am only testing this on Cori now.
-
I am not quite sure yet if this is working as expected. The SMT thread assignment may be off. I did two test runs, one using
srun -n 4 --threads-per-core 2 -c 16
and one using
srun -n 4 --threads-per-core 1 -c 16
both for a 2-node submission. In both cases I get
This process runs on 16 cores: 0-7, 32-39
Thread 0 runs on 16 cores: 0-7, 32-39
which, given the usual logical-CPU to thread mapping, are (I think) 16 hardware threads on 8 cores, which is not quite what was intended (namely it should have been 16 cores) and leaves some cores empty. In particular the
srun -n 4 --threads-per-core 1 -c 16
version should have used only 1 thread per core. These runs did not use SystemTopology. I will try with it next, which may give some more insight.
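The reading of that affinity mask can be checked with a small sketch, assuming Cori's usual logical-CPU numbering where CPU i and CPU i+32 are hyperthread siblings on the same physical core of a 32-core node (an assumption about the node layout, not something srun reports):

```python
# Check the claim that logical CPUs 0-7 and 32-39 occupy only
# 8 physical cores (each with both hyperthreads), assuming logical
# CPU i and i+32 share a physical core on a 32-core node.
CORES_PER_NODE = 32  # assumption: Cori Haswell layout

def physical_cores(logical_cpus):
    """Map a set of logical CPU ids to the physical cores they cover."""
    return sorted({cpu % CORES_PER_NODE for cpu in logical_cpus})

mask = list(range(0, 8)) + list(range(32, 40))  # the observed "0-7, 32-39"
print(physical_cores(mask))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

So the 16-entry mask collapses to 8 physical cores, consistent with the "16 hardware threads on 8 cores" reading above.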
-
SystemTopology helps and produces sane layouts. This is for
--procs 128 --num-smt 2
which used 2 nodes (as it should), and with
srun -n 8 --threads-per-core 2 -c 8
gives:
INFO (Carpet): This process contains 16 threads, this is thread 0
INFO (Carpet): There are 128 threads in total
INFO (Carpet): There are 16 threads per process
INFO (Carpet): This process runs on host nid00671, pid=48257
INFO (Carpet): This process runs on 16 cores: 16-23, 48-55
INFO (Carpet): Thread 0 runs on 1 core: 16
INFO (Carpet): Thread 1 runs on 1 core: 48
INFO (Carpet): Thread 2 runs on 1 core: 17
INFO (Carpet): Thread 3 runs on 1 core: 49
INFO (Carpet): Thread 4 runs on 1 core: 18
INFO (Carpet): Thread 5 runs on 1 core: 50
INFO (Carpet): Thread 6 runs on 1 core: 19
INFO (Carpet): Thread 7 runs on 1 core: 51
INFO (Carpet): Thread 8 runs on 1 core: 20
INFO (Carpet): Thread 9 runs on 1 core: 52
INFO (Carpet): Thread 10 runs on 1 core: 21
INFO (Carpet): Thread 11 runs on 1 core: 53
INFO (Carpet): Thread 12 runs on 1 core: 22
INFO (Carpet): Thread 13 runs on 1 core: 54
INFO (Carpet): Thread 14 runs on 1 core: 23
INFO (Carpet): Thread 15 runs on 1 core: 55
while submitting just
--procs 64
results in
srun -n 4 --threads-per-core 2 -c 16
(threads per core being hardwired in the runscript) and
INFO (Carpet): This process contains 16 threads, this is thread 0
INFO (Carpet): There are 64 threads in total
INFO (Carpet): There are 16 threads per process
INFO (Carpet): This process runs on host nid00995, pid=8369
INFO (Carpet): This process runs on 16 cores: 0-15
INFO (Carpet): Thread 0 runs on 1 core: 0
INFO (Carpet): Thread 1 runs on 1 core: 1
INFO (Carpet): Thread 2 runs on 1 core: 2
INFO (Carpet): Thread 3 runs on 1 core: 3
INFO (Carpet): Thread 4 runs on 1 core: 4
INFO (Carpet): Thread 5 runs on 1 core: 5
INFO (Carpet): Thread 6 runs on 1 core: 6
INFO (Carpet): Thread 7 runs on 1 core: 7
INFO (Carpet): Thread 8 runs on 1 core: 8
INFO (Carpet): Thread 9 runs on 1 core: 9
INFO (Carpet): Thread 10 runs on 1 core: 10
INFO (Carpet): Thread 11 runs on 1 core: 11
INFO (Carpet): Thread 12 runs on 1 core: 12
INFO (Carpet): Thread 13 runs on 1 core: 13
INFO (Carpet): Thread 14 runs on 1 core: 14
INFO (Carpet): Thread 15 runs on 1 core: 15
which makes sense when using 1 SMT.
So it seems as if srun's
--threads-per-core
argument is not doing anything.
-
Thanks for your efforts so that cori.run can be updated and made more complete.
Best regards,
Chia-Hui
-
- changed status to resolved
This turned out to be mostly an issue with how the
-c
option was used.
-c
is the number of logical cores per MPI rank and was set as
@PPN@ / @NODE_PROCS@
However, PPN in simfactory's mdb files is (incorrectly if one believes the documentation string, but correctly when one checks how it is used) set to the number of physical cores per node. Somewhat sanely (in case one were to use hyperthreading), requesting 6 logical cores uses 3 physical cores along with their 3 hyperthreading partners.
Corrected in git hash 8278a78 "cori: allocate thread to logical cores rather than cores used" of simfactory2, along with binding OMP threads to physical cores.
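The arithmetic behind the fix can be sketched as follows. The numbers mirror the two-node Haswell runs discussed earlier in this thread and are illustrative assumptions, not values read from the commit:

```python
# Sketch of the "-c" (cpus-per-task) arithmetic described above.
# Assumptions (illustrative): 32 physical cores per node, 2 hyperthreads
# per core, 2 MPI ranks per node.
PPN = 32         # physical cores per node (simfactory's mdb meaning)
NUM_SMT = 2      # hyperthreads to use per core
NODE_PROCS = 2   # MPI ranks per node

# Old template: PPN / NODE_PROCS counts physical cores, but srun's -c
# counts logical CPUs, so with SMT each rank gets half the CPUs it needs.
old_c = PPN // NODE_PROCS             # -> 16

# Fixed template: include the SMT factor so -c counts logical CPUs.
new_c = PPN * NUM_SMT // NODE_PROCS   # -> 32

print(old_c, new_c)
```

With the old value, a rank asking for 16 "cores" got 16 logical CPUs, i.e. 8 physical cores plus their hyperthread siblings, which matches the "0-7, 32-39" masks seen in the earlier test runs.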
Email conversation about this is on the users mailing list: http://lists.einsteintoolkit.org/pipermail/users/2019-May/006863.html