add option to run tests in parallel
Currently Cactus' testsuite mechanism runs tests one after the other, even if many cores are available on the node.
Pull request https://bitbucket.org/cactuscode/cactus/pull-requests/114/cactus-run-tests-in-parallel-if-requested adds options to the testsuite system to run multiple tests in parallel.
Using it goes like this:
# prevent OpenMP from using all cores
export OMP_NUM_THREADS=2
# prevent OpenMPI from tying multiple MPI jobs to the same cores
export OMPI_MCA_hwloc_base_binding_policy=none
# run up to 6 tests in parallel
make CCTK_TESTSUITE_PARALLEL_TESTS=6 sim-testsuite PROMPT=no
- For each parallel test it collects all screen output and prints it at once, to avoid mixing screen output from multiple tests.
- In its current state it counts tests, not e.g. MPI ranks or cores used by tests. Making it count MPI ranks should be easy. Making it count OpenMP threads is impossible with the information at hand, since Cactus does not know whether a given thorn will use OpenMP (but it does know the number of MPI ranks that are launched for a test).
- This will most likely not work on clusters, which often prevent multiple `mpirun`s from targeting the same compute node.
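As a rough illustration of the first two points (this is not the Perl code from the pull request), the following shell sketch buffers each test's output and prints it as one block, and limits concurrency by counting tests only. The names `run_buffered` and `run_in_batches` are made up for this example, and it runs fixed-size batches rather than a true job pool.

```shell
#!/bin/sh
# Hypothetical helper: run one command and print all of its output in one go.
run_buffered() {
    name=$1; shift
    # Capture stdout and stderr of the command instead of printing it live.
    out=$("$@" 2>&1)
    status=$?
    # Emit header and captured output with a single printf call.
    printf '=== %s (exit %d) ===\n%s\n' "$name" "$status" "$out"
}

# Hypothetical helper: run the given command strings, at most $1 at a time.
# Only the number of commands is counted, not MPI ranks or OpenMP threads.
run_in_batches() {
    max=$1; shift
    running=0
    for cmd in "$@"; do
        run_buffered "$cmd" sh -c "$cmd" &
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait            # wait for the whole batch before starting more
            running=0
        fi
    done
    wait                    # wait for the final, possibly partial, batch
}
```

For example, `run_in_batches 2 'echo first' 'echo second' 'echo third'` runs the first two commands together and the third afterwards, printing each command's output as one block.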
Comments (10)
-
An idea: It might make sense to use `make` to run things in parallel. We’re using `make` everywhere else, and we know how to choose the number of processes etc.
-
reporter I had not quite thought of that. Quite clearly, using `fork` and `pipe` relies on a lot of low-level calls that people are unlikely to be familiar with (and it is kind of ugly).
Part of why I used `fork` and `pipe` was to avoid having to create temporary files for each running test to shuttle information into `RunTestUtils.pl`.
Using `make` may make it harder to provide “live” reports on which test failed, though one might remove those for parallel tests (one should be able to check for parallel makes by looking for `-j` in `MAKEFLAGS`, see https://www.gnu.org/software/make/manual/html_node/Options_002fRecursion.html#Options_002fRecursion).
It will require a bit of work to create the appropriate makefile and commands for the makefile, though nothing complicated it would seem.
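A minimal sketch of such a `-j` check in shell, assuming it is called from a recipe of a GNU make run. The name `is_parallel_make` is hypothetical, and since the exact contents of `MAKEFLAGS` differ between make versions, this is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given MAKEFLAGS string looks like it
# came from a parallel make (a -j option or a jobserver option is present).
is_parallel_make() {
    case " $1 " in
        *" -j"*|*"--jobserver"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside a recipe it could be used as `is_parallel_make "$MAKEFLAGS" && echo "parallel make"`, for instance to switch off the live per-test reports.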
-
I would really like to see this feature in the next release, regardless of how it’s implemented.
-
reporter - changed status to open
-
reporter Ok, so I will add it to master for now and remove it should it cause issues on the clusters.
-
reporter - changed status to resolved
-
reporter I accidentally pushed the version without the fallback for when no parallel tests are requested. This is fixed, and an inconsistency in reporting exit codes is addressed, in git hash c9922081 "Cactus: report correct error code of child shell in parallel tests" of cactus.
On the plus side, this showed that the parallel test code works on all clusters we support, since I ran the last set of tests on the clusters without the no-parallel-tests fallback.
It requires some Perl POSIX functionality (all in the core language though), namely POSIX `fork`, `pipe` and `wait`. Those exist on all POSIX systems, which covers almost all systems Cactus supports (except Windows when used natively, i.e. not via WSL). If these are missing then at worst the parallel functionality does not work, but regular (serialized) tests, which do not use `fork`, still run. Parallel runs may actually work if Perl’s or MSYS / Cygwin’s `fork` emulation is sufficient. The `use` statements work fine in MSYS, though I have not tested parallel runs there.