add option to run tests in parallel

Issue #2463 resolved

Roland Haas created an issue 2020-09-22

Currently Cactus' testsuite mechanism will run tests one after the other even if there are very many cores available on the node.

Pull request https://bitbucket.org/cactuscode/cactus/pull-requests/114/cactus-run-tests-in-parallel-if-requested adds options to the testsuite system to run multiple tests in parallel.

Using it goes like this:

# prevent OpenMP from using all cores
export OMP_NUM_THREADS=2
# prevent OpenMPI from tying multiple MPI jobs to the same cores
export OMPI_MCA_hwloc_base_binding_policy=none
# run up to 6 tests in parallel
make CCTK_TESTSUITE_PARALLEL_TESTS=6 sim-testsuite PROMPT=no

For each parallel test it collects all screen output and prints it at once, to avoid mixing screen output from multiple tests.
In its current state it counts tests but not, e.g., MPI ranks or cores used by tests. Making it count MPI ranks should be easy; making it count OpenMP threads is impossible with the information at hand, since Cactus does not know whether a given thorn will use OpenMP (but it does know the number of MPI ranks that are launched for a test).
This will most likely not work on clusters, which often prevent multiple mpiruns from targeting the same compute node.
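
The output handling described above can be illustrated with a small sketch. This is not the Cactus implementation (which is Perl and uses fork and pipe); it swaps in a Python thread pool plus `subprocess` to show the same idea: at most N tests run at once, tests are counted rather than ranks or cores, and each test's output is buffered and emitted in one piece. The names `run_one` and `run_tests` are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_one(name, argv):
    """Run one test command and capture its combined stdout/stderr."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return name, proc.returncode, proc.stdout


def run_tests(tests, max_parallel):
    """Run at most max_parallel tests at once; return {name: (code, output)}.

    The limit counts tests only, not MPI ranks or OpenMP threads.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run_one, name, argv)
                   for name, argv in tests.items()]
        for fut in as_completed(futures):
            name, code, output = fut.result()
            # Emit the whole buffer at once so tests never interleave.
            sys.stdout.write(f"=== {name} (exit {code}) ===\n{output}")
            results[name] = (code, output)
    return results


if __name__ == "__main__":
    # Two stand-in "tests"; real test commands would go here.
    demo = {
        "first": [sys.executable, "-c", "print('line 1'); print('line 2')"],
        "second": [sys.executable, "-c", "print('other test')"],
    }
    run_tests(demo, max_parallel=2)
```

Because each test's output is only written once that test has finished, there is no live progress display while a test runs.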

Comments (10)

Roland Haas reporter
It requires some Perl POSIX functionality (all in the core language though), namely POSIX fork, pipe and wait. Those exist on all POSIX systems, which covers almost all systems Cactus supports (except Windows when used natively, i.e. not via WSL). If they are missing then at worst the parallel functionality does not work, but this does not prevent regular (serialized) tests, which do not use fork, from running. Parallel runs may actually work if Perl’s or MSYS / Cygwin’s fork emulation is sufficient; the use statements work fine in MSYS, though I have not tested this.
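
For readers unfamiliar with the three calls, here is a minimal sketch of the fork / pipe / wait pattern, written in Python rather than Perl. It is an illustration under assumptions, not the code in the pull request: `run_in_child` is a hypothetical helper, and the serial fallback when `os.fork` is unavailable only mirrors the behaviour described above in spirit.

```python
import os


def run_in_child(func):
    """Run func() in a forked child; return (exit_code, text it returned)."""
    if not hasattr(os, "fork"):
        # No fork available (e.g. native Windows): run serially instead.
        try:
            return 0, func()
        except Exception:
            return 1, ""

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the result through the pipe, then leave immediately
        # without running any of the parent's cleanup handlers.
        os.close(read_fd)
        code = 0
        try:
            with os.fdopen(write_fd, "wb") as writer:
                writer.write(func().encode())
        except BaseException:
            code = 1
        os._exit(code)

    # Parent: close our copy of the write end so read() sees EOF
    # as soon as the child has closed its end.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return code, data.decode()


if __name__ == "__main__":
    print(run_in_child(lambda: "output of one test\n"))
```

The pipe carries the child's result back to the parent without any temporary file, and the wait call supplies the child's exit code.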
- 2020-09-22T15:56:26+00:00
Erik Schnetter
An idea: It might make sense to use make to run things in parallel. We’re using make everywhere else, and we know how to choose the number of processes etc.
- 2020-09-22T16:25:05+00:00
Roland Haas reporter
I had not quite thought of that. Quite clearly using fork and pipe makes use of a lot of low level calls that people are unlikely to be familiar with (and it is kind of ugly).

Part of why I used fork and pipe was to avoid having to create temporary files for each test that runs to shuttle information into RunTestUtils.pl.

This may make it harder to provide “live” reports on which test failed, though one might remove those for parallel tests (one should be able to check for parallel makes by looking for -j in MAKEFLAGS https://www.gnu.org/software/make/manual/html_node/Options_002fRecursion.html#Options_002fRecursion).
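
A rough sketch of the `MAKEFLAGS` check mentioned above, in Python for illustration only. It is a heuristic under assumptions: that GNU make passes `-j` (or a `--jobserver...` option) to sub-processes as a separate word in `MAKEFLAGS`, and that anything after `--` is a variable definition. The exact format differs between make versions, and `make_is_parallel` is a hypothetical name.

```python
import os


def make_is_parallel(makeflags=None):
    """Guess from MAKEFLAGS whether the calling make was started with -j."""
    if makeflags is None:
        makeflags = os.environ.get("MAKEFLAGS", "")
    for word in makeflags.split():
        if word == "--":
            # Command-line variable definitions follow; stop looking.
            break
        if word.startswith("-j") or word.startswith("--jobserver"):
            return True
    return False


if __name__ == "__main__":
    if make_is_parallel():
        print("parallel make detected: suppress live per-test reports")
    else:
        print("serial make: live per-test reports are fine")
```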

Will require a bit of work to create the appropriate makefile and commands for the makefile, though nothing complicated it would seem.
- 2020-09-22T17:25:23+00:00
Roland Haas reporter
- assigned issue to Roland Haas
- 2020-09-23T13:08:44+00:00
Steven R. Brandt
I would really like to see this feature in the next release, regardless of how it’s implemented.
- 2020-10-08T15:12:11+00:00
Roland Haas reporter
- changed status to open
- 2020-10-08T15:36:38+00:00
Roland Haas reporter
Ok, so I will add it to master for now and remove it should it cause issues on the clusters.
- 2020-10-08T15:45:23+00:00
Roland Haas reporter
Applied as git hash 94def809 "Cactus: change perl quotations to improve syntax highlight" of cactus.

Tentative so far.
- 2020-10-15T00:51:15+00:00
Roland Haas reporter
- changed status to resolved
- 2020-10-15T00:51:22+00:00
Roland Haas reporter
I accidentally pushed the version without the fallback for when no parallel tests are requested. This is fixed, along with an inconsistency in reporting exit codes, in git hash c9922081 "Cactus: report correct error code of child shell in parallel tests" of cactus.

On the plus side, this showed that the parallel test code works on all clusters we support since I ran the last set of tests on the clusters without the no-parallel-tests fallback.
- 2020-10-22T01:25:36+00:00

Assignee: Roland Haas

Type: enhancement

Priority: minor

Status: resolved

Component: –

Milestone: –

Version: –

Votes: 0

Watchers: 1