add option to run tests in parallel

Issue #2463 resolved
Roland Haas created an issue

Currently Cactus' testsuite mechanism will run tests one after the other even if there are very many cores available on the node.

The pull request adds options to the testsuite system to run multiple tests in parallel.

Using it goes like this:

# prevent OpenMP from using all cores
# prevent OpenMPI from tying multiple MPI jobs to the same cores
export OMPI_MCA_hwloc_base_binding_policy=none
# run up to 6 tests in parallel
  • for each parallel test it collects all screen output and outputs it at once to avoid mixing screen output from multiple tests.
  • in its current state it counts tests but not e.g. MPI ranks or cores used by tests. Making it count MPI ranks should be easy; making it count OpenMP threads is impossible with the information at hand, since Cactus does not know whether a given thorn will use OpenMP (but it does know the number of MPI ranks that are launched for a test)
  • this will most likely not work on clusters, which often prevent multiple mpiruns from targeting the same compute node
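
The remark that counting MPI ranks "should be easy" can be illustrated with a small sketch. This is not the code from the pull request (the testsuite itself is written in Perl); it is a hypothetical Python illustration, and the names `pick_startable`, `ranks_in_use` and `max_ranks` are assumptions made up for this example.

```python
def pick_startable(pending, ranks_in_use, max_ranks):
    """Return the names of pending tests that may start now.

    pending:      list of (name, nranks) tuples, in the order tests are tried
    ranks_in_use: MPI ranks occupied by tests that are already running
    max_ranks:    budget of MPI ranks that may be in use at the same time
    (All names here are hypothetical, not taken from the Cactus testsuite.)
    """
    started = []
    used = ranks_in_use
    for name, nranks in pending:
        # Let a test start if it fits into the budget. If nothing at all is
        # running, start it anyway so an oversized test is not stuck forever.
        if used + nranks <= max_ranks or used == 0:
            started.append(name)
            used += nranks
    return started
```

With a budget of 4 ranks and pending tests needing 2, 4 and 1 ranks, this picks the first and the third test and leaves the 4-rank test for later. OpenMP threads are deliberately left out, matching the limitation described above.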

Comments (10)

  1. Roland Haas reporter

    It requires some Perl POSIX functionality (all in the core language though), namely POSIX fork, pipe and wait. Those exist on all POSIX systems, which covers almost all systems Cactus supports (except Windows when used natively, i.e. not via WSL). If these are missing then at worst the parallel functionality does not work, but regular (serialized) tests, which do not use fork, still run. Parallel runs may actually work if Perl’s or MSYS / Cygwin’s fork emulation is sufficient; the use statements work fine in MSYS, though I have not tested this.
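
For readers unfamiliar with these calls, the fork / pipe / wait pattern can be sketched as follows. The actual implementation is in Perl; this is only an analogous Python sketch using the same POSIX primitives, and `run_parallel` is a hypothetical name. Each child's screen output is buffered in the parent so that it can be printed in one piece once the child has finished.

```python
import os
import select


def run_parallel(commands, max_jobs):
    """Run shell commands, at most max_jobs at once (POSIX only).

    Returns a list of (exit_code, output_bytes), one entry per command,
    in the same order as `commands`.
    """
    max_jobs = max(1, max_jobs)
    queue = list(enumerate(commands))
    running = {}   # read end of pipe -> (index, pid, list of output chunks)
    results = {}   # index -> (exit_code, output_bytes)
    while queue or running:
        # Start new children while we are below the job limit.
        while queue and len(running) < max_jobs:
            index, cmd = queue.pop(0)
            rfd, wfd = os.pipe()
            pid = os.fork()
            if pid == 0:
                # Child: send stdout and stderr into the pipe, then exec.
                try:
                    os.close(rfd)
                    os.dup2(wfd, 1)
                    os.dup2(wfd, 2)
                    os.close(wfd)
                    os.execvp("sh", ["sh", "-c", cmd])
                finally:
                    os._exit(127)  # only reached if exec failed
            os.close(wfd)  # parent keeps only the read end
            running[rfd] = (index, pid, [])
        # Read from whichever children have output (or have closed the pipe).
        ready, _, _ = select.select(list(running), [], [])
        for rfd in ready:
            index, pid, chunks = running[rfd]
            data = os.read(rfd, 65536)
            if data:
                chunks.append(data)
                continue
            # End of file: the child is done, collect its exit status.
            os.close(rfd)
            _, status = os.waitpid(pid, 0)
            if os.WIFEXITED(status):
                code = os.WEXITSTATUS(status)
            else:
                code = -os.WTERMSIG(status)
            del running[rfd]
            results[index] = (code, b"".join(chunks))
    return [results[i] for i in range(len(commands))]
```

A caller would loop over the returned list and write each test's output in one go, which keeps the output of different tests from interleaving. On a platform without `os.fork` this sketch simply does not apply, much like the fallback to serialized tests described above.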

  2. Erik Schnetter

    An idea: It might make sense to use make to run things in parallel. We’re using make everywhere else, and we know how to choose the number of processes etc.

  3. Roland Haas reporter

    I had not quite thought of that. Quite clearly using fork and pipe makes use of a lot of low level calls that people are unlikely to be familiar with (and it is kind of ugly).

    Part of why I used fork and pipe was to avoid having to create temporary files for each test that runs to shuttle information into

    This may make it harder to provide “live” reports on which test failed, though one might remove those for parallel tests (one should be able to check for parallel makes by looking for -j in MAKEFLAGS).

    Will require a bit of work to create the appropriate makefile and commands for the makefile, though nothing complicated it would seem.
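
The MAKEFLAGS check mentioned in this comment could look roughly like the sketch below. It is a heuristic only: the exact contents of MAKEFLAGS differ between make implementations and versions, so the prefixes tested here are assumptions based on what GNU make typically exports, and `make_is_parallel` is a hypothetical helper name.

```python
import os


def make_is_parallel(makeflags=None):
    """Guess whether the surrounding make was started with -j.

    makeflags: contents of MAKEFLAGS; read from the environment if None.
    Heuristic: look for a word starting with -j, --jobs or --jobserver.
    Note that "-j1" also counts as parallel here.
    """
    if makeflags is None:
        makeflags = os.environ.get("MAKEFLAGS", "")
    for word in makeflags.split():
        if word.startswith(("-j", "--jobs", "--jobserver")):
            return True
    return False
```

A testsuite driver could use such a check to switch off “live” per-test reports when it detects a parallel make.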

  4. Steven R. Brandt

    I would really like to see this feature in the next release, regardless of how it’s implemented.

  5. Roland Haas reporter

    Ok, so I will add it to master for now and remove it should it cause issues on the clusters.

  6. Roland Haas reporter

    I accidentally pushed the version without the fallback for when no parallel tests are requested. This is fixed, along with an inconsistency in reporting exit codes, in git hash c9922081 "Cactus: report correct error code of child shell in parallel tests" of cactus.

    On the plus side, this showed that the parallel test code works on all clusters we support since I ran the last set of tests on the clusters without the no-parallel-tests fallback.
