Running on wheeler fails with error messages from MPI

Issue #2188 closed
Roland Haas created an issue

Running on wheeler fails with lengthy error messages from MPI, even when using a single MPI rank.

I have attached a sample error file from the gaussian test in the testsuite.

Wheeler is a private machine at Caltech, so if this cannot be fixed I would remove it from the list of machines shown on http://einsteintoolkit.org/testsuite_results/index.php.

Erik, since you maintained wheeler's files and the machine files point to an MPI stack in your $HOME, do you want to look into this? Otherwise, I can create a machine setup based on the software stack used by SpEC (which sees much more regular use on wheeler), though that will likely not use the same set of modern compilers and other software as your setup. Alternatively, we can remove wheeler from the list of officially supported machines for this release.

Keyword: None

Comments (4)

  1. anonymous

    So... dumb question: did you make sure that the mpirun and mpic++ that thorn MPI was configured with come from the same MPI installation? It seems the configure script does not look in the PATH first.
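
The question above (do `mpirun` and `mpic++` come from the same MPI?) can be checked on a given machine by comparing the install prefixes of the first `mpirun` and `mpic++` found on `PATH`. The Python sketch below does that. The function names `mpi_prefix` and `same_mpi` are hypothetical, it assumes the usual `<prefix>/bin/<tool>` layout, and it only reports what `PATH` resolves to, not what thorn MPI's configure step actually picked.

```python
import os
import shutil


def mpi_prefix(tool, path=None):
    """Return the install prefix of the first `tool` found on `path`.

    Symlinks are resolved first, so a tool reached through a view directory
    reports the prefix of the real install. Assumes a <prefix>/bin/<tool>
    layout (an assumption, not checked). Returns None if `tool` is not found.
    """
    found = shutil.which(tool, path=path)  # path=None means use $PATH
    if found is None:
        return None
    return os.path.dirname(os.path.dirname(os.path.realpath(found)))


def same_mpi(tools=("mpirun", "mpic++"), path=None):
    """True if every tool in `tools` is found and all share one install prefix."""
    prefixes = {mpi_prefix(tool, path) for tool in tools}
    return None not in prefixes and len(prefixes) == 1


if __name__ == "__main__":
    for tool in ("mpirun", "mpic++"):
        print(tool, "->", shutil.which(tool))
    print("same MPI prefix:", same_mpi())
```

A mismatch here would explain the symptom; a match, as turned out to be the case on wheeler, rules out this particular mix-up.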

  2. Roland Haas reporter

    Good question. I had a look just now. The log file shows that the mpirun used was /home/eschnett/src/spack-view/bin/mpirun, and we set MPI_DIR to /home/eschnett/src/spack-view. However, MPI_LIB_DIRS ends up being configured (by thorn MPI's detect.pl script) as /home/eschnett/src/spack/opt/spack/linux-rhel7-x86_64/gcc-7.3.0-spack/openmpi-3.0.1-6lcorydh3sxme5eu3pculmgzo2nefolv/lib. Other than on Crays, we do not use mpicc; instead we use the normal compiler and explicitly link against the MPI library. The mpirun in /home/eschnett/src/spack-view/bin/mpirun is identical to the one in /home/eschnett/src/spack/opt/spack/linux-rhel7-x86_64/gcc-7.3.0-spack/openmpi-3.0.1-6lcorydh3sxme5eu3pculmgzo2nefolv/bin, so it should be compatible.

    So, it does not seem to be quite so simple.

    If I had to bet money, I would bet that the system InfiniBand library has changed and OpenMPI has to be recompiled to account for this.
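
The consistency check described in the second comment (is the `mpirun` actually used the same program as the one sitting next to the directory that `MPI_LIB_DIRS` points to?) can be scripted. Below is a minimal Python sketch, assuming the usual `<prefix>/bin` and `<prefix>/lib` layout; the function name `mpirun_matches_lib_dir` is hypothetical. It only confirms that the launcher and the libraries belong together; it would not detect the suspected mismatch with a changed system InfiniBand library.

```python
import filecmp
import os


def mpirun_matches_lib_dir(mpirun, lib_dir):
    """Check that `mpirun` belongs to the MPI install whose libraries are in `lib_dir`.

    Assumes a <prefix>/lib and <prefix>/bin/mpirun layout, so the expected
    launcher is <parent of lib_dir>/bin/mpirun.
    """
    expected = os.path.join(os.path.dirname(os.path.normpath(lib_dir)), "bin", "mpirun")
    if not (os.path.isfile(mpirun) and os.path.isfile(expected)):
        return False
    # Same file once symlinks (e.g. a symlinked view directory) are resolved...
    if os.path.realpath(mpirun) == os.path.realpath(expected):
        return True
    # ...or a separate but byte-for-byte identical copy.
    return filecmp.cmp(mpirun, expected, shallow=False)


if __name__ == "__main__":
    # Paths taken from the configuration log quoted in the comment.
    view_mpirun = "/home/eschnett/src/spack-view/bin/mpirun"
    lib_dir = ("/home/eschnett/src/spack/opt/spack/linux-rhel7-x86_64/gcc-7.3.0-spack/"
               "openmpi-3.0.1-6lcorydh3sxme5eu3pculmgzo2nefolv/lib")
    print(mpirun_matches_lib_dir(view_mpirun, lib_dir))
```

Comparing contents with `shallow=False` matters here: a shallow comparison could report two copies as equal based only on matching size and modification time.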
