Fatal error in PMPI_Comm_rank: Invalid communicator

Issue #2372 closed
Jason Kodish created an issue

I have set up the Einstein Toolkit on an Ubuntu 18 computer. The compilation went OK and completed. When I run the helloworld example, it fails:

Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xe1466d00, rank=0x7ffdf5c14c78) failed
PMPI_Comm_rank(68).: Invalid communicator
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=403262725
system msg for write_line failure : Bad file descriptor
Sun Apr 26 19:54:17 MDT 2020

This is after a clean compilation based on the documentation found here:

https://github.com/nds-org/jupyter-et/blob/master/CactusTutorial.ipynb

From what I can dig up from Google, it seems like some kind of MPI mismatch, but isn’t this supposed to compile its own MPI?
Can I force it to do that?

I don’t know how to fix this. If anyone has any ideas, that would be great. I can delete everything and start over if needed.

Thank you

Comments (5)

  1. Roland Haas

    Please see the discussion in http://lists.einsteintoolkit.org/pipermail/users/2020-April/007388.html

    While the Einstein Toolkit can build its own MPI stack as a fallback solution, it will preferentially use an already installed MPI. However, if multiple MPI stacks are installed (and they are not set up correctly), the automatic detection will fail.

    Is this happening on a freshly installed Ubuntu 18.04 system? Did you install anything beyond the packages requested in https://github.com/nds-org/jupyter-et/blob/master/CactusTutorial.ipynb ?

    If you want to force the toolkit to build its own MPI stack, then you will have to edit the OptionList file to include MPI=BUILD and add PATH=@SOURCEDIR@/exe/@CONFIGURATION@:$PATH to the RunScript file that is in configs/sim. Then build from scratch using:

    make sim-clean
    ./simfactory/bin/sim build --reconfig
    

    Please keep in mind that compiling one’s own MPI stack is mostly untested.
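
Before forcing a rebuild, it may help to see which MPI launcher is on the PATH and which MPI library the executable was actually linked against. The following is a minimal sketch, not part of the toolkit: `classify_mpi` is a hypothetical helper, the library-name patterns (`libmpich` for MPICH, `libopen-rte`/`libopen-pal` for Open MPI) are a heuristic based on how these libraries are usually named on Debian/Ubuntu, and `exe/cactus_sim` assumes the tutorial's configuration name `sim`.

```shell
#!/bin/sh
# Heuristic helper (hypothetical name): read `ldd` output on stdin and
# name the MPI family whose libraries appear in it.
classify_mpi() {
    awk '
        /libmpich/                        { mpich = 1 }
        /libopen-rte|libopen-pal|openmpi/ { ompi = 1 }
        END {
            if (mpich && ompi) print "mixed"
            else if (mpich)    print "mpich"
            else if (ompi)     print "openmpi"
            else               print "none"
        }'
}

# Which launcher is first on PATH, if any.
command -v mpirun || echo "no mpirun on PATH"

# Which MPI the executable links against. exe/cactus_sim assumes the
# tutorial's configuration name "sim" and the Cactus directory as cwd.
if [ -x exe/cactus_sim ]; then
    ldd exe/cactus_sim | classify_mpi
fi
```

If the launcher belongs to one implementation and the executable reports another (or "mixed"), that would be consistent with the mismatch described above.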

  2. Jason Kodish reporter

    Thank you again for all your help. What finally fixed it was adding the lines below to the <machinename>.ini file (the machine name being the one created by simfactory/bin/sim setup-silent).

    This file is located at Cactus/simfactory/mdb/machines

    I added this at the bottom of the file, then ran the compile instructions per the tutorial. It compiled and works beautifully. I’m leaving this information here in the hope that it helps someone with the same problem.

    Thank you again for the fast response and all the help (unusual on some projects). Please feel free to close this ticket.

    MPI_DIR = /usr/lib/x86_64-linux-gnu/openmpi
    MPI_INC_DIRS = /usr/lib/x86_64-linux-gnu/openmpi/include
    MPI_LIB_DIRS = /usr/lib/x86_64-linux-gnu/openmpi/lib
    MPI_LIBS = mpi
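
The directories in these settings are specific to one system, so a quick existence check may be worthwhile before reusing them. This is a minimal sketch: `check_mpi_dirs` is a hypothetical helper, and the two paths are simply the ones quoted in this comment; they may differ with other distributions or Open MPI package versions.

```shell
#!/bin/sh
# Hypothetical helper: report whether each directory passed as an argument exists.
# Returns non-zero if any of them is missing.
check_mpi_dirs() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "found:   $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}

# Paths taken from the settings above; "|| true" keeps the script going
# if one of them is missing on this machine.
check_mpi_dirs \
    /usr/lib/x86_64-linux-gnu/openmpi/include \
    /usr/lib/x86_64-linux-gnu/openmpi/lib || true
```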
    

  3. Ian Hinder

    Are you sure about that? Those lines belong in an optionlist (mdb/optionlists/*.cfg), not a machine definition file (mdb/machines/*.ini). I would have expected SimFactory to complain if you put those in the machine definition, but maybe it ignores them. In that case, it might just have been rebuilding that helped.
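
One way to see where MPI settings are actually defined is to search the files named in this thread. The following is a sketch under assumptions: `find_mpi_settings` is a hypothetical helper, the script is meant to be run from the Cactus directory, and `configs/sim/OptionList` assumes the tutorial's configuration name `sim`.

```shell
#!/bin/sh
# Hypothetical helper: print every MPI_* assignment found in the given files,
# prefixed with file name and line number.
find_mpi_settings() {
    # /dev/null makes grep print file names even when given a single file;
    # "|| true" keeps a no-match result from being treated as an error.
    grep -n -E '^[[:space:]]*MPI_[A-Z_]+[[:space:]]*=' "$@" /dev/null || true
}

# Run from the Cactus directory; the globs follow the paths named in this thread.
find_mpi_settings simfactory/mdb/optionlists/*.cfg \
                  simfactory/mdb/machines/*.ini \
                  configs/sim/OptionList 2>/dev/null
```

Matches that appear only under mdb/machines would suggest the settings never reached the option list used for the build.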

  4. Jason Kodish reporter

    Not sure. I did try recompiling several times, and it kept failing. This seemed to work. Maybe it gets copied over?
