Add OpenMPI env vars to notebook to avoid warning messages due to vader library in containers
Running in a Docker container based on Ubuntu 18.04, I am now getting error messages like this
[ekohaes8:26785] Read -1, expected 313632, errno = 1
by the hundreds. This container uses Ubuntu 18.04 rather than 16.04, so this may only happen with newer versions of OpenMPI; the OpenMPI ticket referenced below mentions that it happens on at least OpenMPI 4.0 and 3.1.3. This is apparently a known issue: https://github.com/open-mpi/ompi/issues/4948, with the workaround being to set an environment variable (or the equivalent setting in the .conf file in $HOME):
export OMPI_MCA_btl_vader_single_copy_mechanism=none
This also seems to depend on the Docker version used, since
docker run --cap-add=SYS_PTRACE ...
is offered as a host-side workaround.
Note that the simulation itself is unaffected (other than producing very many warnings).
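For the "setting in the .conf file in $HOME" variant, a minimal sketch follows. It assumes that the file meant is Open MPI's per-user MCA parameter file, $HOME/.openmpi/mca-params.conf, which takes the parameter name without the OMPI_MCA_ prefix. The variable names conf_dir, conf_file and setting are only illustrative.

```shell
# Assumption: the per-user Open MPI MCA parameter file lives here.
conf_dir="${HOME}/.openmpi"
conf_file="${conf_dir}/mca-params.conf"
# In the file the parameter is written without the OMPI_MCA_ prefix.
setting="btl_vader_single_copy_mechanism = none"

mkdir -p "${conf_dir}"
# Append the setting only if the parameter is not already present,
# so running this more than once does not duplicate the line.
if ! grep -q '^btl_vader_single_copy_mechanism' "${conf_file}" 2>/dev/null; then
    printf '%s\n' "${setting}" >> "${conf_file}"
fi
```

The same parameter can presumably also be passed per run, either as mpirun --mca btl_vader_single_copy_mechanism none or by handing the environment variable to the container with docker run -e OMPI_MCA_btl_vader_single_copy_mechanism=none ...; neither variant was tested in this ticket.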
This happened to at least one person who reported this on the mailing list: http://lists.einsteintoolkit.org/pipermail/users/2019-September/007046.html
This was reported in https://bitbucket.org/einsteintoolkit/tickets/issues/2234/trouble-in-the-tutorial-server#comment-51221094 but reverted as not directly applicable to the Jupyter tutorial server. However, since this now seems to happen to regular users, I would like to add this setting to the tutorial notebook.
Comments (5)
-
reporter -
reporter Adding the vader env setting
export OMPI_MCA_btl_vader_single_copy_mechanism=none
also helps prevent MPI hangs in OSX VirtualBox containers (OSX being the client) when using MacPorts, as described in
#2290. The same hang still occurs in OSX Catalina VMs. This may be related to an actual bug in OpenMPI: https://github.com/open-mpi/ompi/issues/6568. It affects only MPI communication with large messages, which likely explains why not every test that uses MPI hangs. The OpenMPI ticket provides what looks like a reproducer. I would like to add this env variable setting (with a comment about Docker and OSX virtual machines) to the tutorial notebook. I will also use it for my own VMs to run the testsuite, but that is somewhat unrelated.
-
reporter - changed status to open
-
reporter Unless there are objections, I will apply this after 2020-06-01.
-
reporter - changed status to resolved
Applied as git hash 83b61a8 "CactusTutorial.ipynb: disable vader single copy" of jupyter-et
Any comments?