OpenMPI in tutorial containers binds multiple users' code to the same cores
Issue #2281
resolved
By default, OpenMPI binds MPI ranks to specific cores, and on a shared system like the tutorial server it binds multiple users' MPI ranks to the same core. For example, `top` shows four `awk` processes each receiving only about 50% of a CPU:
16625 ubuntu 20 0 7588 1012 928 R 50.0 0.0 0:38.28 awk
16670 ubuntu 20 0 7588 888 804 R 50.0 0.0 0:31.65 awk
16671 ubuntu 20 0 7588 968 884 R 50.0 0.0 0:31.66 awk
16624 ubuntu 20 0 7588 1064 980 R 43.8 0.0 0:38.32 awk
when the following command is run from two separate logins, each in a Jupyter notebook:
!mpirun -n 2 /usr/bin/awk 'BEGIN{while(1);}' /dev/null
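To see the binding from inside each rank, a small Python script can be launched in place of `awk`. The sketch below is only an illustration: the file name `check_affinity.py` and the function `describe_binding` are hypothetical, it assumes Linux (`os.sched_getaffinity`), and it relies on the `OMPI_COMM_WORLD_RANK` variable that OpenMPI exports to each process it starts.

```python
import os

def describe_binding(env=None):
    """Return a one-line summary of which CPUs this process may run on."""
    env = os.environ if env is None else env
    # OpenMPI exports OMPI_COMM_WORLD_RANK to every process started by mpirun.
    rank = env.get("OMPI_COMM_WORLD_RANK", "?")
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the set of CPU ids the scheduler allows for PID 0 (this process).
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = "unknown (sched_getaffinity is not available on this OS)"
    return f"rank {rank}: allowed CPUs {cpus}"

if __name__ == "__main__":
    print(describe_binding())
```

Running `!mpirun -n 2 python3 check_affinity.py` from each login should make the problem visible: if ranks from both logins list the same small set of CPU ids, they are competing for the same cores.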
A fix is to set the following environment variable in the tutorial notebook, which turns off OpenMPI's core binding:
OMPI_MCA_hwloc_base_binding_policy=none
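OpenMPI reads MCA parameters from environment variables named `OMPI_MCA_<parameter>`, so setting the variable in the notebook's own Python process is enough for later `!mpirun` cells, which inherit that environment. The helper below is a hypothetical sketch of doing this from a notebook cell, not necessarily how the jupyter-et commit applies it.

```python
import os

BINDING_VAR = "OMPI_MCA_hwloc_base_binding_policy"

def disable_ompi_binding(env=None):
    """Set OpenMPI's binding policy to 'none' in env (default: this process's environment).

    Child processes started afterwards, e.g. `!mpirun ...` in a later Jupyter
    cell, inherit the variable, so OpenMPI no longer pins ranks to fixed cores.
    """
    env = os.environ if env is None else env
    env[BINDING_VAR] = "none"
    return env

disable_ompi_binding()
```

In IPython, the one-line magic `%env OMPI_MCA_hwloc_base_binding_policy=none` has the same effect on the kernel's environment.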
Comment by the reporter:
Applied as git commit 1af1dfc ("CactusTutorial: disable HWLOC's core binding") in jupyter-et.