- changed status to resolved
Problems with new upcxx-run script
Nightly CI has revealed some problems with the new upcxx-run merged in 62eaa69, in order of increasing importance:
- The upcxx-run -ssh-servers and -localhost options should be mutually exclusive, because passing both is meaningless. Currently the second one "wins", but it should just be a usage error.
- upcxx-run -ssh-servers should print a warning for aries/gemini/pami/mpi, because those conduits don't have an ssh spawner, so the option is meaningless.
- upcxx-run -ssh-servers should set GASNET_SPAWNFN=S (on udp) or GASNET_IBV_SPAWNER=ssh (on ibv, and similar for ofi/mxm/psm/.. once we have those) to ensure ssh spawning is actually used (ie possibly overriding a different default mechanism set by the environment or configure).
- upcxx-run should provide more trusting behavior when neither option is passed, because the environment or configure defaults may provide the relevant information. Specifically, if neither the -ssh-servers nor the -localhost option was passed, upcxx-run should assume things are ok and just run it (worst case, the underlying spawner will report the error if something is missing).
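The first two items could be handled at argument-parsing time. The following is only a rough sketch of that idea, not the actual upcxx-run code: the function names, the prog name, and the way the conduit is passed in are all hypothetical, and the warning text is made up for illustration.

```python
import argparse
import sys

# Conduits that, per the list above, have no ssh spawner.
NO_SSH_SPAWNER = {"aries", "gemini", "pami", "mpi"}


def build_parser():
    """Build a cut-down parser (hypothetical; only the two options discussed)."""
    parser = argparse.ArgumentParser(prog="upcxx-run-sketch")
    # argparse rejects the command line with a usage error (exit status 2)
    # if both members of a mutually exclusive group are given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-ssh-servers", dest="ssh_servers", metavar="HOSTS")
    group.add_argument("-localhost", dest="localhost", action="store_true")
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser


def warn_if_ssh_unsupported(conduit, args, stream=sys.stderr):
    """Warn (rather than fail) when -ssh-servers cannot have any effect.

    Returns True if a warning was printed.
    """
    if args.ssh_servers and conduit in NO_SSH_SPAWNER:
        print(
            f"Warning: -ssh-servers is ignored on {conduit} conduit "
            "(no ssh spawner)",
            file=stream,
        )
        return True
    return False
```

With this layout, passing both options never reaches the spawning logic at all, so there is no "second one wins" ordering to reason about.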
This last one is highest priority because that's what's breaking the CI (which sets GASNET_SPAWNFN=L in an enclosing script to make localhost spawn the default), eg:
$ env GASNET_SPAWNFN=L upcxx-run -np 2 hostname-udp
usage: upcxx-run [-h] [-n NUM] [-shared-heap HEAPSZ] [-backtrace] [-show]
[-info] [-ssh-servers HOSTS] [-localhost] [-v] [-vv]
command ...
Error: For udp conduit, need to specify -ssh-servers or -localhost, or set environment variable GASNET_SSH_SERVERS
$ env GASNET_SPAWNFN=L amudprun -np 2 hostname-udp
Hello world from rank 0: pcp-d-5
Hello world from rank 1: pcp-d-5
We could of course change the CI script to pass -localhost explicitly, but I think we want to support this use case.
This problem also breaks ibv-conduit when configure or GASNET_IBV_SPAWNER has set MPI as the default spawner, eg:
$ upcxx-run -np 2 testgasnet-ibv
usage: upcxx-run [-h] [-n NUM] [-shared-heap HEAPSZ] [-backtrace] [-show] [-info] [-ssh-servers HOSTS] [-localhost] [-v] [-vv] command ...
Error: For ibv conduit, need to specify -ssh-servers or set environment variable GASNET_SSH_SERVERS
$ gasnetrun_ibv -np 2 testgasnet-ibv
...
node 0/2 hostname is: pcp-d-5 (supernode=0 pid=22709)
node 1/2 hostname is: pcp-d-6 (supernode=1 pid=7130)
...
In this case it doesn't make sense to request -ssh-servers from the user, because mpirun is handling the spawn (using MPI server identification mechanisms, which are potentially site-specific), so there are potentially no "ssh servers".
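The "trusting" behavior requested here could look roughly like the sketch below, which only touches spawner variables when the user explicitly passed an option. This is an illustration under stated assumptions, not the real upcxx-run logic: the function name and signature are hypothetical, and it assumes that -ssh-servers HOSTS is forwarded as GASNET_SSH_SERVERS, that -localhost maps to GASNET_SPAWNFN=L on udp, and that GASNET_IBV_SPAWNER is the variable to set on ibv.

```python
import os


def spawn_environment(conduit, ssh_servers=None, localhost=False, environ=None):
    """Return a copy of the environment with spawner settings applied.

    Hypothetical helper. If neither option was given, the environment is
    returned unchanged so that defaults from the environment or configure
    (e.g. GASNET_SPAWNFN=L, or an MPI spawner on ibv) stay in effect.
    """
    env = dict(os.environ if environ is None else environ)
    if ssh_servers:
        # Assumption: the host list is passed through as GASNET_SSH_SERVERS.
        env["GASNET_SSH_SERVERS"] = ssh_servers
        if conduit == "udp":
            env["GASNET_SPAWNFN"] = "S"
        elif conduit == "ibv":
            env["GASNET_IBV_SPAWNER"] = "ssh"
    elif localhost and conduit == "udp":
        # Assumption: -localhost corresponds to the L spawn function on udp.
        env["GASNET_SPAWNFN"] = "L"
    # Neither option passed: trust the caller's environment and let the
    # underlying spawner report an error if something is missing.
    return env
```

For the CI case above, calling `spawn_environment("udp", environ={"GASNET_SPAWNFN": "L"})` returns the environment untouched, so the run would proceed with localhost spawning instead of stopping with a usage error.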
CC: @PHHargrove
Comments (1)
- reporter: Resolved in 25b7d74