Problems with new upcxx-run script

Issue #103 resolved
Dan Bonachea created an issue

Nightly CI has revealed some problems with the new upcxx-run merged in 62eaa69, in order of increasing importance:

  1. upcxx-run -ssh-servers and -localhost options should be mutually exclusive, because passing both is meaningless. Currently the second one "wins", but it should just be a usage error.
  2. upcxx-run -ssh-servers should print a warning for aries/gemini/pami/mpi, because those conduits don't have an ssh spawner so the option is meaningless.
  3. upcxx-run -ssh-servers should set GASNET_SPAWNFN=S (on udp) or GASNET_IBV_SPAWNER=ssh (on ibv, and similar for ofi/mxm/psm/.. once we have those) to ensure ssh spawning is actually used (i.e., possibly overwriting a different default mechanism set by the environment or configure).
  4. upcxx-run should provide more trusting behavior when neither option is passed, because the environment or configure defaults may provide the relevant information. Specifically, if neither -ssh-servers nor -localhost options were passed, upcxx-run should assume things are ok and just run it (worst case scenario the underlying spawner will report the error if something is missing).
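
A minimal sketch of how items 1 and 2 might look, assuming upcxx-run parses its options with Python's argparse (the usage output below suggests this, but it is not confirmed here). The names build_parser, ssh_servers_warning and NO_SSH_SPAWNER are hypothetical, and only the options relevant to this issue are modeled; accepting -np as an alias of -n is likewise an assumption based on the invocations below.

```python
import argparse

# Conduits listed in item 2 as having no ssh spawner.
NO_SSH_SPAWNER = {"aries", "gemini", "pami", "mpi"}


def build_parser():
    """Hypothetical parser modeling only the options discussed in this issue."""
    parser = argparse.ArgumentParser(prog="upcxx-run")
    parser.add_argument("-n", "-np", dest="num", type=int, default=1)
    # Item 1: passing both options becomes a usage error instead of
    # letting the second one silently win.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-ssh-servers", dest="ssh_servers", metavar="HOSTS")
    group.add_argument("-localhost", dest="localhost", action="store_true")
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser


def ssh_servers_warning(conduit, ssh_servers):
    """Item 2: return a warning string when -ssh-servers cannot apply, else None."""
    if ssh_servers is not None and conduit in NO_SSH_SPAWNER:
        return (f"warning: -ssh-servers has no effect on {conduit}-conduit, "
                "which has no ssh spawner")
    return None
```

With this arrangement argparse itself rejects a command line containing both -ssh-servers and -localhost and exits with a usage error, so no hand-written check is needed for item 1.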

This last one is highest priority because that's what's breaking the CI, which sets GASNET_SPAWNFN=L in an enclosing script to make localhost spawn the default, e.g.:

$ env GASNET_SPAWNFN=L upcxx-run -np 2 hostname-udp
usage: upcxx-run [-h] [-n NUM] [-shared-heap HEAPSZ] [-backtrace] [-show]
                 [-info] [-ssh-servers HOSTS] [-localhost] [-v] [-vv]
                 command ...

Error: For udp conduit, need to specify -ssh-servers or -localhost, or set environment variable GASNET_SSH_SERVERS

$ env GASNET_SPAWNFN=L amudprun -np 2 hostname-udp
Hello world from rank 0: pcp-d-5
Hello world from rank 1: pcp-d-5

We could of course change the CI script to pass -localhost explicitly, but I think we want to support this use case.

This problem also breaks ibv-conduit when configure or GASNET_IBV_SPAWNER has set MPI as the default spawner, e.g.:

$  upcxx-run -np 2 testgasnet-ibv 
usage: upcxx-run [-h] [-n NUM] [-shared-heap HEAPSZ] [-backtrace] [-show] [-info] [-ssh-servers HOSTS] [-localhost] [-v] [-vv] command ...

Error: For ibv conduit, need to specify -ssh-servers or set environment variable GASNET_SSH_SERVERS

$ gasnetrun_ibv -np 2 testgasnet-ibv 
...
node 0/2 hostname is: pcp-d-5 (supernode=0 pid=22709)
node 1/2 hostname is: pcp-d-6 (supernode=1 pid=7130)
...

In this case it doesn't make sense to request -ssh-servers from the user, because mpirun is handling the spawn (using MPI server identification mechanisms, which are potentially site-specific), so there may be no "ssh servers" at all.
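
The behavior requested in items 3 and 4 could be sketched roughly as follows. This is an illustration under assumptions, not the actual upcxx-run code: resolve_spawn_env is a hypothetical helper, the ibv variable is taken to be GASNET_IBV_SPAWNER (the name used in the ibv example above), the comma-separated host list is only illustrative, and treating -localhost as udp-only is inferred from the two error messages quoted in this issue.

```python
def resolve_spawn_env(conduit, inherited_env, ssh_servers=None, localhost=False):
    """Return the environment to hand to the underlying spawner.

    Hypothetical helper: explicit options overwrite inherited settings;
    with neither option, the inherited environment passes through unchanged.
    """
    env = dict(inherited_env)  # copy so the caller's mapping is not mutated
    if ssh_servers is not None:
        # Item 3: force ssh spawning, overriding any other default.
        env["GASNET_SSH_SERVERS"] = ssh_servers
        if conduit == "udp":
            env["GASNET_SPAWNFN"] = "S"
        elif conduit == "ibv":
            env["GASNET_IBV_SPAWNER"] = "ssh"
    elif localhost and conduit == "udp":
        env["GASNET_SPAWNFN"] = "L"
    # Item 4: neither option given. Raise no usage error; trust the
    # environment or configure defaults and let the underlying spawner
    # report anything that is missing.
    return env


if __name__ == "__main__":
    # The two failing cases from this issue: both pass through untouched.
    print(resolve_spawn_env("udp", {"GASNET_SPAWNFN": "L"}))
    print(resolve_spawn_env("ibv", {"GASNET_IBV_SPAWNER": "mpi"}))
    # An explicit -ssh-servers overrides the inherited localhost default.
    print(resolve_spawn_env("udp", {"GASNET_SPAWNFN": "L"},
                            ssh_servers="pcp-d-5,pcp-d-6"))
```

The point of the design is that the wrapper only writes spawner variables when the user asked for something explicitly, so site-specific setups such as an MPI-based spawn keep working without upcxx-run needing to understand them.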

CC: @PHHargrove
