Running upcxx programs in parallel across multiple nodes (in 2017.9.0 release)

Issue #192 resolved
Amin M. Khan created an issue

How should one optimally configure, set up, and run UPC++ programs in parallel across multiple nodes on SLURM-based clusters? Some general guidelines in the UPC++ guide would be useful.

We're currently using the 2017.9.0 release (we can't immediately switch to the latest), and we are debugging to get a UPC++ program to use all of the allocated nodes/cores on a SLURM-managed cluster.

Whether we launch with upcxx-run or directly with srun, sbatch, salloc, mpirun, etc., we notice that UPC++ places all the ranks on a single node with our configuration. For example, here is one command we tried:

cd $UPCXX_SOURCE/example/prog-guide
salloc --nodes=4 --mem-per-cpu=4000 --time=00:15:00 --account=staff \
srun $UPCXX_INSTALL/bin/upcxx-run 16 ./hello-world

Recent releases such as 2018.9.0 describe how to execute UPC++ programs on SLURM-based clusters, e.g.:

For multiple nodes, specify the node count with -N <nodes>.
UPC++ Programmer’s Guide (v2018.9.0), page 4
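
For example, a launch along these lines (assuming the executable was built for a multi-node network backend):

$UPCXX_INSTALL/bin/upcxx-run -n 16 -N 4 ./hello-world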

Alternatively, launching with mpirun has been mentioned in issue #109.


Why do we think the UPC++ ranks are not being launched across multiple nodes in our scenario?

Well, we added calls to gethostname(hostname, sizeof(hostname)); and got the following output:

Running with 8 ranks ...
[Rank 6] running on compute-14-31.local...
[Rank 0] running on compute-14-31.local...
[Rank 7] running on compute-14-31.local...
[Rank 4] running on compute-14-31.local...
[Rank 3] running on compute-14-31.local...
[Rank 1] running on compute-14-31.local...
[Rank 5] running on compute-14-31.local...
[Rank 2] running on compute-14-31.local...
Running with 8 ranks ...
[Rank 3] running on compute-14-36.local...
[Rank 0] running on compute-14-36.local...
[Rank 6] running on compute-14-36.local...
[Rank 7] running on compute-14-36.local...
[Rank 1] running on compute-14-36.local..
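
For reference, here is a minimal sketch of the kind of diagnostic we added (not our exact program; it assumes the usual UPC++ init/finalize pattern from the programmer's guide examples):

#include <upcxx/upcxx.hpp>
#include <unistd.h>   // gethostname
#include <cstdio>

int main() {
    upcxx::init();
    char hostname[256];
    gethostname(hostname, sizeof(hostname));
    if (upcxx::rank_me() == 0)
        std::printf("Running with %d ranks ...\n", upcxx::rank_n());
    upcxx::barrier();
    // each rank reports which compute node it landed on
    std::printf("[Rank %d] running on %s...\n", upcxx::rank_me(), hostname);
    upcxx::finalize();
    return 0;
}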

Comments (5)

  1. Dan Bonachea

    Hi @aminmkhan :

    First, I'd like to strongly encourage updating to the latest release - it includes a large number of fixes and improvements, notably a complete rewrite of the upcxx-run script used for job spawning. Out of curiosity, may I ask why you're using an obsolete version? Are you using some other software forcing this dependence?

    The next thing is to ensure you've built your executable for the correct distributed-memory network backend. With a recent version, the easiest way to do this is the info argument: upcxx-run -i ./hello-world. The old version lacks this feature, so you'd instead need to use one of the following UNIX commands: ident hello-world | grep GASNetExtendedLibraryName or strings hello-world | grep GASNetExtendedLibraryName. If this reports a backend of "SMP" then that is the problem - the smp backend only supports single-node operation. To build for a network backend, you need to rebuild your UPC++ program with UPCXX_GASNET_CONDUIT=<backend>, where backend is the network appropriate for your system hardware (e.g. ibv, aries, or udp).
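
    For concreteness, a sketch of that check and rebuild, assuming the prog-guide Makefile and an InfiniBand cluster (substitute the conduit appropriate for your hardware):

    # Check which GASNet backend the executable was built for (2017.9.0;
    # recent releases support upcxx-run -i ./hello-world instead):
    strings hello-world | grep GASNetExtendedLibraryName

    # Rebuild against a distributed-memory backend, e.g. ibv:
    cd $UPCXX_SOURCE/example/prog-guide
    UPCXX_GASNET_CONDUIT=ibv make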

    Once you are certain you have an executable built for a distributed-memory backend, move on to the job spawn command. Your ancient version of upcxx-run lacks support for the -N option used to explicitly control job layout, which is another good reason to update. Without it, your most likely solution is to launch the job directly with mpirun or srun, assuming your install was built with MPI and/or PMI job-spawn support, i.e.: mpirun -N 4 -n 16 hello-world or srun -N 4 -n 16 hello-world. However, if you are using udp-conduit, or ibv-conduit with the ssh-spawner, you'll need to use amudprun or gasnetrun_ibv, respectively.
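
    To make those spawn options concrete, the commands might look roughly like this (a sketch only; which spawner and which options apply depends on how your GASNet install was configured, and the ssh-based spawners additionally need a host list):

    mpirun -N 4 -n 16 ./hello-world          # MPI-based spawner
    srun -N 4 -n 16 ./hello-world            # PMI-based spawner under SLURM
    gasnetrun_ibv -N 4 -n 16 ./hello-world   # ssh-spawner for ibv-conduit
    amudprun -np 16 ./hello-world            # udp-conduit spawner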

    Hope this helps.

  2. Amin M. Khan reporter

    Are you using some other software forcing this dependence?

    No, there is no third-party software dependency - just our own code, which needs to be modified to be compatible with the later versions; apparently I should get that done sooner rather than later!

    Anyway, I have switched to testing with the latest upcxx-2018.9.0 to get hello-world working on multiple nodes.

    If this reports a backend of "SMP" then that is the problem.

    Yes, that was the cause: $GASNetExtendedLibraryName: SMP $. The guide mentions that, in general, the network conduit is set automatically and shouldn't have to be changed; however, that doesn't happen in our case.

    By default, hello-world.cpp gets built against smp, so $UPCXX_INSTALL/bin/upcxx-run -n 8 -N 2 ./hello-world was still restricted to just the local node.

    you need to rebuild your UPC++ program with UPCXX_GASNET_CONDUIT=<backend>

    Yes, I am working on that, and for now I am sticking to upcxx-2018.9.0.

    • UPCXX_GASNET_CONDUIT=udp compiles fine; I still need to correctly set up -ssh-servers HOSTS in my sbatch scripts (see the sketch after this list). But, yes, $UPCXX_INSTALL/bin/upcxx-run automatically calls amudprun -v -np 8 ./hello-world, so this should work.

    • UPCXX_GASNET_CONDUIT=ibv and UPCXX_GASNET_CONDUIT=mpi compile fine with mpicc/mpicxx, so I am able to get an executable with GASNetCoreLibraryName: IBV.
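
    Here is a sketch of what I have in mind for the sbatch script (hypothetical; it assumes the -ssh-servers option of the 2018.9.0 upcxx-run and uses scontrol to expand the SLURM node list):

    # derive a comma-separated host list from the SLURM allocation
    HOSTS=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | paste -sd, -)
    # launch over udp-conduit via the ssh spawner
    $UPCXX_INSTALL/bin/upcxx-run -n 8 -N 2 -ssh-servers "$HOSTS" ./hello-world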

    I am also referring to the GASNet documentation; the Chapel project also has some details on the different backends.

    For reference, here is my environment:

    module load python2/2.7.10.gnu
    module load gcc/7.2.0
    module load openmpi.gnu/2.1.0
    
    ----------------------------------------------------------------------
    GASNet configuration:
    
     Portable conduits:
     -----------------
      Portable SMP-loopback conduit (smp)                ON     (auto)
      Portable UDP/IP conduit (udp)                      ON     (auto)
      Portable MPI conduit (mpi)                         ON     (auto)
    
     Native, high-performance conduits:
     ---------------------------------
      IBM BlueGene/Q / Power775 PAMI conduit (pami)      OFF    (not found)
      InfiniBand IB Verbs conduit (ibv)                  ON     (auto)
      Cray XE/XK Gemini conduit (gemini)                 OFF    (not found)
      Cray XC Aries conduit (aries)                      OFF    (not found)
    
     Misc Settings
     -------------
      MPI compatibility:      yes
      Pthreads support:       yes
      Segment config:         fast
      PSHM support:           posix
      FCA support:            no
      BLCR support:           no
      Atomics support:        native
    ----------------------------------------------------------------------
    

    And thanks @bonachea for the insightful comments; they helped a lot.

  3. Dan Bonachea

    UPCXX_GASNET_CONDUIT=ibv and UPCXX_GASNET_CONDUIT=mpi fail to compile, so I must fix my environment configuration. (errors like gasnet_bootstrap_mpi.c:(.text+0x12e): undefined reference to 'MPI_Abort')

    Assuming you have Mellanox-compatible InfiniBand hardware, ibv-conduit should definitely be preferred over mpi-conduit, which is a low-performance backend provided for portability only.

    The fix for the linker issues is probably to install UPC++ with CXX=mpicxx.

    Alternatively, if you don't want to use MPI interop or the MPI job spawner, you can set GASNET_CONFIGURE_ARGS=--without-mpicxx and then use the ssh-spawner with ibv-conduit, as sketched below.
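
    A sketch of that alternative, assuming the standard ./install script from the source tree:

    cd <upcxx-source-path>
    # build without MPI support; ibv-conduit will then use the ssh-based spawner
    GASNET_CONFIGURE_ARGS='--without-mpicxx' ./install <upcxx-install-path>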

  4. Amin M. Khan reporter

    Okay, I got it working with both upcxx-2017.9.0 and upcxx-2018.9.0. Building UPC++ with mpicc/mpicxx and compiling the programs against the ibv conduit were the missing ingredients.

    cd <upcxx-source-path>
    CC=mpicc CXX=mpicxx ./install <upcxx-install-path>
    cd <upcxx-source-path>/example/prog-guide
    CC=mpicc CXX=mpicxx UPCXX_GASNET_CONDUIT=ibv make
    
    salloc --time=00:15:00 --account=staff \
    --nodes=8 \
    --tasks-per-node=4 \
    --mem-per-cpu=8G \
    $UPCXX_INSTALL/bin/upcxx-run -np 32 \
    <upcxx-source-path>/example/prog-guide/hello-world
    

    One question about the recommended settings for the memory setup (for upcxx-2017.9.0). For instance, in the above configuration, with 32 GB of physical memory per node, what are good maximum values for the following?

    export GASNET_PHYSMEM_MAX=30G
    export GASNET_MAX_SEGSIZE=30G
    export UPCXX_SEGMENT_MB=4096
    

    And if I still get memory errors, then my program is to blame, right? (I will also try installing UPC++ with GASNET_CONFIGURE_ARGS='--enable-pshm --disable-pshm-posix --enable-pshm-sysv' from issue #109.)

    I am testing here with just a modified version of the old SpMV example, so memory requirements aren't huge.

  5. Dan Bonachea

    Sounds like the main issue is resolved. Feel free to open additional issues for other problems/questions.

    One question about the recommended settings for the memory setup (for upcxx-2017.9.0).

    This is another area that has significantly improved in the latest UPC++ releases.

    Ideally, all you should need is upcxx-run -shared-heap=4GB (or whatever amount of shared memory each process wants). In the latest release you can also specify it as a fraction of physical memory (e.g. upcxx-run -shared-heap=50%). Note there is a bug in the current release (issue #100, already fixed in develop, with an official release next month) that affects the operation of this option with some spawners; however, if your underlying system spawner is srun, it should hopefully be unaffected.
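
    For example, with a recent upcxx-run (the sizes here are placeholders matching the 32-rank / 8-node job above, not recommendations):

    $UPCXX_INSTALL/bin/upcxx-run -shared-heap=4GB -n 32 -N 8 ./hello-world
    # or, in the latest release, as a fraction of each node's physical memory:
    $UPCXX_INSTALL/bin/upcxx-run -shared-heap=50% -n 32 -N 8 ./hello-world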

    Also note that whatever value you pass (either via upcxx-run -shared-heap or UPCXX_SEGMENT_MB) is reserved for the UPC++ shared heap by each process at startup, meaning that portion of physical memory is unavailable to service private memory allocation (malloc/new). So ideally you should choose a value just large enough to encompass your expected shared-memory utilization.
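
    As an illustrative calculation using the numbers above: with 32 GB of physical memory per node and 4 processes per node, a 4 GB shared heap per process reserves 16 GB of each node's memory at startup, leaving roughly 16 GB for private allocations (malloc/new), the OS, and everything else.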
