# Using UPC++ on NERSC Perlmutter
This document is a continuous work-in-progress, intended to provide up-to-date information on a public install maintained by (or in collaboration with) the UPC++ team. However, systems are constantly changing, so please report any errors or omissions in the issue tracker.

Typically, installs of UPC++ are maintained only for the current default versions of the system-provided environment modules, such as those for PrgEnv, CUDA and the compiler.

This document is not a replacement for the documentation provided by the centers, and assumes general familiarity with the use of the system.
## General
Stable installs are available through environment modules. A wrapper is used to transparently dispatch commands such as `upcxx` to an install appropriate to the currently loaded `PrgEnv-{gnu,cray,nvidia,aocc}` and compiler (`gcc`, `cce`, `nvidia` or `aocc`) environment modules.
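For example (a sketch, assuming `PrgEnv-gnu` is the currently loaded programming environment), the same `upcxx` command dispatches to a different install after a compiler environment swap:

```
$ module load contrib upcxx
$ upcxx --version                        # reports the install matching PrgEnv-gnu
$ module swap PrgEnv-gnu PrgEnv-nvidia
$ upcxx --version                        # now reports the install matching PrgEnv-nvidia
```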
## Environment Modules
In order to access the UPC++ installation on Perlmutter, one must run

```
$ module load contrib
```

to extend `MODULEPATH` before the UPC++ environment modules will be accessible. We recommend inclusion of this command in one's shell startup files, such as `$HOME/.login` or `$HOME/.bash_profile`. If not adding the command to one's shell startup files, the `module load contrib` command will be required once per login shell in which you need a `upcxx` environment module.
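For instance, bash users might append the following to `$HOME/.bash_profile` (a sketch; csh-family users would use `$HOME/.login` instead):

```
# Make the UPC++ environment modules visible in every login shell:
module load contrib
```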
Environment modules provide two alternative configurations of the UPC++ library:

- `upcxx-cuda`: This module supports "memory kinds", a UPC++ feature that enables communication to/from CUDA memory when utilizing `upcxx::device_allocator` (see the sketch after this list).
- `upcxx`: This omits support for `upcxx::device_allocator<upcxx::cuda_device>`, resulting in a small potential speed-up for applications which do not require this feature.
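For example, a sketch of selecting the CUDA-enabled configuration to build a hypothetical GPU-using application `gpu-app.cpp`:

```
$ module load contrib upcxx-cuda   # enables upcxx::device_allocator<upcxx::cuda_device>
$ upcxx -O gpu-app.cpp -o gpu-app.x
```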
By default, each module above will select the latest recommended version of the UPC++ library. One can see the installed versions with a command like `module avail upcxx`, and optionally select a particular version explicitly with a command of the form `module load upcxx/20XX.YY.ZZ`.
On Perlmutter, the UPC++ environment modules select a default network of `ofi`. You can optionally specify this explicitly on the compile line with `upcxx -network=ofi ...`.
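For example, a sketch compiling a hypothetical `app.cpp` (the `-network=ofi` argument is redundant with the default, shown here only for illustration):

```
$ module load contrib upcxx
$ upcxx -network=ofi -O app.cpp -o app.x
```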
## Caveats
The installs provided on Perlmutter utilize the Cray Programming Environment, and the `cc` and `CC` compiler wrappers in particular. It is possible to use `upcxx` (or `CC` and `upcxx-meta`) to link code compiled with the "native compilers" such as `g++` and `nvc++` (provided they match the `PrgEnv-*` module). However, direct use of the native compilers to link UPC++ code is not supported with these installs.
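As a concrete sketch (assuming `PrgEnv-gnu`, a loaded `upcxx` module, and a hypothetical `app.cpp`; the exact `upcxx-meta` tokens may vary by version, so consult its help output), one might compile objects with the native compiler and leave the link step to `upcxx`:

```
# Compile translation units with the native compiler, using upcxx-meta
# to obtain the preprocessor/compile flags:
$ g++ $(upcxx-meta PPFLAGS) -O2 -c app.cpp -o app.o
# Link with the upcxx wrapper (direct linking with g++ is NOT supported here):
$ upcxx app.o -o app.x
```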
## Job launch
The `upcxx-run` utility provided with UPC++ is a relatively simple wrapper, which in the case of Perlmutter uses `srun` via an additional wrapper, `upcxx-srun` (see below). To have full control over process placement, thread pinning and GPU allocation, users are advised to launch their UPC++ applications using `upcxx-srun`, which works like `srun` with the addition of providing NIC binding. One should do so with the `upcxx` or `upcxx-cuda` environment module loaded.
Whenever using `srun` in place of `upcxx-run`, if you would normally have passed `-shared-heap` to `upcxx-run`, then it is particularly important that both `UPCXX_SHARED_HEAP_SIZE` and `GASNET_MAX_SEGSIZE` be set accordingly. The values of those and other potentially relevant environment variables set (or inherited) by `upcxx-run` can be listed by adding `-show` to your `upcxx-run` command (which will print useful information but not run anything).
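For example (a sketch with a hypothetical `./app.x`; the heap sizes here are purely illustrative, and the values printed by `-show` are authoritative):

```
# Inspect the environment upcxx-run would set, without running anything:
$ upcxx-run -show -shared-heap 512MB -n 4 -N 2 ./app.x
# Export the corresponding variables before launching with srun directly:
$ export UPCXX_SHARED_HEAP_SIZE='512 MB'
$ export GASNET_MAX_SEGSIZE='512MB'
$ srun -n 4 -N 2 ./app.x
```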
Additional information is available in the "Advanced Job Launch" chapter of the UPC++ v1.0 Programmer's Guide.
### `upcxx-srun`
Each Perlmutter GPU node contains 64 CPU cores and 4 Slingshot-11 NICs (and 4 GPUs). Currently, each UPC++ process can use at most one Slingshot NIC. In order for a job to utilize all four NICs on a Perlmutter GPU node, all of the following are necessary:

1. run at least four processes per node
2. ensure each process is bound to distinct CPU cores out of the 64 available
3. set environment variables directing each process to use the NIC most appropriate to its core binding
The `upcxx-srun` launch wrapper helps to automate those three items.

The first purpose of the `upcxx-srun` wrapper installed on Perlmutter is to set the `GASNET_OFI_DEVICE*` family of environment variables as appropriate for the current Perlmutter partition (i.e. GPU nodes vs CPU nodes), satisfying requirement 3 above.

The second purpose of the script is to ensure the job launch command requests a suitable core binding, unless one has already been requested by the environment or command line, thus satisfying requirement 2 above.
Subject to the following differences, the use of `upcxx-srun` should be otherwise identical to `srun`:

- One must use `--ntasks` or its short form `-n`. Cases in which `srun` would normally compute a task count from other arguments are not supported.
- One is required to place `--` between the `srun` options and the executable name, to prevent application options from being parsed by the wrapper as if they were `srun` options.
- The `-shared-heap` and `-backtrace` options to `upcxx-run` are accepted, but must appear before the required `--` (see the example after this list).
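For example, a sketch launching a hypothetical `./app.x` with 8 processes across 2 nodes:

```
$ upcxx-srun -n 8 -N 2 -shared-heap 512MB -- ./app.x --app-option
```

Here `-shared-heap` (an `upcxx-run` option) appears before the required `--`, while `--app-option` after the executable name is passed through to the application rather than parsed as an `srun` option.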
### Single-node runs
On a system like Perlmutter, there are multiple complications related to launch of executables compiled for `-network=smp`, such that no use of `srun` (or simple wrappers around it) can provide a satisfactory solution in general. Therefore, we recommend that for single-node (shared memory) application runs on Perlmutter, one should compile for the default network (`ofi`). It is also acceptable to use `-network=mpi`, such as may be required for some hybrid applications (UPC++ and MPI in the same executable). However, note that in multi-node runs `-network=mpi` imposes a significant performance penalty.
### Batch jobs
By default, batch jobs on Perlmutter inherit both `$PATH` and `$MODULEPATH` from the environment at the time the job is submitted/requested using `sbatch` or `salloc`. So, no additional steps are needed to use `upcxx-run` if a `upcxx` environment module was loaded when `sbatch` or `salloc` ran.
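For example, a minimal batch script sketch (queue, time and node counts are placeholders, and site-specific options such as account or GPU counts are omitted; assumes `module load contrib upcxx` was in effect when `sbatch` was invoked, since the environment is inherited):

```
#!/bin/bash
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH -N 2
#SBATCH -t 10:00
upcxx-srun -n 8 -- ./hello-world.x
```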
Interactive example:

```
perlmutter$ module load contrib
perlmutter$ module load upcxx
perlmutter$ upcxx --version
UPC++ version 2022.9.0 / gex-2022.9.0-0-gc2b830e
Citing UPC++ in publication? Please see: https://upcxx.lbl.gov/publications
Copyright (c) 2022, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
https://upcxx.lbl.gov

nvc++ 21.11-0 64-bit target on x86-64 Linux -tp zen2-64
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

perlmutter$ upcxx -O hello-world.cpp -o hello-world.x
perlmutter$ salloc -C gpu -q interactive --nodes 2
salloc: Granted job allocation 1722947
salloc: Waiting for resource configuration
salloc: Nodes nid[002700-002701] are ready for job
nid002700$ upcxx-run -n 4 -N 2 ./hello-world.x
Hello world from process 0 out of 4 processes
Hello world from process 1 out of 4 processes
Hello world from process 2 out of 4 processes
Hello world from process 3 out of 4 processes
```
## CMake
A `UPCXX` CMake package is provided in the UPC++ install on Perlmutter, as described in README.md. Thus, with the `upcxx` environment module loaded, CMake should "just work".
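For example, a sketch assuming a project whose `CMakeLists.txt` already calls `find_package(UPCXX)`:

```
$ module load contrib upcxx
$ cmake -S . -B build     # the UPCXX package is found via the loaded module
$ cmake --build build
```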
Information about UPC++ installs on other production systems is available elsewhere in this wiki.
Please report any errors or omissions in the issue tracker.