UPC++ Version 1.0
Lawrence Berkeley National Laboratory is hiring!
The Computer Languages and System Software Group (CLaSS) in the Computing Research Division at LBNL is recruiting for the following positions:
- CLaSS Group Lead: upcxx.lbl.gov/class-lead
- C++ Programmer/Software Engineer: upcxx.lbl.gov/2020-cxx-dev
- HPC Application Developer: upcxx.lbl.gov/2020-hpc-dev
July 17, 2020: A new UPC++ 2020.3.2 release is now available!
February 1, 2020: A new UPC++ Training site is now available, including video tutorials!
- UPC++ Implementation 2020.3.2 (tar.gz)
- UPC++ Programmer's Guide, Revision 2020.3.0 (PDF)
- A gentle introduction to UPC++ with examples and descriptions.
- Also available online as a single HTML page
- UPC++ Specification, Revision 2020.3.0 (PDF)
- Formal specification of the UPC++ library interface.
- UPC++ Extras (Repo)
- Optional extensions, including a new dist_array class template for scalable distributed arrays
- Extended example codes and tutorial materials
- UPC++ Training site
- Learning to use the library
- Publications
- Includes Pagoda group publications and citation information for the documentation.
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming, and is designed to interoperate smoothly and efficiently with MPI, OpenMP, CUDA and asynchronous many-task (AMT) runtimes. It leverages GASNet-EX to deliver low-overhead, fine-grained communication, including Remote Memory Access (RMA) and Remote Procedure Call (RPC).
UPC++ exposes a PGAS memory model, including one-sided communication (RMA and RPC). However, there are departures from the approaches taken by some predecessors such as UPC. These changes reflect a design philosophy that encourages the UPC++ programmer to directly express what can be implemented efficiently (i.e., without a need for parallel compiler analysis):
- Most operations are non-blocking, and powerful synchronization mechanisms encourage applications to design for aggressive asynchrony.
- All communication is explicit: there is no implicit data motion.
- UPC++ encourages the use of scalable data structures and avoids non-scalable library features.
What Features Comprise UPC++?
RMA. UPC++ provides asynchronous one-sided communication (Remote Memory Access, a.k.a. Put and Get) for movement of data among processes.
RPC. UPC++ provides asynchronous Remote Procedure Call for running code (including C++ lambdas) on other processes.
Futures, promises and continuations. Futures are central to handling asynchronous operation of RMA and RPC. UPC++ uses a continuation-based model to express task dependencies.
Progress guarantees. Because UPC++ has no internal service threads, the library makes progress only when a core enters an active UPC++ call. However, the "persona" concept makes writing progress threads simple.
Remote atomics use an abstraction that enables efficient offload where hardware support is available.
Distributed objects. UPC++ enables construction of a scalable distributed object from any C++ object type, with one instance on each rank of a team. RPC can be used to access remote instances.
Serialization. UPC++ introduces several complementary mechanisms for efficiently passing large and/or complicated data arguments to RPCs.
Non-contiguous RMA. UPC++ provides functions for non-contiguous data transfers directly on shared memory, for example to efficiently copy or transpose sections of N-dimensional dense arrays.
Teams represent ordered sets of processes and play a role in collective communication. Initially we support barrier, broadcast and reductions, including abstractions to enable offload of reductions supported in hardware.
Memory kinds. UPC++ provides uniform interfaces for transfers between memory with different properties. Beginning in the 2019.3.0 release, UPC++ provides a prototype implementation for CUDA GPUs. Future releases will refine this capability, and may expand this to include other forms of non-host memory.
A comparison to the feature set of UPC++ v0.1 is also available.
Notable applications/kernels/frameworks using UPC++:
- HipMer: An Extreme-Scale De Novo Genome Assembler, developed by the ECP ExaBiome Project
- symPACK: A sparse symmetric matrix direct linear solver
- mel-upx: Implements half-approximate graph matching (BoF slides), developed by the ECP ExaGraph Co-Design Center.
- SWE-UPC++: Shallow Water Equations for tsunami simulation, using the UPC++ Actor Library
- Berkeley Container Library (BCL): A cross-platform C++ library of distributed data structures, with backends for GASNet and UPC++
- ConvergentMatrix: A dense matrix abstraction for distributed-memory HPC platforms, used to implement the SEMUCB-WM1 tomographic model.
Other related software:
- upcxx-extras: UPC++ extra examples and optional extensions
- upcxx-utils: Set of utilities layered over UPC++, authored by the HipMer group
- Berkeley UPC: Now supports hybrid UPC/UPC++ applications!
- GASNet-EX: The portable, high-performance communication runtime used by UPC++
- MRG8: An efficient, high-period PRNG with skip-ahead, designed for exascale HPC
UPC++ is developed and maintained by the Pagoda Project at Lawrence Berkeley National Laboratory (LBNL), and is funded primarily by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
- 2020.3.0: [Implementation] [Guide] [Specification] [Announcement]
- 2019.9.0: [Implementation] [Guide] [Specification] [Announcement]
- 2019.3.2: [Implementation] [Guide] [Specification] [Announcement 1] [Announcement 2]
- 2018.9.0: [Implementation] [Guide] [Specification] [Announcement]
- 2018.3.2: [Implementation] [Guide] [Specification] [Announcement]
- 2018.1.0: [Implementation] [Guide] [Specification] [Announcement]
- 2017.9.0: [Implementation] [Guide] [Specification] [Announcement]