UPC++ Version 1.0


May 10, 2018: A new v2018.3.2 tarball is now available, to correct a packaging error in v2018.3.0.
March 28, 2018: We are proud to announce a new release of the UPC++ implementation (v2018.3.0), with accompanying updates to the Specification and Programmer's Guide.

Current Downloads:

Previous Releases:


  • Includes archival versions of the Programmer's Guide and Specification suitable for citation.

Transition from v0.1 to v1.0

In November 2016, we froze the old UPC++ repository as part of a transition phase. That phase ends with the current v1.0 release, and the v0.1 UPC++ repository will remain frozen in maintenance-only mode.

The current release of UPC++ v1.0 implements most of the Specification (see the ChangeLog section of the README for status). UPC++ leverages GASNet-EX to deliver lower-overhead, fine-grained communication. It is a high-productivity communication library designed to interoperate smoothly and efficiently with MPI, OpenMP, CUDA and asynchronous many-task runtimes (AMTs).

UPC++ is also a sounding board for new ideas that may be incorporated in C++20 and beyond, or that may influence the direction of those efforts. UPC++ v1.0 deploys new capabilities, some of which were experimental in v0.1; it also removes some v0.1 features and modifies others. The table at the end of this document lists the UPC++ features for v0.1 (left) and the planned additions, deletions and changes in v1.0 (right).

Design Philosophy

UPC++ exposes a PGAS memory model, including one-sided communication (RMA and RPC). However, there are departures from the approaches taken by some predecessors such as UPC. These changes reflect a design philosophy that encourages the UPC++ programmer to directly express what can be implemented efficiently (i.e., without a need for parallel compiler analysis).

  1. Most operations are non-blocking, and the powerful synchronization mechanisms encourage applications to design for aggressive asynchrony.

  2. All communication is explicit: there is no implicit data motion.

What New Features are in v1.0?

  1. Futures, promises and continuations. Futures are central to handling asynchronous operations: RMA and RPC. Futures are free-standing in that they do not depend on other parts of the library. Whereas v0.1 used an event-based mechanism for expressing task dependencies, v1.0 relies on a continuation-based model instead.

  2. Progress guarantees. Because UPC++ has no internal service threads, the library makes progress only when a core enters an active UPC++ call. UPC++ v1.0 has better-defined progress semantics than v0.1, especially in multi-threaded scenarios.

  3. Remote atomics were experimental in v0.1 and did not necessarily utilize available hardware support. Available hardware support can now be leveraged, and the user sees significant performance benefits in certain combinations of hardware and applications. Remote atomics use the C++11 memory model and an abstraction that enables efficient offload support.

  4. Distributed objects. UPC++ v1.0 enables construction of a scalable distributed object from any C++ object type, with one instance on each rank of a team. RPC can be used to access remote instances.

  5. View-based Serialization. UPC++ v1.0 introduces a mechanism for efficiently passing large and/or complicated data arguments to RPCs.

  6. Non-contiguous RMA. UPC++ v1.0 provides functions for non-contiguous data transfers directly on shared memory, for example to efficiently copy or transpose sections of N-dimensional dense arrays. UPC++ v1.0 expands and generalizes the support for non-contiguous RMA.

  7. Teams are a mechanism for grouping ranks, and are similar to MPI_Group. Teams play a role in collective communication. Initially, we plan to support barriers and reductions for specialized types supported in hardware. Others (such as the vector ‘v’ variants of alltoall) will be added over time. [Currently under development]

  8. Memory kinds. UPC++ will support global operations on memory with different kinds of access methods or performance properties, such as GPUs, HBM, NUMA and NVRAM, while providing a uniform interface for transfers between such memories. [Not yet implemented]
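
The continuation-based model and the progress rule described in items 1 and 2 above can be pictured with a small single-process sketch in standard C++. This is not the UPC++ API: `ToyRuntime`, `ToyFuture`, `async_get` and `demo_continuation` are hypothetical names invented for illustration, and the "remote" cell is just a local variable. The sketch only assumes what the text states: operations return immediately, results arrive through futures, continuations are chained onto those futures, and nothing completes until the caller explicitly enters the runtime.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime with no service threads: completions
// are queued and delivered only when the caller invokes progress().
struct ToyRuntime {
    std::deque<std::function<void()>> pending;

    void progress() {
        while (!pending.empty()) {
            auto task = std::move(pending.front());
            pending.pop_front();
            task();  // may fulfill futures and run their continuations
        }
    }
};

// Hypothetical minimal future with continuation chaining (non-void results only).
template <typename T>
class ToyFuture {
    struct State {
        bool ready = false;
        T value{};
        std::vector<std::function<void(const T&)>> callbacks;
    };
    std::shared_ptr<State> state_ = std::make_shared<State>();

public:
    bool ready() const { return state_->ready; }

    const T& result() const {
        assert(state_->ready);
        return state_->value;
    }

    // Store the value, then run every continuation registered so far.
    void fulfill(T v) {
        state_->value = std::move(v);
        state_->ready = true;
        auto cbs = std::move(state_->callbacks);
        state_->callbacks.clear();
        for (auto& cb : cbs) cb(state_->value);
    }

    // Register a continuation; returns a future for the continuation's result.
    template <typename F>
    auto then(F fn) -> ToyFuture<decltype(fn(std::declval<const T&>()))> {
        using R = decltype(fn(std::declval<const T&>()));
        ToyFuture<R> next;
        auto run = [next, fn](const T& v) mutable { next.fulfill(fn(v)); };
        if (state_->ready) run(state_->value);
        else state_->callbacks.push_back(run);
        return next;
    }
};

// Hypothetical non-blocking "get": returns at once; the value is delivered
// later, during a progress() call.
ToyFuture<int> async_get(ToyRuntime& rt, const int& remote_cell) {
    ToyFuture<int> fut;
    rt.pending.push_back([fut, &remote_cell]() mutable { fut.fulfill(remote_cell); });
    return fut;
}

int demo_continuation() {
    ToyRuntime rt;
    int remote_cell = 20;  // stands in for memory owned by another rank
    ToyFuture<int> got = async_get(rt, remote_cell);
    ToyFuture<int> derived = got.then([](const int& v) { return v * 2 + 2; });
    assert(!derived.ready());  // nothing completes before progress()
    rt.progress();             // the caller drives completion explicitly
    return derived.result();   // 20 * 2 + 2 = 42
}
```

Calling `demo_continuation()` from a `main` function returns 42. The point of the design is that the dependency between the transfer and the follow-up computation is expressed by chaining, not by blocking, and that completion happens only inside the explicit `progress()` call.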

What has been removed from UPC++?

In developing UPC++ v1.0 we also strove for simplicity and we have removed some obsolete features present in v0.1:

  1. Multidimensional arrays (local only). We plan to interoperate with third-party solutions for multidimensional arrays.

  2. Distributed shared arrays. This functionality has been subsumed by generalized distributed objects, which provide a more scalable solution.

  3. Shared scalars

  4. Blocking communication (e.g. implicit global pointer dereference)
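
One way to picture how a distributed 1D array can be expressed with one object instance per rank, as item 2 above suggests, is the block-distribution sketch below. It is a single-process model in standard C++, not UPC++ code: `BlockArray`, `owner`, `offset` and `demo_block_array` are hypothetical names, the block layout is an assumption chosen for illustration, and the direct vector accesses stand in for what would be explicit remote operations (for example an RPC to the owning rank) in a real program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-process model: local[r] stands in for the instance that rank r
// would hold in a distributed object. All names here are hypothetical.
struct BlockArray {
    std::size_t block;                     // elements owned by each rank
    std::vector<std::vector<int>> local;   // one block per rank

    BlockArray(std::size_t ranks, std::size_t block_size)
        : block(block_size), local(ranks, std::vector<int>(block_size, 0)) {}

    // Translate a global index into (owning rank, offset within its block).
    std::size_t owner(std::size_t i) const { return i / block; }
    std::size_t offset(std::size_t i) const { return i % block; }

    // In a real PGAS program these accesses would be explicit,
    // non-blocking remote operations directed at the owner.
    void put(std::size_t i, int v) { local[owner(i)][offset(i)] = v; }
    int get(std::size_t i) const { return local[owner(i)][offset(i)]; }
};

int demo_block_array() {
    BlockArray a(4, 8);    // 4 ranks x 8 elements = 32 global elements
    a.put(19, 7);          // 19 / 8 = rank 2, 19 % 8 = offset 3
    return a.local[2][3];  // the value landed in rank 2's block
}
```

Calling `demo_block_array()` returns 7. Because each rank only stores its own block and the index arithmetic is local, no rank needs a table describing every other rank's storage, which is one plausible reading of the scalability argument made above.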

Feature Comparison

V0.1 V1.0
Global Pointers
Futures, Continuations, Promises
Events Subsumed by futures, continuations, promises
Put and Get
Non-contiguous transfers Experimental
Distributed 1D Arrays Subsumed by distributed objects
RPC ✔ Serialization improvements
Distributed Objects
Collectives Experimental
Teams Experimental
Global Pointer Dereference ✔ (Implicit blocking)
Memory Kinds (e.g. GPU)
Shared Scalar Variables ✔ (Little use)
Non-Distributed MD Arrays ✔ ndarray prototype
Progress Guarantees ✔ More rigorous
Atomics Experimental

Contact Info