UPC++ Version 1.0


Sept 29, 2017: We are proud to announce the initial release of the UPC++ implementation (v2017.9.0), with accompanying Specification and Programmer's Guide.


Transition from v0.1 to v1.0

In November 2016, we froze the old UPC++ repository as part of a transition phase. That phase ends with the current v1.0 release; the v0.1 UPC++ repository will remain frozen in maintenance-only mode.

UPC++ version 1.0 implements a subset of the specification, which was updated to Draft version 4 for this software release. UPC++ leverages GASNet-EX to deliver lower-overhead, fine-grained communication. It is a high-productivity communication library designed to interoperate smoothly and efficiently with MPI, OpenMP, CUDA and AMTs.

UPC++ is also a sounding board for new ideas that may be incorporated into C++20 and beyond, or that may influence the direction of those efforts. UPC++ v1.0 deploys new capabilities, some of which were experimental in v0.1, removes some features, and modifies others. The table at the end of this document lists the UPC++ features of v0.1 (left) alongside the planned additions, deletions and changes in v1.0 (right). Features that are not yet implemented, or only partly implemented, are labelled accordingly in the table; for more information, consult the entry for this release in the ChangeLog.

Design Philosophy

UPC++ exposes a PGAS memory model, including one-sided communication (RMA and RPC). However, there are two major changes relative to v0.1. These changes reflect a design philosophy that encourages the UPC++ programmer to directly express what can be implemented efficiently (i.e., without a need for parallel compiler analysis).

  1. Most operations are non-blocking, and the powerful synchronization mechanisms encourage applications to design for aggressive asynchrony.

  2. All communication is explicit - there is no implicit data motion.
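
The two points above can be illustrated with a toy model in standard C++. This is only a sketch under stated assumptions: `ToyRuntime`, `ToyHandle`, `get_async` and `progress` are hypothetical names invented for this illustration and are not part of the UPC++ API, the "remote" memory is just a local vector, and completing work inside an explicit `progress` call is a simplification chosen for the sketch.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT the UPC++ API.
// A handle to the result of an operation that may not have completed yet.
template <typename T>
struct ToyHandle {
    std::shared_ptr<bool> done = std::make_shared<bool>(false);
    std::shared_ptr<T> value = std::make_shared<T>();

    bool ready() const { return *done; }
    const T& result() const { return *value; }  // meaningful only once ready()
};

class ToyRuntime {
public:
    // The vector stands in for memory owned by another rank (assumption).
    explicit ToyRuntime(std::vector<int> remote) : remote_(std::move(remote)) {}

    // Explicit communication: the only way to read "remote" data is to ask for it.
    // Non-blocking: the call queues the transfer and returns immediately.
    ToyHandle<int> get_async(std::size_t index) {
        ToyHandle<int> h;
        pending_.push([this, index, h]() {
            *h.value = remote_.at(index);
            *h.done = true;
        });
        return h;
    }

    // Completes all queued operations. In this toy, nothing advances
    // until the caller invokes progress().
    void progress() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    std::vector<int> remote_;
    std::queue<std::function<void()>> pending_;
};
```

In this sketch a caller would initiate several `get_async` operations, do unrelated local work, and only then call `progress()` and read the results. There is deliberately no operator that fetches remote data implicitly.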

What new features are in version 1.0?

  1. Futures, promises and continuations. Futures are central to handling asynchronous operations: RMA and RPC. Futures are free-standing in that they do not depend on other parts of the library. Whereas v0.1 used an event-based mechanism for expressing task dependencies, v1.0 relies on a continuation-based model instead. [Certain modes of completion semantics currently under development]

  2. Progress guarantees. Because UPC++ has no internal service threads, the library makes progress only when a core enters an active UPC++ call. UPC++ v1.0 has better-defined progress semantics than v0.1, especially in multi-threaded scenarios.

  3. Remote atomics. These were experimental in v0.1 and did not necessarily utilize available hardware support. Any available hardware support will now be leveraged, and the user will see significant performance benefits in certain combinations of hardware and applications. Remote atomics will use the C++11 memory model and a free-function API. We restrict atomics to fetch-and-add in the near term, but are evaluating adding others. [Currently under development]

  4. Teams are a mechanism for grouping ranks, and are similar to MPI_Group. Teams play a role in collective communication and also in storage allocation. Initially, we plan to support barriers and reductions for specialized types supported in hardware. Others (such as the vector ‘v’ variants of alltoall) will be added over time. [Currently under development]

  5. Distributed objects. UPC++ 1.0 enables a C++ object of any type to be made into a distributed object, with one instance on every rank of a team. RPC can be used to scalably access remote instances within a team.

  6. Memory kinds. UPC++ will support global operations on memory with different kinds of access methods or performance properties, such as GPUs, HBM, NUMA and NVRAM, while providing a uniform interface for transfers between such memories. [Not yet implemented]
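
To make the continuation-based model from item 1 concrete, here is a minimal single-threaded sketch in standard C++. It is an illustration under stated assumptions, not the UPC++ implementation: `ToyPromise`, `ToyFuture`, `SharedState`, `fulfill` and `result` are hypothetical names, the value type must be default-constructible, and continuations returning `void` are not handled.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the UPC++ API.
template <typename T>
class ToyFuture;

// State shared between a promise and the futures obtained from it.
template <typename T>
struct SharedState {
    bool ready = false;
    T value{};
    std::vector<std::function<void(const T&)>> continuations;
};

template <typename T>
class ToyPromise {
public:
    ToyPromise() : state_(std::make_shared<SharedState<T>>()) {}

    ToyFuture<T> get_future() const { return ToyFuture<T>(state_); }

    // Supplies the value and runs every continuation attached so far.
    void fulfill(T v) {
        state_->value = std::move(v);
        state_->ready = true;
        auto pending = std::move(state_->continuations);
        state_->continuations.clear();
        for (auto& k : pending) k(state_->value);
    }

private:
    std::shared_ptr<SharedState<T>> state_;
};

template <typename T>
class ToyFuture {
public:
    explicit ToyFuture(std::shared_ptr<SharedState<T>> s) : state_(std::move(s)) {}

    bool ready() const { return state_->ready; }
    const T& result() const { return state_->value; }  // meaningful only once ready()

    // Attaches a continuation and returns a future for its result.
    // The continuation runs immediately if the value is already available,
    // otherwise when the producing promise is fulfilled.
    template <typename F>
    auto then(F f) const -> ToyFuture<decltype(f(std::declval<const T&>()))> {
        using R = decltype(f(std::declval<const T&>()));
        ToyPromise<R> next;
        ToyFuture<R> out = next.get_future();
        auto run = [f, next](const T& v) mutable { next.fulfill(f(v)); };
        if (state_->ready) {
            run(state_->value);
        } else {
            state_->continuations.push_back(run);
        }
        return out;
    }

private:
    std::shared_ptr<SharedState<T>> state_;
};
```

With this sketch, a dependency chain is written as `f.then(step1).then(step2)`: each step is attached to the future it depends on, instead of being registered against a separate event object. The design choice is that the dependency travels with the value itself.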

What has been removed from UPC++?

In developing UPC++ v1.0, we also strove for simplicity and removed some obsolete features that were present in v0.1:

  1. Multidimensional arrays (local only). We plan to interoperate with third-party solutions for multidimensional arrays.

  2. Distributed shared arrays - this functionality has been subsumed by generalized distributed objects, which provide a more scalable solution.

  3. Shared scalars

  4. Blocking communication (e.g. implicit global pointer dereference)

Feature Comparison

Version 0.1 Version 1.0
Global Pointers
Futures, Continuations, Promises
Events Subsumed by futures, continuations, promises
Put and Get
Non-contiguous transfers Experimental
Distributed 1D Arrays Subsumed by distributed objects
Distributed Objects
Collectives Experimental
Teams Experimental
Global Pointer Dereference ✔ (Implicit blocking)
Memory Kinds (e.g. GPU)
Shared Scalar Variables ✔ (Little use)
Non-Distributed MD Arrays ✔ ndarray prototype
Progress Guarantees ✔ More rigorous
Atomics Experimental