UPC++ Version 1.0
Jan 31, 2018: We are proud to announce a new beta release of the UPC++ implementation (v2018.1.0), with accompanying updates to the Specification and Programmer's Guide.
- UPC++ Implementation, v2018.1.0 (tar.gz)
  - Contains everything you need to start using UPC++ on supported platforms (currently x86_64 Linux, Mac OS and Cray XC)
  - Installation automatically downloads the GASNet-EX communication library (internet connection required)
  - Note this initial release has not yet been tuned for performance and some specified features remain unimplemented
  - Includes all of the documents below
  - See README.md (includes ChangeLog) and INSTALL.md
- UPC++ Programmer's Guide v2018.1.0 (PDF)
  - A gentle introduction to UPC++ with examples and descriptions.
- UPC++ Specification, v1.0 Draft 5 (PDF)
  - Formal specification of the UPC++ library interface.
- Includes archival versions of the Programmer's Guide and Specification suitable for citation.
Transition from v0.1 to v1.0
In November 2016, we froze the old UPC++ repository as part of a transition phase. That transition phase ends with the current v1.0 release, and the v0.1 UPC++ repository will remain frozen in maintenance-only mode.
The current release of UPC++ v1.0 implements a majority of the Specification (see the ChangeLog section of the README for status). UPC++ leverages GASNet-EX to deliver low-overhead, fine-grained communication. UPC++ is a high-productivity communication library designed to interoperate smoothly and efficiently with MPI, OpenMP, CUDA and asynchronous many-task (AMT) runtimes.
UPC++ is also a sounding board for new ideas that may be incorporated in C++20 and beyond, or that may influence the direction of those standardization efforts. UPC++ v1.0 deploys new capabilities, some of which were experimental in v0.1, removes some features and modifies others. The table at the end of this document lists the UPC++ features in v0.1 (left) and the planned additions, deletions and changes in v1.0 (right).
UPC++ exposes a PGAS memory model, including one-sided communication (RMA and RPC). However, there are departures from the approaches taken by some predecessors such as UPC. These changes reflect a design philosophy that encourages the UPC++ programmer to directly express what can be implemented efficiently (i.e., without a need for parallel compiler analysis).
Most operations are non-blocking, and the powerful synchronization mechanisms encourage applications to design for aggressive asynchrony.
All communication is explicit: there is no implicit data motion.
What New Features are in v1.0?
Futures, promises and continuations. Futures are central to handling asynchronous operations such as RMA and RPC. Futures are free-standing in that they do not depend on other parts of the library. Whereas v0.1 used an event-based mechanism for expressing task dependencies, v1.0 relies on a continuation-based model instead.
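To make the continuation-based model concrete, here is a minimal single-threaded sketch in plain standard C++. It is not the UPC++ API: `SharedState`, `ToyFuture`, `ToyPromise` and `demo_chain` are hypothetical names introduced only for illustration, and the sketch assumes result types are default-constructible. It shows the three roles named above: a promise that is fulfilled once, a future that observes the result, and continuations attached with `then` that run when the value arrives.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// State shared by a promise and its futures (hypothetical toy, not UPC++).
template <typename T>
struct SharedState {
  bool has_value = false;
  T value{};                                              // assumes T is default-constructible
  std::vector<std::function<void(const T&)>> continuations;

  // Store the result once, then run every continuation registered so far.
  void fulfill(T v) {
    assert(!has_value);
    value = std::move(v);
    has_value = true;
    for (auto& k : continuations) k(value);
    continuations.clear();
  }
};

template <typename T>
class ToyFuture {
public:
  explicit ToyFuture(std::shared_ptr<SharedState<T>> s) : state_(std::move(s)) {}
  bool ready() const { return state_->has_value; }
  const T& result() const { assert(ready()); return state_->value; }

  // Attach a continuation; the returned future carries the continuation's result.
  template <typename F>
  auto then(F fn) const -> ToyFuture<decltype(fn(std::declval<const T&>()))> {
    using R = decltype(fn(std::declval<const T&>()));
    auto next = std::make_shared<SharedState<R>>();
    auto run = [fn, next](const T& v) { next->fulfill(fn(v)); };
    if (ready()) run(state_->value);               // value already there: run now
    else state_->continuations.push_back(run);     // otherwise defer until fulfill()
    return ToyFuture<R>(next);
  }

private:
  std::shared_ptr<SharedState<T>> state_;
};

template <typename T>
class ToyPromise {
public:
  ToyFuture<T> get_future() const { return ToyFuture<T>(state_); }
  void fulfill(T v) { state_->fulfill(std::move(v)); }
private:
  std::shared_ptr<SharedState<T>> state_ = std::make_shared<SharedState<T>>();
};

// Chain two continuations on a pending value, then complete it.
inline int demo_chain() {
  ToyPromise<int> p;
  ToyFuture<int> f = p.get_future()
                         .then([](const int& x) { return x * 2; })
                         .then([](const int& x) { return x + 1; });
  assert(!f.ready());   // nothing runs until the promise is fulfilled
  p.fulfill(20);
  return f.result();    // (20 * 2) + 1
}
```

Calling `demo_chain()` returns 41. The dependency between the two steps is expressed by chaining `then` calls rather than by waiting on a separate event object, which is the shift in style described above.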
Progress guarantees. Because UPC++ has no internal service threads, the library makes progress only when a core enters an active UPC++ call. UPC++ v1.0 has better-defined progress semantics than v0.1, especially in multi-threaded scenarios.
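The consequence of having no service threads can be sketched with a toy runtime in plain standard C++. This is an illustration under stated assumptions, not the UPC++ implementation: `ToyRuntime`, `enqueue_completion`, `progress` and `demo_progress` are hypothetical names. Completed work sits in a queue and its callbacks run only when the caller explicitly enters the library.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy runtime with no background threads (hypothetical, not the UPC++ API).
class ToyRuntime {
public:
  // Pretend an operation has finished; its callback is deferred, not run.
  void enqueue_completion(std::function<void()> callback) {
    pending_.push_back(std::move(callback));
  }

  // The only place callbacks run: the caller lends its core to the library.
  // Returns the number of callbacks delivered during this call.
  std::size_t progress() {
    std::size_t delivered = 0;
    while (!pending_.empty()) {
      std::function<void()> cb = std::move(pending_.front());
      pending_.pop_front();
      cb();          // a callback may enqueue more work, which is drained too
      ++delivered;
    }
    return delivered;
  }

  std::size_t pending() const { return pending_.size(); }

private:
  std::deque<std::function<void()>> pending_;
};

// Two completions are queued, but nothing happens until progress() is called.
inline int demo_progress() {
  ToyRuntime rt;
  int counter = 0;
  rt.enqueue_completion([&counter] { counter += 1; });
  rt.enqueue_completion([&counter] { counter += 10; });
  assert(counter == 0);   // no progress call yet, so no callback has run
  rt.progress();
  return counter;
}
```

`demo_progress()` returns 11. An application that computes for a long time without entering the library would, in this toy model, leave both callbacks stuck in the queue, which is why well-defined progress semantics matter.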
Remote atomics. These were experimental in v0.1 and did not necessarily utilize available hardware support. In v1.0, any available hardware support will be leveraged, and users should see significant performance benefits for certain combinations of hardware and applications. Remote atomics will use the C++11 memory model and a free-function API. In the near term we restrict atomics to fetch-and-add, but we are evaluating others. [Currently under development]
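Since the remote atomics API is still under development, the following sketch uses only the local C++11 `std::atomic` facility to show what fetch-and-add provides under the C++11 memory model. The helper name `reserve_slots` is hypothetical, and the choice of `std::memory_order_relaxed` is an assumption made for the example, not a statement about what UPC++ will require.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Reserve `count` consecutive slots from a shared cursor and return the first
// index reserved. fetch_add returns the value held before the addition, so
// concurrent callers would each receive a disjoint range of indices.
inline std::int64_t reserve_slots(std::atomic<std::int64_t>& cursor,
                                  std::int64_t count) {
  assert(count >= 0);
  // Relaxed ordering: only the atomicity of the update is relied on here.
  return cursor.fetch_add(count, std::memory_order_relaxed);
}
```

Starting from a cursor of 0, `reserve_slots(cursor, 4)` returns 0 and a following `reserve_slots(cursor, 2)` returns 4, leaving the cursor at 6. A remote variant would apply the same read-modify-write to a counter owned by another rank.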
Teams are a mechanism for grouping ranks, and are similar to MPI_Group. Teams play a role in collective communication and also in storage allocation. Initially, we plan to support barriers and reductions for specialized types supported in hardware. Others (such as the vector ‘v’ variants of alltoall) will be added over time. [Currently under development]
Distributed objects. UPC++ v1.0 enables a C++ object of any type to be made into a distributed object, with one instance on every rank of a team. RPC can be used to scalably access remote instances within a team.
Memory kinds. UPC++ will support global operations on memory with different kinds of access methods or performance properties, such as GPUs, HBM, NUMA and NVRAM, while providing a uniform interface for transfers between such memories. [Not yet implemented]
What has been removed from UPC++?
In developing UPC++ v1.0 we also strove for simplicity and we have removed some obsolete features present in v0.1:
Multidimensional arrays (local only). We plan to interoperate with third-party solutions for multidimensional arrays.
Distributed shared arrays - this functionality has been subsumed by generalized distributed objects, which provide a more scalable solution.
Blocking communication (e.g. implicit global pointer dereference)
| Feature | v0.1 | v1.0 |
|---|---|---|
| Futures, Continuations, Promises | | ✔ |
| Events | ✔ | Subsumed by futures, continuations, promises |
| Put and Get | ✔ | ✔ |
| Distributed 1D Arrays | ✔ | Subsumed by distributed objects |
| Global Pointer Dereference | ✔ (Implicit blocking) | |
| Memory Kinds (e.g. GPU) | | ✔ |
| Shared Scalar Variables | ✔ (Little use) | |
| Non-Distributed MD Arrays | ✔ ndarray prototype | |
| Progress Guarantees | ✔ | ✔ More rigorous |