UPC++ Version 1.0 versus Version 0.1
Transition from v0.1 to v1.0
In November 2016, we froze the old UPC++ repository as part of a transition phase that ended with the September 2017 release of UPC++ v1.0. The UPC++ v0.1 repository will remain frozen, with no further maintenance.
UPC++ v1.0 introduces new capabilities (some of which were experimental in v0.1), removes some features, and modifies others. The table at the end of this document lists the UPC++ v0.1 features alongside the additions, deletions, and changes in v1.0.
What features have been added relative to v0.1?
Futures, promises and continuations. Whereas v0.1 used an event-based mechanism for expressing task dependencies, v1.0 relies on a continuation-based model instead.
Progress guarantees. UPC++ v1.0 has better-defined progress semantics than v0.1, especially in multi-threaded scenarios.
Remote atomics. Remote atomics were experimental in v0.1 and did not necessarily use available hardware support. UPC++ v1.0 can leverage such hardware support where it is available, which can yield significant performance benefits for certain combinations of hardware and applications.
Distributed objects. UPC++ v1.0 distributed objects have no direct analogue in v0.1, but they subsume v0.1's distributed shared arrays.
Serialization. UPC++ v1.0 introduces several complementary mechanisms for efficiently passing large and/or complicated data arguments to RPCs.
Non-contiguous RMA. UPC++ v1.0 expands and generalizes the support for non-contiguous RMA relative to v0.1.
Teams. Teams represent ordered sets of processes and are similar to MPI_Group. Teams were experimental in v0.1 but are fully supported in v1.0.
Memory kinds. UPC++ v1.0 provides uniform interfaces for RMA transfers among host and device memories, including a reference implementation for CUDA GPUs. The current release implements accelerated GPU memory transfers on compatible hardware.
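To make the continuation-based model concrete, here is a minimal single-process sketch in standard C++. It is a hypothetical toy, not UPC++ code: `toy_future`, `toy_promise`, `fulfill()`, `ready()` and `result()` are names invented for illustration, and the sketch ignores threads, communication, progress and `void` results. What it shares with UPC++ v1.0's `upcxx::future<T>::then()` is the idea: dependent work is attached directly to a future and runs once the value becomes available, and each `then()` returns a new future so dependent steps can be chained.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical toy model of a continuation-based future/promise pair.
// NOT the UPC++ API or implementation: single process, no threads,
// no communication, and no support for void results.
template <class T> class toy_promise;

template <class T>
class toy_future {
  struct state {
    std::optional<T> value;                                // set once the result is ready
    std::vector<std::function<void(const T&)>> callbacks;  // continuations waiting on value
  };
  std::shared_ptr<state> st_;
  explicit toy_future(std::shared_ptr<state> s) : st_(std::move(s)) {}
  template <class U> friend class toy_promise;

 public:
  bool ready() const { return st_->value.has_value(); }

  const T& result() const {
    assert(ready() && "result() called before the future was ready");
    return *st_->value;
  }

  // Attach continuation f. It runs immediately if the value is already
  // available, otherwise when the promise is fulfilled. Returns a new
  // future for f's result, so calls to then() can be chained.
  template <class F>
  auto then(F f) -> toy_future<std::invoke_result_t<F, const T&>> {
    using R = std::invoke_result_t<F, const T&>;
    toy_promise<R> next;
    toy_future<R> fut = next.get_future();
    auto run = [f = std::move(f), next](const T& v) mutable { next.fulfill(f(v)); };
    if (ready()) {
      run(*st_->value);
    } else {
      st_->callbacks.push_back(std::move(run));
    }
    return fut;
  }
};

template <class T>
class toy_promise {
  using state = typename toy_future<T>::state;
  std::shared_ptr<state> st_ = std::make_shared<state>();

 public:
  toy_future<T> get_future() const { return toy_future<T>(st_); }

  // Store the value, then run every continuation registered so far.
  void fulfill(T v) {
    st_->value = std::move(v);
    std::vector<std::function<void(const T&)>> pending;
    pending.swap(st_->callbacks);
    for (auto& cb : pending) cb(*st_->value);
  }
};
```

For example, building `p.get_future().then([](const int& x) { return x + 1; }).then([](const int& x) { return x * 2; })` and then calling `p.fulfill(20)` leaves the final future ready with the result 42. Because `then()` itself returns a future, a chain of dependent steps is expressed in one place rather than through separately managed completion objects.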
What has been removed from UPC++ v0.1?
In developing UPC++ v1.0, we also strove for simplicity and removed some obsolete features present in v0.1:
Multidimensional arrays (local only). We strive to interoperate with third-party solutions for multidimensional arrays.
Distributed shared arrays: this functionality has been subsumed by generalized distributed objects, which provide a more scalable solution. Also see the dist_array class template provided as a upcxx-extras extension that implements scalable distributed arrays over
Blocking communication (e.g., implicit global pointer dereference).

| Feature | v0.1 | v1.0 |
|---|---|---|
| Futures, Continuations, Promises | | ✔ |
| Events | ✔ | Subsumed by futures, continuations, promises |
| Put and Get | ✔ | ✔ |
| Distributed 1D Arrays | ✔ | Subsumed by distributed objects |
| RPC | ✔ | ✔ Serialization improvements |
| Memory Kinds (e.g., GPU) | | ✔ |
| Progress Guarantees | ✔ | ✔ More rigorous |
| Non-Distributed MD Arrays | ✔ ndarray prototype | Compatibility with on-node programming models |
| Shared Scalar Variables | ✔ | Subsumed by distributed objects and/or collectives |
| Implicit Global Pointer Dereference | ✔ | v1.0 omits implicit communication |