Implement barrier_async
upcxx::barrier_async is currently unimplemented.
We could implement a limited version now using gasnet_barrier_{wait,notify}, the limitation being that only one barrier_async can actually be outstanding at a time (this is arguably the most important use case anyhow). However, this can be made spec-compliant by queuing a barrier_async when one is already in progress, and issuing the queued barriers serially from the internal progress engine.
Implementing a fully overlapped version of the more general semantic currently in the spec will need to wait until we have the updated team-based reductions in GASNet.
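To make the queuing idea above concrete, here is a minimal sketch, not the actual UPC++ implementation. It assumes a split-phase barrier with a "notify" step and a non-blocking completion test. `MockSplitPhaseBarrier` and `BarrierSerializer` are hypothetical names introduced only for illustration, and the mock stands in for whatever GASNet split-phase calls a real version would use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a split-phase barrier (notify + non-blocking test).
// It "completes" after a fixed number of polls to simulate network latency.
struct MockSplitPhaseBarrier {
    int polls_until_done = 2;
    int remaining = 0;
    bool active = false;

    void notify() {
        assert(!active);  // only one barrier may be in flight at a time
        active = true;
        remaining = polls_until_done;
    }

    // Returns true once the in-flight barrier has completed.
    bool try_complete() {
        assert(active);
        if (remaining > 0) { --remaining; return false; }
        active = false;
        return true;
    }
};

// Hypothetical serializer: accepts any number of barrier_async requests,
// but issues them to the underlying barrier strictly one at a time.
class BarrierSerializer {
public:
    explicit BarrierSerializer(MockSplitPhaseBarrier& b) : barrier_(b) {}

    // Never blocks; just records the completion callback in FIFO order.
    void barrier_async(std::function<void()> on_done) {
        pending_.push_back(std::move(on_done));
    }

    // Intended to be called from the internal progress engine.
    void progress() {
        if (in_flight_) {
            if (!barrier_.try_complete()) return;  // still waiting
            in_flight_ = false;
            std::function<void()> done = std::move(pending_.front());
            pending_.pop_front();
            done();  // signal completion of the oldest request
        }
        if (!pending_.empty()) {  // start the next queued barrier, if any
            barrier_.notify();
            in_flight_ = true;
        }
    }

    std::size_t outstanding() const { return pending_.size(); }

private:
    MockSplitPhaseBarrier& barrier_;
    std::deque<std::function<void()>> pending_;
    bool in_flight_ = false;
};
```

In this sketch a request is only issued on the next call to progress(); an eager variant could call notify() directly from barrier_async when nothing is in flight. Since every rank issues its barriers in the same program order, the FIFO queue keeps the underlying barrier calls matched across ranks.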
Comments (6)
-
-
Upon further consideration, I'd like to endorse Dan's suggestion of using GASNet's split-phase barrier and serializing them within the UPC++ progress engine if multiple barriers would overlap.
This is not only the least code to write, but it also has the opportunity to utilize fast RDMA-based barrier algorithms that will significantly outperform a 1-byte all-to-all or anything written using AMs. On BG/Q you even get an off-loaded split-phase barrier.
-
reporter - changed milestone to 2018.09.30 release
In the 1/10/18 meeting, it was agreed to defer this to the Sept release.
-
GASNet-EX is delivering an efficient (potentially offloaded) implementation of allreduce over teams in either the 2018.06 or 2018.09 release. That means we will have at least 2 implementation options (the other being Dan's suggested internal serialization of calls to GASNet-EX's team-scoped barrier). So, I believe the 2018.09 release of UPC++ should implement the async barrier call.
-
reporter - marked as blocker
This issue was triaged at the 2018-06-13 Pagoda meeting and assigned a new milestone/priority.
We noted this feature is currently in the spec and relevant to the milestone deliverable. Before release this needs to either be implemented or (as a last resort) removed from spec.
-
reporter - changed status to resolved
Fixed in bdd36b0
I am not sure the team-based reductions are required (especially assuming no teams).
I believe that John and I had determined that a non-blocking 1-byte Exchange (a.k.a. all-to-all) was sufficient.
That could be implemented today using the GASNet-1 collectives implementation, but that code may be short-lived and so we should discuss this before taking that path.
However, I also seem to recall that John wanted to implement an AM-based dissemination barrier.
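For reference, the communication pattern of a dissemination barrier can be sketched as below. This only computes who talks to whom in each round; it does not send any AMs. The `Round` struct and `dissemination_schedule` function are hypothetical names for illustration, not part of UPC++ or GASNet. In round k, each rank signals the rank 2^k positions ahead (mod n) and waits for a signal from the rank 2^k positions behind, giving ceil(log2(n)) rounds in total.

```cpp
#include <cassert>
#include <vector>

// Partners of one rank in a single round of the dissemination pattern.
struct Round {
    int send_to;    // rank we signal in this round
    int recv_from;  // rank whose signal we wait for in this round
};

// Returns the per-round partners of `rank` for a barrier over n ranks.
// The distance doubles each round: 1, 2, 4, ... while it is less than n.
std::vector<Round> dissemination_schedule(int rank, int n) {
    assert(n > 0 && rank >= 0 && rank < n);
    std::vector<Round> rounds;
    for (long long dist = 1; dist < n; dist *= 2) {
        int to = static_cast<int>((rank + dist) % n);
        int from = static_cast<int>((rank - dist + n) % n);
        rounds.push_back({to, from});
    }
    return rounds;
}
```

An AM-based version would presumably send one short message per round to `send_to` and advance to the next round from the progress engine once the message from `recv_from` has arrived.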