- changed milestone to 2019.03.31 release
dist_object, garbage collection, and quiescence management
We know we want to give our users access to a fancy auto-quiescing algorithm. My apprehension has always stemmed from not knowing what flavor of quiescence is general enough to cover at least all the use cases I have personally desired. This proposal satisfies my needs.
I propose a variant of `dist_object` called `dist_gcref<T>` which functions in a very similar way. Like `dist_object`, it is constructed collectively, it's magically translated to local instances over rpc, and the rpc stalls for remote construction if it arrives early. Semantically it differs in that it's a refcounted pointer to `T`, just like `std::shared_ptr<T>`. And, magically, the `T` is only destructed after all ranks in the team have dropped all references and there are no rpc's in flight referencing the object.
I see this as extremely useful for creating temporarily shared data structures that are assigned to once, read by many remotely, and just magically destruct when all reads are done. Consider a ghost zone exchange:
```cpp
// 1D stencil grid
double mesh[102]; // 100 interior cells, 1 left, 1 right ghost cell

void advance() {
  // publish my left and right ghost cells for peers
  auto left = upcxx::make_dist_gcref<double>(mesh[1]);
  auto rght = upcxx::make_dist_gcref<double>(mesh[100]);
  // collect peer ghost values
  mesh[0]   = rght.fetch(upcxx::rank_me()-1).wait();
  mesh[101] = left.fetch(upcxx::rank_me()+1).wait();
  // update interior, carrying the old left-neighbor value so the
  // in-place sweep doesn't read already-updated cells
  double prev = mesh[0];
  for(int i=1; i <= 100; i++) {
    double cur = mesh[i];
    mesh[i] = prev + mesh[i+1];
    prev = cur;
  }
  // no barrier, no counter!!!
}
```
When a user wants to quiesce a region of code:
```cpp
// I'm using dist_gcref<std::tuple<>> as the "no-data" object whose only
// meaningful quality is its lifetime. We could consider permitting
// dist_gcref<void> or dist_gcref<>.
void foo1() {
  auto region = upcxx::make_dist_gcref<std::tuple<>>({});
  // dist_gcref<T>::when_dead() returns future indicating object is globally dead.
  future<> done = region.when_dead();
  // do some communication, all rpc's must carry `region` reference to keep it live.
  upcxx::rpc_ff(rank_me()+1, [](dist_gcref<tuple<>> region, ...) { ... }, region, ...);
  // drop our reference as a vote to close region
  region = nullptr;
  // spin on progress until region quiesced
  done.wait();
}
```
```cpp
// same thing, different style
void foo2() {
  future<> done;
  {
    auto region = upcxx::make_dist_gcref<std::tuple<>>({});
    done = region.when_dead();
    // do some communication, all rpc's must carry `region` reference to keep it live.
    upcxx::rpc_ff(rank_me()+1, [](dist_gcref<tuple<>> region, ...) { ... }, region, ...);
    // `region` goes out of local scope here...
  }
  // spin on progress until region quiesced
  done.wait();
}
```
I think it's important that we adopt reference semantics as opposed to the RAII of `dist_object`. Conceivably, rpc's could "awaken" objects whose local RAII instance a rank previously killed, for instance by inserting a task into a progress queue. It would be awkward to fabricate new RAII instances as clones of long-dead ones.
I really like this design because it permits:
- The ghost zone example which is aggressively synchronization-free beyond any other quiescing pattern we've investigated.
- Quiescence of named regions, which may overlap.
- A single rpc can participate in multiple regions by carrying multiple references.
- Quiescence over teams, since we would support `dist_gcref` over teams, just like `dist_object`.
A drawback is that it does not permit software module A to quiesce module B when B wasn't written with an understanding of this idiom. I do not think this is dire. I equate this to good MPI programmers knowing to take a communicator input and not assume `MPI_COMM_WORLD`.
Comments (2)
- changed milestone to Deferred indefinitely
This issue was triaged at the 2019-07-24 Pagoda issue meeting and assigned a new milestone.
This issue was triaged at the 2018-06-13 Pagoda meeting and assigned a new milestone/priority.