dist_object should provide an accessor for copies of other ranks' portions of the dist_object
The following is likely to be a common bootstrapping paradigm:
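```cpp
dist_object<global_ptr<T>> pointers(my_gptr);
// Pass the dist_object itself to the RPC: on some_rank it is translated to
// that rank's local instance, so the lambda can return its portion.
auto f = rpc(some_rank, [](dist_object<global_ptr<T>> &ptrs) {
  return *ptrs;
}, pointers);
// ...
f.wait();
global_ptr<T> remote = f.result();
```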
We should provide a method for doing the RPC to obtain a copy of another rank's piece of the dist_object. Suggestion:
```cpp
template<typename T>
future<T> dist_object<T>::get(intrank_t rank) const; // or operator[] or at()
```
Preconditions: rank must be a valid ID in the team associated with the dist_object. T must be Serializable. The dist_object must not have been destroyed on rank.
Returns a future representing the value of rank's portion of the distributed object.
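The following single-process C++ sketch makes the proposed semantics concrete. It is a model built on stated assumptions: the class name SimDistObject, its team_size() member, and the use of std::promise/std::future in place of UPC++ futures are all hypothetical, and no communication takes place. Each vector slot stands in for one rank's portion. get() checks the valid-rank precondition instead of assuming it, and the returned future holds a copy of the value, not a reference to remote storage.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical single-process model of the proposed dist_object<T>::get().
// Each element of slots_ stands in for one rank's portion; this is NOT the
// UPC++ API and performs no real communication.
template <typename T>
class SimDistObject {
 public:
  explicit SimDistObject(std::vector<T> per_rank) : slots_(std::move(per_rank)) {}

  // Hypothetical stand-in for the suggested team accessor: number of ranks.
  std::size_t team_size() const { return slots_.size(); }

  // Returns a future holding a copy of `rank`'s portion. The precondition
  // "rank must be a valid ID in the team" is checked here.
  std::future<T> get(int rank) const {
    if (rank < 0 || static_cast<std::size_t>(rank) >= slots_.size())
      throw std::out_of_range("rank not in team");
    std::promise<T> p;
    p.set_value(slots_[rank]);  // copy the value, as a value-returning RPC would
    return p.get_future();      // future stays valid after p is destroyed
  }

 private:
  std::vector<T> slots_;
};
```

In a real runtime the future would become ready only after the round trip completes. Here it is ready immediately, which is enough to show the copy-by-value contract.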
We should also consider adding an accessor to obtain a reference to the team over which the dist_object was created.
Comments (6)
reporter -
All of this sounds worth providing to users, but we should consider whether it belongs in a different class. dist_object methods currently never require communication and can be guaranteed to use constant space (per process) and run in constant time, which seems like a nice property to maintain. The class is minimal but fast and can be used to efficiently build higher-level constructs that may be more expensive.
Perhaps there should be a dist_array<T> that is implemented over the dist_object API (probably as a dist_object<global_ptr<T>>) and adds communication operations and caching. This class's methods would often require communication and could consume non-trivial space, depending on the caching mechanism. It could even expose algorithmic options to adjust the space/time/scalability tradeoff (e.g., use a collectively constructed full directory for jobs under P processes, and a cache with Q entries for larger jobs). It should also provide the global naming capability of dist_object (and the RPC tie-in), either via inheritance or wrappers.
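The directory-versus-cache tradeoff described above can be sketched as the pointer-lookup layer that a dist_array<T> might sit on. This is a hypothetical single-process model. PointerLookup, its parameter names, and the use of int as a stand-in for global_ptr<T> are illustrative assumptions, and remote_fetches() only counts simulated round trips. Jobs with fewer than full_dir_proc_limit (P) ranks get a full directory, which models one collective exchange at construction. Larger jobs get an LRU cache with cache_capacity (Q) entries.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical model of a dist_array<T> lookup layer over
// dist_object<global_ptr<T>>. Plain ints stand in for global_ptr<T>;
// remote_ holds what each owning rank would return to a fetch.
class PointerLookup {
 public:
  PointerLookup(std::vector<int> remote_ptrs, std::size_t full_dir_proc_limit,
                std::size_t cache_capacity)
      : remote_(std::move(remote_ptrs)),
        capacity_(cache_capacity),
        full_directory_(remote_.size() < full_dir_proc_limit) {
    // Small jobs: copy every rank's pointer once (models a collective exchange).
    if (full_directory_) directory_ = remote_;
  }

  int lookup(std::size_t rank) {
    if (full_directory_) return directory_[rank];  // O(P) space, no per-lookup traffic
    auto it = index_.find(rank);
    if (it != index_.end()) {                      // hit: mark as most recently used
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    ++remote_fetches_;                             // miss: one simulated round trip
    int ptr = remote_[rank];
    lru_.emplace_front(rank, ptr);
    index_[rank] = lru_.begin();
    if (lru_.size() > capacity_) {                 // evict least recently used entry
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    return ptr;
  }

  bool uses_full_directory() const { return full_directory_; }
  std::size_t remote_fetches() const { return remote_fetches_; }

 private:
  using Entry = std::pair<std::size_t, int>;
  std::vector<int> remote_;
  std::size_t capacity_;
  bool full_directory_;
  std::vector<int> directory_;
  std::list<Entry> lru_;
  std::unordered_map<std::size_t, std::list<Entry>::iterator> index_;
  std::size_t remote_fetches_ = 0;
};
```

In this model the full directory costs O(P) space per process and no lookup traffic. The cache caps space at O(Q) but pays a round trip on every miss, which is the kind of knob the comment suggests exposing.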
- changed component to Distributed Objects

See also issue #23.
Account Deleted -
So this is fetch(). I think it went unnoticed, but fetch() does exist as a method on dist_object in the implementation. It can be removed if we prefer the free-function form upcxx::fetch, but that's subject to our "using namespace upcxx" calamity of occupying useful words that other libraries might take offense to (see how future.wait() was preferred to upcxx::wait(future)).
The RDMA-into-a-dist_object capability is not on the table, since that would require us to internally address the bootstrapping problem of sharing the global_ptrs, which is what we're solving with dist_object. As Dan said, let's keep this thing focused.
reporter - changed status to resolved
Document dist_object<T>::fetch(). This fixes Issue #89 and Issue #117. → <<cset f1172137a6fa>>
When discussing dist_objects with @yelick, she mentioned that she would like us to provide convenience mechanisms for reading and writing remote pieces of the dist_object. She would also like these to turn into RDMA operations where possible. So in addition to get() above (which we should probably rename to rget()), we should provide an rput() method as well.

Other things we should consider:
- Ensure the dist_object lives in the shared segment, and provide a method for obtaining a global_ptr to it.
- Consider how a dist_object can implement a coarray. Then provide versions of rput/rget that can source or target a rank and dist_object. It's not clear what this should look like.

The higher-level point is that we want to make dist_object as easy to use as possible and provide good performance, without giving up scalability.
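One way to picture rget/rput on a dist_object that lives in the shared segment is through global pointers into per-rank storage, so that reads and writes need no RPC. The sketch below is a single-process model only. SimCoarray, SimGlobalPtr, gptr(), rget() and rput() are hypothetical names chosen for illustration and are not UPC++ API. Giving each rank a fixed-length piece is just one possible reading of the coarray idea.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical "global pointer": which rank's piece, and which element in it.
struct SimGlobalPtr {
  std::size_t rank;
  std::size_t index;
};

// Hypothetical single-process model of a coarray-like dist_object: each
// simulated rank owns a piece of n elements in its "shared segment".
template <typename T>
class SimCoarray {
 public:
  SimCoarray(std::size_t nranks, std::size_t n)
      : segments_(nranks, std::vector<T>(n)) {}

  // Models obtaining a global_ptr to element `index` of `rank`'s piece.
  SimGlobalPtr gptr(std::size_t rank, std::size_t index) const {
    return {rank, index};
  }

  // One-sided read; in a real runtime this is where an RDMA get could be used.
  std::future<T> rget(SimGlobalPtr p) const {
    std::promise<T> pr;
    pr.set_value(segments_.at(p.rank).at(p.index));  // bounds-checked access
    return pr.get_future();
  }

  // One-sided write; the returned future signals completion.
  std::future<void> rput(SimGlobalPtr p, const T &value) {
    segments_.at(p.rank).at(p.index) = value;
    std::promise<void> pr;
    pr.set_value();
    return pr.get_future();
  }

 private:
  std::vector<std::vector<T>> segments_;
};
```

Separating pointer acquisition (gptr) from the data movement (rget/rput) is what would let a real implementation map the operations onto RDMA. That mapping only works if the dist_object's storage actually lives in the shared segment, as the first bullet suggests.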