Handling for large static types on the rpc return value path, incl. `dist_object<massive>::fetch()`
This issue is forked from discussion in pull request #245.
Amir and I verbally discussed this PR. We agreed we don't currently have a good story for large-type objects along the RPC return path, including dist_object<massive>::fetch()
. I'm entering an open-ended issue to record that design weakness and collect ideas for improvement.
Ideally we'd get to a point where an RPC callback can return a reference to a large (possibly non-copyable or even non-moveable) type, or a future containing a reference to such a type. Serialization would serialize through the provided reference directly into the network buffer. Back on the initiator, we'd deserialize directly from the network buffer into the heap-allocated promise cell, so that future::result_reference()
points directly at the memory targeted by deserialization. In an ideal scenario, there should be no invocations of the user object's move or copy constructors required to make this happen, and no storing of the object type on the program stack - serialization/deserialization are the only copy we fundamentally need, and happen on growable heap locations that don't suffer from stack overflow.
Currently we fall well short of this goal, performing numerous move operations on the objects, including storing several copies directly on the program stack.
Comments (18)
-
-
Prototype serialization of
shared_ptr
andunique_ptr
(with default Deleter) is in implementation PR 257. Using either results in 0 copies/moves on the return path. -
We also need to think about the case where the user has an existing non-smart pointer to an object and wants to serialize it without handing control to a smart pointer. So something like a
view
that is non-owning on the way in. But it needs to be an owning pointer on the way out (and the user can usereset()
orrelease()
to obtain ownership if they want). So maybe we define something likeshared_ptr_view<T>
andunique_ptr_view<T>
that deserialize asshared_ptr<deserialized_type_t<T>>
andunique_ptr<deserialized_type_t<T>>
, respectively. -
reporter Amir is doing great work on this problem.
However IIUC so far none of it addresses the case of
dist_object<massive>::fetch()
, unless the user changes his data structure to replace this withdist_object<shared_ptr<massive>>::fetch()
, or we introduce adist_object::fetch_boxed()
method that boxes the object it fetches into ashared_ptr
(hopefully with minimal copying on both sides).While we are at it we should probably change the implementation of
dist_object<T>::fetch()
to return a const lvalue ref from the RPC, to avoid a possible copy on the return path.. -
Straw-man proposal for pointer views:
template<typename T>
/*unspecified*/ make_shared_ptr_view(const T *ptr);
Returns a pointer view over
ptr
. This pointer view is serialized by serializing*ptr
, and the result of deserializing is managed by astd::shared_ptr<deserialized_type_t<T>>
.template<typename T>
/*unspecified*/ make_unique_ptr_view(const T *ptr);
Returns a pointer view over
ptr
. This pointer view is serialized by serializing*ptr
, and the result of deserializing is managed by astd::unique_ptr<deserialized_type_t<T>>
.Prototype implementation:
template<template<typename...> typename Pointer, typename T> struct ptr_view { using element_type = T; ptr_view(const T *ptr_) : ptr(ptr_) {} const T* get() const { return ptr; } private: const T* ptr; }; template<typename T> ptr_view<std::shared_ptr, T> make_shared_ptr_view(const T *ptr) { return ptr_view<std::shared_ptr, T>(ptr); } template<typename T> ptr_view<std::unique_ptr, T> make_unique_ptr_view(const T *ptr) { return ptr_view<std::unique_ptr, T>(ptr); } template<template<typename...> typename Pointer, typename T> struct serialization<ptr_view<Pointer, T>>: serialization_smart_pointer<ptr_view<Pointer, T>, Pointer<T>> {};
Example test case:
upcxx::rpc(target, [](std::unique_ptr<T> x) -> upcxx::ptr_view<std::shared_ptr, T> { return upcxx::make_shared_ptr_view(&global); }, upcxx::make_unique_ptr_view(&global) ).wait_reference(); T::show_stats("ptr_view<std::unique_ptr, T> -> ptr_view<std::shared_ptr, T>", 2, 0);
-
reporter The strawman looks interesting, we should probably discuss in real-time.
My initial thought is we'd need to be very clear about ownership on both sides:
- IIUC you're proposing that
*unspecified*
created bymake_{shared,unique}_ptr_view(T *ptr)
is a non-owning view over*ptr
- implying the user is responsible for keeping it alive until after Serialization. - On the deserialized size, the result is an owning view. We'd need to be very clear about this ownership dichotomy to avoid confusion.
- This is reasonable but might be insufficient for cases where the RPC constructs an object for the sole purpose of returning it - as they'd have no convenient way to reap the object; although I guess in that case they probably should have just returned a
unique_ptr<T>
? Does this mean the only use case forptr_view
is for returning an object that is persistent (or lifetime managed elsewhere) but doesn't already live in ashared_ptr
?
- IIUC you're proposing that
-
Does this mean the only use case for
ptr_view
is for returning an object that is persistent (or lifetime managed elsewhere) but doesn't already live in ashared_ptr
?Yes, that is exactly the intended use case. We currently recommend using a plain
view
for this when sending an RPC but do not have an answer for the return path. -
This doesn’t depend on
shared_ptr
orunique_ptr
being Serializable. So if we’re worried about the implications of that, we can just provide the pointer views, and the user can manually serialize a smart pointersp
withmake_{shared,unique}_ptr_view(sp.get())
. -
reporter - changed milestone to 2021.3.0 release
Mass roll-over of open issues to next release milestone
-
reporter - changed milestone to 2021.9.0 release
-
assigned issue to
- changed version to Development branch (master)
Mass roll-over of open issues to next release milestone
-
@Dan Bonachea asked:
Does this mean the only use case for
ptr_view
is for returning an object that is persistent (or lifetime managed elsewhere) but doesn't already live in ashared_ptr
?If we decide that smart pointers are not serializable, we can also provide
ptr_view
s over smart pointers. Something like:template<typename InPointer, template<typename...> typename OutPointer, typename T> struct ptr_view { using element_type = T; ptr_view(InPointer ptr) : ptr(std::move(ptr_)) {} const T& get_ref() const { return *ptr; } private: InPointer ptr; }; template<typename T> ptr_view<const T*, std::shared_ptr, T> make_shared_ptr_view(const T *ptr) { return ptr_view<const T*, std::shared_ptr, T>(ptr); } template<typename InPointer, typename T = typename InPointer::element_type> ptr_view<InPointer, std::shared_ptr, T> make_shared_ptr_view(InPointer ptr) { return ptr_view<InPointer, std::shared_ptr, T>(std::move(ptr)); } template<typename T> ptr_view<const T*, std::unique_ptr, T> make_unique_ptr_view(const T *ptr) { return ptr_view<const T*, std::unique_ptr, T>(ptr); } template<typename InPointer, typename T = typename InPointer::element_type> ptr_view<InPointer, std::unique_ptr, T> make_unique_ptr_view(InPointer ptr) { return ptr_view<InPointer, std::unique_ptr, T>(std::move(ptr)); }
The user can then serialize a shared pointer
sp
withmake_shared_ptr_view(sp)
, and a unique pointerup
withmake_unique_ptr_view(std::move(up))
if they are done with the object (or withup.get()
if they are not). -
@Dan Bonachea said:
However IIUC so far none of it addresses the case of
dist_object<massive>::fetch()
.Agreed. Let me put forth a completely different proposal. In impl PR 345, I suggested for a completely unrelated reason:
Another option would be to add overloads of the fetching ops that take a target
T*
and return an empty future. This does increase the size of the interface, but it should be easy to implement.Let me expand this to suggest that we provide
T*
versions of all of fetching atomics,rpc
, anddist_object<T>::fetch
. (Atomics are not relevant to this issue, but this proposal would solve two problems with one stone.) I suggest adding the suffix_into
to the names of these variants for clarity, even though it may not be strictly necessary for overload resolution.Example atomic:
template <typename T> template<typename Cx=/*unspecified*/> RType atomic_domain <T>::load_into( global_ptr<const T> src, T *dest, std::memory_order order, Cx &&completions=operation_cx::as_future()) const;
Additional preconditions:
dest
must point to a valid object of typeT
Example for
rpc
/fetch
:template<typename T, typename Func, typename ...Args> RType rpc_into(intrank_t recipient, T *dest, Func &&func, Args &&...args);
Additional preconditions:
RetType
must not bevoid
T
must be assignable fromdeserialized_t<RetType>&&
, ordeserialized_t<U>&&
ifRetType
is of the formfuture<U>
dest
must point to a valid object of typeT
For all of these
_into
ops, operation completion does not produce a value.Not sure about the ordering of arguments. We can make the
_into
ops consistent by havingdest
be the first argument. But then the atomic ops would have inconsistent ordering with respect to RMA/copy.This proposal would more significantly increase the interface and likely entail more implementation work than the
ptr_view
proposal above. But it would solve more problems.I don’t think, however, that this proposal handles the case where the user wants to return a massive object whose lifetime is ending from rpc. We may still need to provide an owning view for that case.
Thoughts?
-
I’ve been thinking some more about the pointer view proposal. I think we have a good answer for the interface for returning from an rpc. I’m less sure about what deserialized result makes the most sense. Let’s first consider the case of a single object
T
, whereU = deserialized_type_t<T>
. The options I see are:-
Operation completion results in
U
.- PRO: In the common case of a single operation-completion notification, the result gets deserialized directly into a promise/lpc cell.
- PRO:
future<U>
is already ref-counted, so the user getsshared_ptr
-like semantics for free. - CON: Stack storage is always allocated for the result in
lpc_dormant<T…>::awaken()
, even if it’s not used. This is a dealbreaker for large types.
-
Operation completion results in
shared_ptr<U>
:- PRO: No stack allocation for
T
orU
. - PRO: The user can keep around a copy of the shared pointer to keep the result object alive, even after dropping the future/promise/etc.
- CON: Additional shared-pointer overheads (control block, extra level of indirection).
- CON: The user cannot take ownership of the underlying pointer –
future<shared_ptr<U>>::result()
returns the shared pointer by reference to const, which doesn’t allowreset()
orswap()
to be called on it.
- PRO: No stack allocation for
-
Operation completes results in
unique_ptr<U>
:- PRO: No stack allocation for
T
orU
. - PRO: Less overhead than
shared_ptr<U>
. - CON: The user has to keep around the future/promise as long as they want to access the underlying object. (Not an issue for LPC completion, which receives the result as an rvalue ref.)
- CON: The user cannot take ownership of the underlying pointer.
- PRO: No stack allocation for
Construction/destruction of the underlying object is also an issue in the latter two cases. We can either:
- Default construct
U
, manually destruct it, then deserialize into the now-uninitialized storage. The default deleter will correctly delete the underlying object. However, this requiresU
to be DefaultConstructible, is inefficient, and probably results in memory-laundering issues. - Allocate uninitialized storage and deserialize into it. We would need to provide our own deleter that correctly destructs and deletes the object. Not a big problem for
shared_ptr<U>
, which stores a type-erased deleter. More of a problem forunique_ptr<U, Deleter>
, where the deleter type is a template argument and must be exposed to the user.
For arrays of
T
, we need to convey length information to the user. There are lots of choices for the result –shared_ptr<pair<U, int>>
,pair<shared_ptr<U>, int>
,vector<U>
, etc. Otherwise, the issues are similar to the above. -
-
Proposed implementation in impl PR 351. Only deserializing into a shared_ptr is currently supported, but both individual objects and array of objects are allowed.
-
reporter - changed milestone to 2022.3.0 release
Mass roll-over of open issues to next release milestone
-
reporter - changed milestone to 2022.9.0 release
Mass roll-over of open issues to next release milestone
-
reporter - changed milestone to 2023.3.0 release
Mass roll-over of open issues to next release milestone
-
reporter - changed milestone to 2023.9.0 release
Mass roll-over of open issues to next release milestone
- Log in to comment
Our story for the forward path is to use a
view
over the large object. This doesn’t work for the return path due to lifetime issues. We don’t allow an RPC to return aview
– returning a non-owningview
from an RPC would leak its memory, as the user would have no mechanism for reclaiming the memory post-serialization. Similarly, there isn’t an obvious solution for a non-owningview
returned to the initiator by a completion notification.Perhaps the best solution right now is to return a
vector
containing the large object? Moving avector
is cheap, and it cleans up its own memory. We could also makeshared_ptr
/unique_ptr
serializable to accomplish the same thing.A crazier idea would be to expose
serialization_reader
/serialization_writer
types to the user. The user could write their objects into aserialization_writer
, return that, get aserialize_reader
on the other end, and manually read out the objects. Not sure if this would be better than avector
/shared_ptr
/unique_ptr
+tuple
containing the objects.