Handling for large static types on the rpc return value path, incl. `dist_object<massive>::fetch()`

Issue #165 new
Dan Bonachea created an issue

This issue is forked from discussion in pull request #245.

Amir and I verbally discussed this PR. We agreed we don't currently have a good story for large-type objects along the RPC return path, including dist_object<massive>::fetch(). I'm entering an open-ended issue to record that design weakness and collect ideas for improvement.

Ideally we'd get to a point where an RPC callback can return a reference to a large (possibly non-copyable or even non-moveable) type, or a future containing a reference to such a type. Serialization would serialize through the provided reference directly into the network buffer. Back on the initiator, we'd deserialize directly from the network buffer into the heap-allocated promise cell, so that future::result_reference() points directly at the memory targeted by deserialization. In an ideal scenario, there should be no invocations of the user object's move or copy constructors required to make this happen, and no storing of the object type on the program stack - serialization/deserialization are the only copy we fundamentally need, and happen on growable heap locations that don't suffer from stack overflow.

Currently we fall well short of this goal, performing numerous move operations on the objects, including storing several copies directly on the program stack.

Comments (18)

  1. Amir Kamil

    Our story for the forward path is to use a view over the large object. This doesn’t work for the return path due to lifetime issues. We don’t allow an RPC to return a view – returning a non-owning view from an RPC would leak its memory, as the user would have no mechanism for reclaiming the memory post-serialization. Similarly, there isn’t an obvious solution for a non-owning view returned to the initiator by a completion notification.

    Perhaps the best solution right now is to return a vector containing the large object? Moving a vector is cheap, and it cleans up its own memory. We could also make shared_ptr/unique_ptr serializable to accomplish the same thing.

    A crazier idea would be to expose serialization_reader/ serialization_writer types to the user. The user could write their objects into a serialization_writer, return that, get a serialize_reader on the other end, and manually read out the objects. Not sure if this would be better than a vector/shared_ptr/unique_ptr + tuple containing the objects.

  2. Amir Kamil

    We also need to think about the case where the user has an existing non-smart pointer to an object and wants to serialize it without handing control to a smart pointer. So something like a view that is non-owning on the way in. But it needs to be an owning pointer on the way out (and the user can use reset() or release() to obtain ownership if they want). So maybe we define something like shared_ptr_view<T> and unique_ptr_view<T> that deserialize as shared_ptr<deserialized_type_t<T>> and unique_ptr<deserialized_type_t<T>>, respectively.

  3. Dan Bonachea reporter

    Amir is doing great work on this problem.

    However IIUC so far none of it addresses the case of dist_object<massive>::fetch(), unless the user changes his data structure to replace this with dist_object<shared_ptr<massive>>::fetch(), or we introduce a dist_object::fetch_boxed() method that boxes the object it fetches into a shared_ptr (hopefully with minimal copying on both sides).

    While we are at it we should probably change the implementation of dist_object<T>::fetch() to return a const lvalue ref from the RPC, to avoid a possible copy on the return path..

  4. Amir Kamil

    Straw-man proposal for pointer views:

    template<typename T>
    /*unspecified*/ make_shared_ptr_view(const T *ptr);

    Returns a pointer view over ptr. This pointer view is serialized by serializing *ptr, and the result of deserializing is managed by a std::shared_ptr<deserialized_type_t<T>>.

    template<typename T>
    /*unspecified*/ make_unique_ptr_view(const T *ptr);

    Returns a pointer view over ptr. This pointer view is serialized by serializing *ptr, and the result of deserializing is managed by a std::unique_ptr<deserialized_type_t<T>>.

    Prototype implementation:

      template<template<typename...> typename Pointer, typename T>
      struct ptr_view {
        using element_type = T;
        ptr_view(const T *ptr_) : ptr(ptr_) {}
        const T* get() const {
          return ptr;
        }
      private:
        const T* ptr;
      };
    
      template<typename T>
      ptr_view<std::shared_ptr, T> make_shared_ptr_view(const T *ptr) {
        return ptr_view<std::shared_ptr, T>(ptr);
      }
      template<typename T>
      ptr_view<std::unique_ptr, T> make_unique_ptr_view(const T *ptr) {
        return ptr_view<std::unique_ptr, T>(ptr);
      }
    
      template<template<typename...> typename Pointer, typename T>
      struct serialization<ptr_view<Pointer, T>>:
        serialization_smart_pointer<ptr_view<Pointer, T>,
                                    Pointer<T>> {};
    

    Example test case:

      upcxx::rpc(target,
        [](std::unique_ptr<T> x) -> upcxx::ptr_view<std::shared_ptr, T> {
          return upcxx::make_shared_ptr_view(&global);
        },
        upcxx::make_unique_ptr_view(&global)
      ).wait_reference();
      T::show_stats("ptr_view<std::unique_ptr, T> -> ptr_view<std::shared_ptr, T>", 2, 0);
    

  5. Dan Bonachea reporter

    The strawman looks interesting, we should probably discuss in real-time.

    My initial thought is we'd need to be very clear about ownership on both sides:

    • IIUC you're proposing that *unspecified* created by make_{shared,unique}_ptr_view(T *ptr) is a non-owning view over *ptr - implying the user is responsible for keeping it alive until after Serialization.
    • On the deserialized size, the result is an owning view. We'd need to be very clear about this ownership dichotomy to avoid confusion.
    • This is reasonable but might be insufficient for cases where the RPC constructs an object for the sole purpose of returning it - as they'd have no convenient way to reap the object; although I guess in that case they probably should have just returned a unique_ptr<T>? Does this mean the only use case for ptr_view is for returning an object that is persistent (or lifetime managed elsewhere) but doesn't already live in a shared_ptr?
  6. Amir Kamil

    Does this mean the only use case for ptr_view is for returning an object that is persistent (or lifetime managed elsewhere) but doesn't already live in a shared_ptr?

    Yes, that is exactly the intended use case. We currently recommend using a plain view for this when sending an RPC but do not have an answer for the return path.

  7. Amir Kamil

    This doesn’t depend on shared_ptr or unique_ptr being Serializable. So if we’re worried about the implications of that, we can just provide the pointer views, and the user can manually serialize a smart pointer sp with make_{shared,unique}_ptr_view(sp.get()).

  8. Amir Kamil

    @Dan Bonachea asked:

    Does this mean the only use case for ptr_view is for returning an object that is persistent (or lifetime managed elsewhere) but doesn't already live in a shared_ptr?

    If we decide that smart pointers are not serializable, we can also provide ptr_views over smart pointers. Something like:

      template<typename InPointer, template<typename...> typename OutPointer, typename T>
      struct ptr_view {
        using element_type = T;
        ptr_view(InPointer ptr) : ptr(std::move(ptr_)) {}
        const T& get_ref() const {
          return *ptr;
        }
      private:
        InPointer ptr;
      };
    
      template<typename T>
      ptr_view<const T*, std::shared_ptr, T> make_shared_ptr_view(const T *ptr) {
        return ptr_view<const T*, std::shared_ptr, T>(ptr);
      }
      template<typename InPointer, typename T = typename InPointer::element_type>
      ptr_view<InPointer, std::shared_ptr, T> make_shared_ptr_view(InPointer ptr) {
        return ptr_view<InPointer, std::shared_ptr, T>(std::move(ptr));
      }
      template<typename T>
      ptr_view<const T*, std::unique_ptr, T> make_unique_ptr_view(const T *ptr) {
        return ptr_view<const T*, std::unique_ptr, T>(ptr);
      }
      template<typename InPointer, typename T = typename InPointer::element_type>
      ptr_view<InPointer, std::unique_ptr, T> make_unique_ptr_view(InPointer ptr) {
        return ptr_view<InPointer, std::unique_ptr, T>(std::move(ptr));
      }
    

    The user can then serialize a shared pointer sp with make_shared_ptr_view(sp), and a unique pointer up with make_unique_ptr_view(std::move(up)) if they are done with the object (or with up.get() if they are not).

  9. Amir Kamil

    @Dan Bonachea said:

    However IIUC so far none of it addresses the case of dist_object<massive>::fetch().

    Agreed. Let me put forth a completely different proposal. In impl PR 345, I suggested for a completely unrelated reason:

    Another option would be to add overloads of the fetching ops that take a target T* and return an empty future. This does increase the size of the interface, but it should be easy to implement.

    Let me expand this to suggest that we provide T* versions of all of fetching atomics, rpc, and dist_object<T>::fetch. (Atomics are not relevant to this issue, but this proposal would solve two problems with one stone.) I suggest adding the suffix _into to the names of these variants for clarity, even though it may not be strictly necessary for overload resolution.

    Example atomic:

    template <typename T>
    template<typename Cx=/*unspecified*/>
    RType atomic_domain <T>::load_into(
        global_ptr<const T> src, T *dest, std::memory_order order,
        Cx &&completions=operation_cx::as_future()) const;
    

    Additional preconditions:

    • dest must point to a valid object of type T

    Example for rpc/fetch:

    template<typename T, typename Func, typename ...Args>
    RType rpc_into(intrank_t recipient, T *dest, Func &&func, Args &&...args);
    

    Additional preconditions:

    • RetType must not be void
    • T must be assignable from deserialized_t<RetType>&&, or deserialized_t<U>&& if RetType is of the form future<U>
    • dest must point to a valid object of type T

    For all of these _into ops, operation completion does not produce a value.

    Not sure about the ordering of arguments. We can make the _into ops consistent by having dest be the first argument. But then the atomic ops would have inconsistent ordering with respect to RMA/copy.

    This proposal would more significantly increase the interface and likely entail more implementation work than the ptr_view proposal above. But it would solve more problems.

    I don’t think, however, that this proposal handles the case where the user wants to return a massive object whose lifetime is ending from rpc. We may still need to provide an owning view for that case.

    Thoughts?

  10. Amir Kamil

    I’ve been thinking some more about the pointer view proposal. I think we have a good answer for the interface for returning from an rpc. I’m less sure about what deserialized result makes the most sense. Let’s first consider the case of a single object T, where U = deserialized_type_t<T>. The options I see are:

    1. Operation completion results in U.

      1. PRO: In the common case of a single operation-completion notification, the result gets deserialized directly into a promise/lpc cell.
      2. PRO: future<U> is already ref-counted, so the user gets shared_ptr-like semantics for free.
      3. CON: Stack storage is always allocated for the result in lpc_dormant<T…>::awaken(), even if it’s not used. This is a dealbreaker for large types.
    2. Operation completion results in shared_ptr<U>:

      1. PRO: No stack allocation for T or U.
      2. PRO: The user can keep around a copy of the shared pointer to keep the result object alive, even after dropping the future/promise/etc.
      3. CON: Additional shared-pointer overheads (control block, extra level of indirection).
      4. CON: The user cannot take ownership of the underlying pointer – future<shared_ptr<U>>::result() returns the shared pointer by reference to const, which doesn’t allow reset() or swap() to be called on it.
    3. Operation completes results in unique_ptr<U>:

      1. PRO: No stack allocation for T or U.
      2. PRO: Less overhead than shared_ptr<U>.
      3. CON: The user has to keep around the future/promise as long as they want to access the underlying object. (Not an issue for LPC completion, which receives the result as an rvalue ref.)
      4. CON: The user cannot take ownership of the underlying pointer.

    Construction/destruction of the underlying object is also an issue in the latter two cases. We can either:

    1. Default construct U, manually destruct it, then deserialize into the now-uninitialized storage. The default deleter will correctly delete the underlying object. However, this requires U to be DefaultConstructible, is inefficient, and probably results in memory-laundering issues.
    2. Allocate uninitialized storage and deserialize into it. We would need to provide our own deleter that correctly destructs and deletes the object. Not a big problem for shared_ptr<U>, which stores a type-erased deleter. More of a problem for unique_ptr<U, Deleter>, where the deleter type is a template argument and must be exposed to the user.

    For arrays of T, we need to convey length information to the user. There are lots of choices for the result – shared_ptr<pair<U, int>>, pair<shared_ptr<U>, int>, vector<U>, etc. Otherwise, the issues are similar to the above.

  11. Amir Kamil

    Proposed implementation in impl PR 351. Only deserializing into a shared_ptr is currently supported, but both individual objects and array of objects are allowed.

  12. Log in to comment