Redefine Custom Serialization deserialize() for Emplacemnt

Issue #197 resolved
Colin MacLean created an issue

Currently, Custom Serialization requires the user to define custom serialization with a member function in the form:

template <typename Reader>
static UserType* deserialize(Reader& reader, void* storage);

However, the use of void* as the target storage offers limited functionality, especially in the face of the emplace functionality introduced in many C++11 and beyond classes. There isn't uniformity between how emplace works and emplace-type naming in the standard. For instance, some containers have emplace_back or emplace_after.

This limitation with the current deserialization customization point syntax has become an issue with the desire to implement std::optional functionality to allow the user to deserialize into provided storage in order to avoid expensive moves. However, the advantages of in-place construction could also be enabled elsewhere by redefining the customization point to match a different pattern.

Options:

//1
template <typename Reader, typename Storage>
  requires ConstructionHelper<Storage>
static UserType* deserialize(Reader& reader, Storage& storage)
{
   return storage.construct(args...);
}

or

//2
using deserialized_type = ...; //optional type alias to define asymmetry

template <typename Reader, typename Storage>
  requires ConstructionHelper<Storage>
static void deserialize(Reader& reader, Storage&& storage)
{
  storage.construct(args...);
}

The use of a ConstructionHelper provides a mechanism to interface with the backing storage in whatever manner it needs. The old signature would still be allowed, but deprecated, and would not support placement operations. [[deprecated(Custom Serialization uses deprecated signature)]] could be put on a function choosing the old method while attempts to use placement operations with the old signature could be met with a static_assert explaining where to find information on how to update.

Returning pointer to type vs an optional alias to handle asymmetry.

While discussion so far has favored 1), there are several things I don't like about it which I don't think I've conveyed:

  1. Most things users want custom serialization for are going to be symmetric. We already know the address of the storage and can launder void*-type storage.
  2. The user of a type alias makes the intent for asymmetry more explicit.
  3. Most importantly, there are reasons we may want to let users write their own _ConstructionHelper_s and this introduces unnecessary complication.

Allowing users to write their own _ConstructionHelper_s would enable in-place construction whenever the user desires it. For example:

std::vector<A> a;
r.read_into_with([=](auto&&... args) {
  a.emplace_back(std::forward<decltype(args)>(args)...);
  //The user needing to return the address here would be pretty awkward
});
std::vector<A> b = /* ... */;
auto it = begin(b) + 10; // some position within an existing vector
r.read_into_with([=](auto&&... args) {
  b.emplace(it, std::forward<decltype(args)>(args)...);
  ++it;
});

read_into_with would do something like:

template<typename F>
void read_into_with(F&& f) {
  struct local {
    template<typename... Args>
    void construct(Args&&... args) const & {
      func(std::forward<Args>(args)...);
    }

    template<typename... Args>
    void construct(Args&&... args) && {
      std::move(func)(std::forward<Args>(args)...);
    }
    F func;
  };
  /*deserialization_call*/(*this, local{std::forward<F>(f)}); //Keep in mind the lambda may be move-only.
}

If it was just us having to do things like returning nullptr here, the clunkiness required to get 1) to work would be fine. However, allowing a user to write their own helper allows things like emplacing at an iterator position, which is code we can't write for them. Single-direction linked lists have the potential to be particularly troublesome.

A mechanism for this sort of functionality is especially important if we want to enable in-place construction into storage with multiple methods of emplacement, as we would be unable to provide a generic one. If we are going to change Custom Serialization in this way, it offers much more flexibility than just enabling the use of std::optional. It would take very little work on our part to provide.

The main thing 1) has going for it is that it is more similar to the currently required pattern. Even if we don't want to enable broad support for emplacement right now, this is a decision that will impact our ability to deliver a clean interface to do so in the future.

I can get more into the specification specifics after a firm choice is made on the signature required.

Comments (10)

  1. Colin MacLean reporter

    @Dan Bonachea had some concerns in Slack about UB and potential lifetime issues if we don’t return a pointer.

    First, the second parameter, Storage isn’t the actual storage. It would be more accurate to call it a ConstructionHelper. We can create one to do what we need it to, which is the primary advantage of adding this pluggable layer. It is an adapter tailored to the backing storage. In the case of deserializing into void* we would internally have a ConstructionHelper that doesn’t even need to launder from the void*:

    template<typename T>
    struct raw_storage_helper
    {
      raw_storage_helper(void* p)
       : storage(static_cast<T*>(p))
      {}
      template<typename... Args>
      void operator()(Args&&... args)
      {
        storage = ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
      }
      T* storage;
    };
    

    Once the operator() is called, the storage member will have a T* that was assigned from new, not requiring any laundering.

    In the case of deserializing with a void*, instead of:

    T* deserialize_into(void* ptr) {
      return CustomDeserialization::deserialize(*this, ptr);
    }
    

    we would rewrite this as:

    T* deserialize_into(void* ptr) {
      raw_storage_helper<T> helper(ptr);
      CustomDeserialization::deserialize(*this, helper);
      return helper.storage;
    }
    

    The caller of deserialize() can retrieve whatever information it needs from the type of ConstructionHelper it is using after calling deserialize(). It doesn’t have to pass this information out as a return variable but can simply hold the information for once deserialize() is complete.

    In the case of std::optional/upcxx::optional, we would be sending deserialize() a different type of ConstructionHelper. The container type would be providing the lifetime guarantees.

    template<typename T>
    struct optional_construction_helper {
      optional_construction_helper(upcxx::optional<T>& o)
       : optional(std::addressof(o))
      {}
      template<typename... Args>
      void operator()(Args&&... args) {
        optional->emplace(std::forward<Args>(args)...);
      }
      upcxx::optional* optional;
    };
    

    void deserialize_into(upcxx::optional<T>& optional) {
      optional_construction_helper helper(optional);
      CustomDeserialization::(*this, helper);
    }
    

    ConstructionHelper is very powerful. It’s a new customization point that allows whatever glue needs to be inserted to be inserted and to save whatever information is needed to be saved. If someone wanted to do something complicated like insert into a linked and save the iterators at the beginning and end of the inserted data, they could write a ConstructionHelper functor that does this.

  2. Colin MacLean reporter

    Regarding deserialize_into_with, this is functionality has the most utility with a yet-to-be-released public serialization API. I discussed things with @Dan Bonachea and came to the conclusion that deserialize_into_with doesn’t fit within a deserialize(), as it its purpose is to bring in an external context. We shouldn’t publicly provide a deserialize_into_with as a member function of serialization_reader. It is functionality that is desirable in a serialization buffer API, however.

    std::vector<MyHeavyType> existing_populated_vector = /* ... */;
    std::raw_serialization_buffer buf = /* ... */;
    buf.deserialize_into_with<MyHeavyType>([=](auto&&... args) {
      existing_populated_vector.emplace_back(std::forward<decltype(args)>(args)...);
    });
    

  3. Amir Kamil

    Scanning through the current implementation, I think that the current requirements we have on TriviallySerializable types are insufficient to be able to deserialize them into a std::optional. Currently, we use detail::construct_trivial() (defined in utility.hpp), which memcpy's into the target memory and does the appropriate laundering. For types that are DefaultConstructible, it does invoke the default constructor before doing the memcpy.

    It’s not clear to me what the right way to deserialize into a std::optional would be. Here are some possibilities:

    • Invoke emplace() with no arguments to default construct the object, then memcpy into it. This requires the type to be DefaultConstructible, and I’m not sure what if any laundering would be required.
    • Deserialize into a different memory location, then invoke emplace() with the resulting object as an rvalue argument. This requires the type to be MoveConstructible, and it costs an additional move/copy.
    • Reinterpret the raw source bytes as a deserialized object, then invoke emplace() with that object as an rvalue argument. This requires the type to be MoveConstructible. The source bits would also need to have the appropriate alignment for the type. In addition, we’d have to be certain that the source bits are never reused. I’m also not sure how valid this reinterpretation is, but I assume it’s what views over TriviallySerializable types already do.

    There may be other places where the implementation takes shortcuts by assuming we’re deserializing into memory.

  4. Colin MacLean reporter
    1. This has lifetime issues. memcpy ends the lifetime of the old object but doesn’t start the lifetime of the new. optional implementations tend to use the properties of unions, not laundering pointers, so this doesn’t work. Plus it requires trivial destruction.
    2. The compiler could probably optimize out the move, so it’s just a single copy into the final destination.
    3. Creating an rvalue reference from the laundered+reinterpreted source bytes isn’t a move in itself. This would just do a single move into the destination. However, I don’t think a move over a copy gains anything for these trivial types. A copy of bytes into the new object is going to need to happen anyway, it’s just a matter of what happens to the old bytes. Move constructing would just be a bit looser of a requirement.

    I don’t think it’s too much of a burden to require TriviallySerializable types to also be MoveConstructible for std::optional. If a move constructor isn’t available, then custom serialization should be provided that knows how to construct the type in-place. We probably want to handle the case of std::optional<T[N]> ourselves, though.

  5. Amir Kamil

    I don’t think it’s too much of a burden to require TriviallySerializable types to also be MoveConstructible for std::optional.

    Deserializing a non-TriviallySerializable T may end up recursively deserializing a subobject of type U that is TriviallySerializable. So it's not just that U needs to be MoveConstructible to be able to deserialize it directly into an optional. In general, it needs to be MoveConstructible to allow any type that might contain a U to be deserializable into an optional.

    I think the better fix would be to require TriviallySerializable types to be MoveConstructible across the board. We get that already with the subset that are TriviallyCopyable.

  6. Colin MacLean reporter

    T doesn’t require U to be MoveConstructible. All it needs is for the .emplace call to match one of T's constructors. T's constructor is what puts the constraints on U.

    struct internal_t {};
    struct U
    {
      U() : a(0) {};
      U(const U&) = delete;
      U(U&&) = delete;
      U(internal_t, U&& u) : a{u.a} {}
      int a;
    };
    
    struct T
    {
      U u;
    
      T(const T&) = delete;
      T(T&&) = delete;
      T(U&& u) : u(internal_t{}, u) {}
    
      struct upcxx_serialization
      {
        template<typename Reader, typename Storage>
        void deserialize(Reader& r, Storage&& s)
        {
          U u = r.read<U>();
          s.construct(std::move(u));
        }
      };
    };
    
    std::optional<T> o;
    deserialize_into(o);
    

    However, this does bring up a use for deserialize_into_with inside custom serialization. It’s needed for placement construction.

    // Types supporting `std::in_place` should call such constructors rather than assuming MoveConstructible.
    namespace upcxx {
    template<typename... Ts>
    struct serialization<std::variant<Ts...>>
    {
      template<typename Reader, typename Storage>
      void deserialize(Reader& r, Storage&& s)
      {
        size_t index = r.read<size_t>();
        constexpr auto constructor_funcs[] = {+[](Reader& r, Storage&& s) {
          r.deserialize_into_with<Ts>([=](auto&&... args){
            //This would have to be a struct functor for C++11, but is implementable
            //decltype(args) would be U in this case.
            s.construct(std::in_place, std::forward<decltype(args)>(args)...);
          })
        }, ...};
        constructor_funcs[index](r,s);
      }
    };
    }
    
    std::optional<std::variant<T>> v;
    deserialize_into(v);
    

  7. Log in to comment