Serialization: Full Featured

Goals of HPC caliber serialization:

Fast memory movement of trivial types. This means respecting alignment of types so that word granular movement in and out of the byte stream is possible.
No unnecessary indirection, esp function dispatch (i.e. no virtual serialize()/deserialize()). Template facilities should be used so that as much dispatch as possible is done at compile time.
Nearly all cases I have encountered thus far in HPC have a cheap way of computing a tight upper bound on the amount of bytes they need in the stream. When this is available, serialization can have its buffer allocated once up front, and then none of the writes into the buffer during object traversal need to worry about possibly growing the buffer (which would incur a conditional branch at every byte/word appended). We do not want to make this optimization impossible.
But still support types that do not have tight and cheap upper bounds available. This requires dynamically growing serialization buffers.
Allow the cheap upper bound to be specified at compile time if possible, and runtime otherwise. Knowing the bound at compile times means I don't have to allocate from the heap and instead it can go on the stack or inline in some other heap object.
Support std types out of the box.
Provide syntactic sugar for structs of serializable types (like UPCXX_REFLECTED is currently doing for Steve).
Support types which can't be default constructed. Boost fails here.
Support types which can't be modified after construction (like having const member fields set by the constructor). Boost fails here too.
Retain boost's pattern of serialization hooks accepting the byte stream argument as an unconstrained type parameter.
For convenience, allow multiple forms of template dispatch. The two that come to mind:
- Template class specialization of a class in the upcxx namespace. This is the most general as it allows users to supply serialization support for types for which they are not the author.
- Hooks found via ADL, like friend functions of the user's class. Requires authorship of the type or its enclosing namespace. EDIT: I'm changing my proposal to be a nested class that has the same name and expectations as the above specialization.
Asymmetric types. Boost fails. Deserialzing a serialized T may produce a different type U. upcxx::view<T,Iter> works this way: no matter what Iter is going in, the deserialized view always has the special network buffer iterator for Iter.
boost serializable types should work out of the box through some adaptor layer.

Here's a crack at some of the above but missing upper bounds and type asymmetry.

struct UserType {
  Foo foo;
  Bar bar;

  struct serialization {
    // put "me" into "writer"
    template<typename Writer>
    static void serialize(Writer &w, UserType const &me) {
      w.push(me.foo);
      w.push(me.bar);
    }

    // pull "me" out of "reader" by constructing into provided storage which will be
    // guaranteed big enough.
    template<typename Reader>
    static UserType* deserialize(Reader &r, void *spot) {
      // Order of these matters, which is why they can't be done inline in the constructor
      // call below (argument evaluation order is implementation defined in c++).
      Foo foo = r.pop<Foo>();
      Bar bar = r.pop<Bar>();
      return ::new(spot) UserType{std::move(foo), std::move(bar)};
    }
 };
};

// Template specialization based, user opens our namespace
namespace upcxx { 
  template<>
  struct serialization<UserType> {
    // same contents as UserType::serialization example above
  };
}

Let's add upper bound support. The above example didn't specify upper bound so the Writer passed to serialize would be the dynamically resizing buffer. Once an upper bound is guaranteed, upcxx will instantiate with a different Writer that elides the buffer growing logic.

// More helper types for upper-bound support
namespace upcxx {
  template<typename T>
  struct serialization_complete {
    // This class is how someone "calls" serialization. It is NOT specialized by users. For
    // convenience, the "serialization" class can be missing members (which will obtain
    // default behavior). This class is how a user accesses "serialization" but with all
    // defaults filled in.
  };

  // Note: types like storage_size<auto,auto> aren't legal C++, but you know what I mean.

  // storage_size is a pair of size_t's for the size and alignment of a memory block.
  // Has `cat` operation for computing size of concatenated type (this involves
  // padding arithmetic). The neat thing is that we use templates to track if
  // the size and/or alignment is a compile-time known thing, and the `cat`
  // operation propagates as much compile-time info through the type as it can.
  template<std::size_t static_size = std::size_t(-1), /* default is runtime */,
           std::size_t static_align = std::size_t(-1) /* default is runtime */>
  struct storage_size {
    // Each of these is constexpr static iff their corresponding template parameter is
    // not -1. Runtime fields are only needed if the value isn't statically known.
    /* constexpr static */ std::size_t size, align;

    // Computes storage size of concatenation: this + that. Preserve as much static
    // knowledge as possible.
    storage_size<auto,auto> cat(storage_size<auto,auto> that);

    // sugar
    template<typename T>
    constexpr storage_size<auto,auto> cat_size_of() {
      return this->cat(storage_size_of<T>());
    }

    // sugar
    template<typename T>
    storage_size<auto,auto> cat_ubound_of(T const &x) {
      return serialization<T>().ubound(*this, x);
    }

    // Replicate this space for `n` contiguous elements.
    storage_size<auto,auto> array(std::size_t n);
    template<std::size_t n>
    storage_size<auto,auto> array();
  };

  constexpr storage_size<0,1> empty_storage_size = {};

  // Like quiet-NaN for storage_size. Useful when types which don't have a cheap
  // upper-bound return this.
  constexpr storage_size<std::size_t(-2), std::size_t(-2)> invalid_storage_size = {};

  template<typename T>
  constexpr storage_size<sizeof(T),alignof(T)> storage_size_of() {
    return {};
  }
};

// user provided upper-bound example
namespace upcxx {
  template<>
  struct serialization<UserType> { // open our class like before
    // given upper bound size of existing stream (prefix), return that with my upper bound
    // added to it
    static auto ubound(
        storage_size<auto,auto> prefix,
        UserType const &me) {
      return prefix.cat_ubound_of(me.foo).cat_ubound_of(me.bar);
    }

    // ...`serialize()` and `deserialize()` just like before...
  };
}

So here's what std::vector<T> would like with C++14 for deduced auto return types:

namespace upcxx {
  // Provided helper:
  //  - For T where serialization<T>::ubound(...) returns purely-static storage_size then
  //    we just return that.
  //  - Otherwise: return invalid_storage_size
  template<typename T>
  constexpr storage_size<auto,auto> static_serialization_ubound_of();

  template<typename T>
  struct serialization<std::vector<T>> {
    // This is elegant. If T doesn't have a purely-static ubound, then our return type will be
    // the invalid storage_size (thanks to its quite NaN-like behavior), meaning we can't be
    // cheaply bounded. If T does have purely-static ubound then this will return a non-static
    // storage_size since there is a runtime dependency to get vector size. Ultimately effect:
    // vector<char> is ubound'ed, vector<vector<char>> is not.
    template<typename PrefixSize>
    static auto ubound(PrefixSize prefix, std::vector<T> const &me) {
      return prefix
        /*space for length*/.cat_ubound_of<std::size_t>()
        /*space for elements*/.cat(upcxx::static_serialization_ubound_of<T>().array(me.size()));
    }

    template<typename Writer>
    static void serialize(Writer &w, std::vector<T> const &me) {
      // length
      w.push<std::size_t>(me.size());
      // elements
      for(T const &elt: me)
        w.push<T>(elt);
    }

    template<typename Reader>
    static std::vector<T>* deserialize(Reader &r, void *spot) {
      std::vector<T> *vec = ::new(spot) std::vector<T>;
      // length
      std::size_t n = r.pop<std::size_t>();
      vec->reserve(n);
      // elements
      for(std::size_t i=0; i != n; i++)
        vec->push_back(r.pop<T>());
      return vec;
    }
  };
}

Finally, generalize std::vector<T> for asymmetry support.

namespace upcxx {
  template<typename T>
  struct serialization<std::vector<T>> {
    // `ubound` and `serialize` are same as above

    // internal helper
    using deserialized_T = typename upcxx::deserialized_of<T>::type;

    // Special. Announce that deserialized type differs. If user omits then
    // symmetry assumed.
    using deserialized_type = std::vector<deserialized_T>;

    template<typename Reader>
    static deserialized_type* deserialize(Reader &r, void *spot) {
      std::vector<deserialized_T> *vec = ::new(spot) std::vector<deserialized_T>;
      // length
      std::size_t n = r.pop<std::size_t>();
      vec->reserve(n);
      // elements
      for(std::size_t i=0; i != n; i++)
        vec->push_back(r.pop<T>()); // pop a T, not a deserialized_T, return will still be deserialized_T
      return vec;
    }
}

And we also need the default (non-specialized) implementation of upcxx::serialization_complete<T> to query upcxx::is_definitely_trivially_serializable<T>::value and short-circuit to byte copies if user enables it.

The above is almost general enough for the user to implement something like upcxx::view efficiently. What's lacking is the ability to skip over a serialized T in a reader without doing the work of building it and throwing it away. But this isn't a show stopper for upcxx::view<UserType> since I can bake in "skip" support for known types, and other types (like UserType) the view will know to prefix with an extra byte-count on the wire so it knows how much to jump by for the current element, or if I detect that std::is_trivially_destructible<UserType> is true then I can elide the extra byte count and just deserialize and discard without performance concerns.

And for the super simple struct of serializable fields we want something like UPCXX_REFLECTED which already exists in the implementation, I propose a rename:

struct UserType {
  Foo foo;
  Bar bar;

  // Expands to *something impl-defined* that enables the most general junk:
  //   - Upper bounded if possible as sum of field upper bounds.
  //   - Asymmetry *not possible*. All fields must be symmetric.
  // UB if user uses this and also registers more serialization hooks.
  UPCXX_SERIALIZED_FIELDS(foo, bar)
};

Proposed Reader/Writer concepts. The API marked ADVANCED is optional, just needed to implement upcxx::view.

struct WriterConcept {
  // invoke serialization of T
  template<typename T>
  void push(T const&);

  // Push sequence to buffer. Guaranteed to produce same buffer as
  // doing `push<T>()` in a loop. Possible internal optimizations:
  //  - Uses bulk memcpy when T is trivially serializable and iterator is contiguous.
  //  - When T has ubound(), growable buffer allocates bigger than standard chunk
  //    and can do elements with overflow checks elided.
  template<typename Iter>
  void push_sequence(Iter elts, std::size_t n);

  // Allocate contiguous uninitialized storage in buffer. Growing writers must ensure
  // this pointer remains stable for lifetime of writer. Naive buffer-doubling is therefor
  // not a possible implementation for a growing writer.
  template<std::size_t S, std::size_t A>
  void* place(storage_size<S,A> size);

  // *** ADVANCED ONLY (upcxx::view) ***

  // total bytes written so far. 
  std::size_t size() const;
};

struct ReaderConcept {
  // deserialize T1 (given input serialized was T) from buffer and return it. Requires T1 be moveable.
  template<typename T, typename T1 = typename serialization_complete<T>::deserialized_type>
  T1 pop();

  // deserialize T1 (given T), construct into storage (for all cases including when T1 not moveable) and return constructed pointer.
  template<typename T, typename T1 = typename serialization_complete<T>::deserialized_type>
  T1* pop_into(void *storage);

  // deserialize T1 elements into uninitialized storage array just as a loop of
  // `pop_into<T>()` would. Return pointer to first element.
  template<typename T, typename T1 = ...goop...>
  T1* pop_sequence_into(void *storage, std::size_t n);

  template<std::size_t S, std::size_t A>
  void* unplace(storage_size<S,A> size);

  // *** ADVANCED ONLY (upcxx::view) ***

  // Current byte position of reader
  char* head() const;
  // Byte offset to add to current position
  void jump(std::intptr_t delta);
};

Comments (13)