Serialization of C arrays

Issue #163 resolved
Amir Kamil created an issue

I am forking this off from the discussion in PR #40.

The initial text in PR #40 stated that arrays are not Serializable. After discussion in our 2020-07-15 meeting, we agreed that C arrays should be prohibited as arguments to RPC, but that they should otherwise be Serializable if the elements are. The spec text was subsequently modified to state that the serialization properties of C arrays follow the serialization properties of their elements. Implementation PR 232 added a check for C arrays in RPC, resolving implementation issue 375.

After some more poking at the implementation (see implementation PR 240), I’ve identified more issues with serialization of C arrays. In particular, the implementation prior to that PR is broken when it comes to arrays of non-trivially serializable elements embedded in pairs or tuples. That PR implements some hacks to get them working, but support for arrays within pair/tuple appears to vary by compiler. GCC 9 doesn’t seem to have any issue, Clang breaks on multidimensional arrays in pairs, PGI breaks on arrays in tuples, etc.

The core issue, of course, is that C arrays in most cases cannot be rvalues, and they can never be passed or returned by value. So Reader::read<T[n]>() breaks (it would have to return an array by value), constructing a pair or tuple that holds an array by passing the array to its constructor breaks, and so on.
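The language rules behind this can be checked in a few lines of standard C++17, independent of UPC++ (make_triple is a hypothetical name used only for illustration). A C array type is default-constructible and trivially copyable, yet neither copy- nor move-constructible, and no function may be declared to return one, so a by-value read<T[n]>() has no valid signature. std::array<T, N> has none of these restrictions.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

// C arrays: default-constructible and trivially copyable, but an array cannot
// be initialized from another array, so they are neither copy- nor
// move-constructible.
static_assert(std::is_default_constructible<int[3]>::value, "");
static_assert(std::is_trivially_copyable<int[3]>::value, "");
static_assert(!std::is_copy_constructible<int[3]>::value, "");
static_assert(!std::is_move_constructible<int[3]>::value, "");

// A function cannot be declared to return an array:
//   int make_c_triple()[3];   // ill-formed: function returning an array
// so a read<int[3]>() that hands back its result by value cannot exist.

// std::array<int, 3> is an ordinary class type and can be returned by value.
static_assert(std::is_move_constructible<std::array<int, 3>>::value, "");

std::array<int, 3> make_triple() {
  return {1, 2, 3};
}
```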

I advocate that we sidestep these issues by declaring C arrays as non-serializable.

The cases where I see arrays coming up in serialization are:

  • A user tries to directly pass an array to RPC. We’ve already prohibited this.
  • A user tries to return an array by reference from RPC. The validity of this is unresolved. The implementation is currently broken with respect to returning references (due to asymmetric types being broken in RPC return, implementation issue 394), so I do not know if this poses additional implementation problems.
  • A user wants to serialize a pair or tuple that holds an array. Such a type is difficult to work with, not only because of the varying compiler support mentioned above, but also because such pairs and tuples essentially must always be default constructed and then modified. I don’t see this as a likely scenario, and users should be using std::array instead.
  • A user wants their type, which has embedded arrays, to be TriviallySerializable. No problem – either ensure that it is TriviallyCopyable or specialize is_trivially_serializable. Whether or not an array is defined to be Serializable is totally irrelevant here.
  • A user wants to define non-trivial serialization for their class, which has embedded arrays. Then:

    • They can use UPCXX_SERIALIZED_FIELDS, which already blesses array members.
    • It’s unclear whether they can use UPCXX_SERIALIZED_VALUES if arrays are Serializable, due to array-to-pointer decay. Declaring arrays to be non-serializable sidesteps this problem.
    • They can use custom serialization. As mentioned above, they cannot use read<T[n]>() to read an array field. Instead, they should use read_sequence_into<T>(). (If we decide that arrays are Serializable, they may also be able to use read_into<T[n]>().)
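The pair and embedded-array cases above can be illustrated in plain C++17, again without UPC++; Particle, make_pair_with_c_array and make_pair_with_std_array are hypothetical names. A pair holding a C array cannot be built by handing an array to its constructor, so it is default-constructed and then filled in place, whereas the std::array version is constructed directly from its values. The struct shows that embedded arrays do not prevent a class from being TriviallyCopyable, so trivial serialization does not depend on arrays themselves being Serializable. Given the compiler differences reported above, treat the C-array pair as something that may not compile everywhere; GCC 9 is reported as handling such pairs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Embedded arrays do not stop a class from being TriviallyCopyable; copying a
// Particle copies pos element-wise, even though double[3] on its own is not
// copy-constructible.
struct Particle {
  double pos[3];
  int id;
};
static_assert(std::is_trivially_copyable<Particle>::value, "");

// A pair holding a C array: its value constructors are unusable because int[3]
// cannot be copied from an argument, so the pair is default-constructed
// (first is value-initialized to {0, 0, 0}) and then modified in place.
std::pair<int[3], int> make_pair_with_c_array() {
  std::pair<int[3], int> p{};
  for (std::size_t i = 0; i < 3; ++i) {
    p.first[i] = static_cast<int>(i) * 10;
  }
  p.second = 7;
  return p;  // pair's defaulted move constructor copies the array element-wise
}

// The std::array alternative can be constructed directly from its values.
std::pair<std::array<int, 3>, int> make_pair_with_std_array() {
  return {std::array<int, 3>{0, 10, 20}, 7};
}
```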

Are there cases I’m missing?

It seems that if we spec arrays to be Serializable, we have to put in a lot of exceptions as to where they cannot be used, and potentially put in a bunch of static_asserts to enforce that. This is as opposed to the two likely cases above (trivial serialization and UPCXX_SERIALIZED_FIELDS), where we already allow arrays even if they are not technically Serializable.
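To give a sense of what such enforcement looks like, here is a minimal C++17 sketch of a compile-time guard on an RPC-style call. fake_rpc and any_c_array_v are hypothetical names, and this is not the check added in implementation PR 232. Because the arguments are taken by forwarding reference, an array argument arrives as a reference to an array instead of decaying to a pointer, which is what lets the static_assert see it. Each entry point that must reject arrays would need its own guard of this kind.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// True if any argument type, after stripping references, is a C array.
template <typename... Args>
constexpr bool any_c_array_v =
    (std::is_array<std::remove_reference_t<Args>>::value || ...);

// Hypothetical stand-in for an RPC-style entry point (not the UPC++ API).
// Forwarding references keep an array argument as T (&)[N] instead of
// letting it decay, so the check below can detect it.
template <typename Fn, typename... Args>
auto fake_rpc(Fn&& fn, Args&&... args)
    -> decltype(std::forward<Fn>(fn)(std::forward<Args>(args)...)) {
  static_assert(!any_c_array_v<Args...>,
                "C arrays may not be passed as RPC arguments; use std::array");
  // A real RPC would serialize the arguments and ship them; this sketch just
  // invokes fn locally.
  return std::forward<Fn>(fn)(std::forward<Args>(args)...);
}
```

With int arr[3] and a callable f taking const int*, fake_rpc(f, arr) is rejected by the static_assert, whereas the decayed pointer would otherwise have been accepted silently.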

What say you?

Comments (5)

  1. Amir Kamil reporter

    We discussed this in our 2020-07-29 meeting but did not settle on a resolution.

    There was general agreement that arrays should not be passed to UPCXX_SERIALIZED_VALUES, since arrays do not have rvalues.
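The decay problem can be seen without UPC++: any helper that takes its argument by value receives a pointer when handed an array. In the sketch below, bytes_received_by_value stands in for a values-based facility (an assumption for illustration, not how UPCXX_SERIALIZED_VALUES is implemented), and Grid and cells_by_value are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper standing in for anything that takes a value by value.
// Returns the number of bytes it actually received.
template <typename T>
std::size_t bytes_received_by_value(T value) {
  static_cast<void>(value);
  return sizeof(T);
}

struct Grid {
  double cells[8];
};

// Passing the array member "by value" silently passes a pointer instead:
// T is deduced as const double*, not double[8].
std::size_t cells_by_value(const Grid& g) {
  return bytes_received_by_value(g.cells);
}

static_assert(std::is_same<std::decay_t<double[8]>, double*>::value,
              "an array taken by value decays to a pointer to its first element");
```

Nothing here fails to compile; only the pointer is copied, so the mistake would go unnoticed unless arrays were rejected explicitly.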

    The main point of disagreement was whether read_into<T[n]>() (and presumably write<T[n]>() as well?) should be supported. @john bachan would like to keep them, @Dan Bonachea and I were less enthusiastic about it.

    In the interests of finding common ground, here is a new proposal:

    • We declare arrays to be non-Serializable.
    • We explicitly permit arrays for read_into and write, much as we do for UPCXX_SERIALIZED_FIELDS.
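A rough C++17 sketch of how this compromise could look, using hypothetical ToyWriter and ToyReader classes over a byte vector. These are not UPC++'s Writer and Reader, and is_serializable_sketch is an invented trait limited to trivially copyable types for brevity. Arrays fail the trait, and read<int[3]>() cannot be formed at all, but dedicated write and read_into overloads accept arrays and recurse element by element, which also covers multidimensional arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: trivially copyable non-array types count as
// Serializable; C arrays are deliberately excluded.
template <typename T>
struct is_serializable_sketch
    : std::integral_constant<bool, std::is_trivially_copyable<T>::value &&
                                       !std::is_array<T>::value> {};

class ToyWriter {
 public:
  // General case: only Serializable types.
  template <typename T>
  void write(const T& value) {
    static_assert(is_serializable_sketch<T>::value, "T must be Serializable");
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    buf_.insert(buf_.end(), p, p + sizeof(T));
  }

  // Explicit exception for C arrays: write the elements in order. This
  // overload recurses, so multidimensional arrays work too.
  template <typename T, std::size_t N>
  void write(const T (&arr)[N]) {
    for (std::size_t i = 0; i < N; ++i) write(arr[i]);
  }

  const std::vector<unsigned char>& bytes() const { return buf_; }

 private:
  std::vector<unsigned char> buf_;
};

class ToyReader {
 public:
  explicit ToyReader(std::vector<unsigned char> buf) : buf_(std::move(buf)) {}

  // Returning by value works for Serializable types; read<int[3]>() does not
  // compile because a function cannot return an array.
  template <typename T>
  T read() {
    static_assert(is_serializable_sketch<T>::value, "T must be Serializable");
    T value;
    read_into(value);
    return value;
  }

  // Read a Serializable value into existing storage.
  template <typename T>
  void read_into(T& dst) {
    static_assert(is_serializable_sketch<T>::value, "T must be Serializable");
    assert(pos_ + sizeof(T) <= buf_.size());
    std::memcpy(&dst, buf_.data() + pos_, sizeof(T));
    pos_ += sizeof(T);
  }

  // Explicit exception for C arrays: fill the caller's array element by element.
  template <typename T, std::size_t N>
  void read_into(T (&dst)[N]) {
    for (std::size_t i = 0; i < N; ++i) read_into(dst[i]);
  }

 private:
  std::vector<unsigned char> buf_;
  std::size_t pos_ = 0;
};
```

For example, writing an int followed by an int[2][3], then calling read<int>() and read_into() on a matching int[2][3], restores the same values. Partial ordering of function templates makes the T (&)[N] overloads win over the generic ones for array arguments, so the generic paths, and their static_asserts, never see an array.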

    Thoughts on this?

  2. john bachan

    @Amir Kamil I am ok with this proposal, especially considering the UB required to deserialize pairs/tuples containing T[n].
