Serialization of references and cv-qualified types

Issue #159 resolved

Amir Kamil created an issue 2020-05-24

The spec is currently silent on how serialization interacts with references and cv-qualified types. The implementation treats references as non-serializable. I do not know how it treats cv-qualified types.

There is relevant discussion in implementation issue 367. The discussion of RPC in spec issue ~~#149~~ is also relevant.

References

I see two main options here.

Option 1

References are not serializable. This is the status quo in the implementation.

Option 2

References are serialized by serializing the referent.

To avoid lifetime issues with the deserialized object, deserialization should produce an object by value. This means that serialization of references is asymmetric. The actual rule would be as follows:

A reference type T /* cv qualifiers */ & or T /* cv qualifiers */ && is Serializable when T is Serializable. Such a type is serialized by serializing the referent, and deserialization produces an object of non-reference type T. Thus, deserialized_type_t<T /* cv qualifiers */ &> and deserialized_type_t<T /* cv qualifiers */ &&> are both T.

A reference type is never TriviallySerializable.

Common Constraints

For both options, it should be erroneous to specialize is_trivially_serializable or serialization for reference types. (The latter is possible to support, but I don’t see a good reason to allow it.)

CV Qualifiers

This is a bit thornier.

For background, my reading of the documentation on TriviallyCopyable and scalar types is that cv-qualified scalar types should be TriviallyCopyable (and therefore TriviallySerializable). Less clear is class types that are TriviallyCopyable but with cv-qualification.

Strangely, Clang and GCC on my Mac differ with respect to is_trivially_copyable_v<int volatile> and is_trivially_copyable_v<int const volatile>, with Clang reporting true but GCC false. However, both report true when int is replaced by a trivial class type.

Aside from TrivallyCopyable, a user may declare a type to be TriviallySerializable by specializing is_trivially_serializable. However, specializing this for an unqualified type T does not automatically specialize it for cv-qualified T. This is fixable (if we decide it is the right semantics) by providing our own partial specializations of is_trivially_serializable:

template<class T>
struct is_trivially_serializable<T const> {
  static const bool value = foo<T>::value;
};

template<class T>
struct is_trivially_serializable<T volatile> {
  static const bool value = foo<T>::value;
};

template<class T>
struct is_trivially_serializable<T const volatile> {
  static const bool value = foo<T>::value;
};

We can also follow a similar strategy with serialization, whatever we decide the right semantics for cv-qualified types are.

I don’t really have a strong opinion at the moment on what the right semantics are for cv-qualified types. Some things to consider:

Global pointers cannot currently have cv qualifiers on the element types (Issue ~~#51~~). But if they are allowed, we would need to make sure that rput/rget have the right signatures to allow transfers from/to pointers with cv qualification.
CV qualifiers are not semantically meaningful when copying an object by value. So I don’t see a good argument for maintaining them through serialization.

I think what to do with cv qualifiers is a lower priority, since they are not likely to be used at the top-level in function arguments or return types.

Comments (8)

Dan Bonachea
@Amir Kamil - Thanks for writing this up.

Regarding references, I'm currently in favor of your "option 2" - however I have some minor quibbles on the proposed text:

Thus, deserialized_type_t<T /* cv qualifiers */ &> and deserialized_type_t<T /* cv qualifiers */ &&> are both T.

This is inconsistent with current asymmetric deserialization rules. I assume you actually meant something like:

Thus, deserialized_type_t<T /* cv qualifiers */ &> and deserialized_type_t<T /* cv qualifiers */ &&> are both deserialized_type_t<T>.

I think it's also worth adding a recursive example to emphasize the implications, eg:

For example, deserialized_type_t<std::tuple<char &&, int const &, double volatile &>> is std::tuple<char, int, double>

Regarding cv-qualification, my opinion is that serialization is semantically about object values and cv-qualification is irrelevant to values, it only affects how those values are read and written from storage. So I agree with the proposal for making is_trivially_serializable agnostic to cv-qualification. It's worth noting the C++ committee is slowly moving towards deprecation/removal of volatile and volatile should be irrelevant to best-practice UPC++ programming, so I think we're mostly just talking about const here.

CV qualifiers are not semantically meaningful when copying an object by value.

I agree with this sentence. Top-level qualification is probably already irrelevant - in particular Lvalue-to-rvalue conversion [conv.lval] invoked when passing rpc arguments by value already strips top-level cv-qualifiers from non-class types. We already allow T const & input arguments to rpc to invoke a callback that takes the T by value or rvalue ref T && - for example, all of the following pairings are currently permitted:
```
  int x = 4;
  const int & r = x;
  upcxx::rpc(0,[](int a1, int a2, int a3,
                  int const &a4, int const &a5, int const &a6,
                  int &&a7, int &&a8, int &&a9) {
      }, x, r, std::move(x), x, r, std::move(x), x, r, std::move(x)).wait();
```
The same is not currently true if int const & above is replaced with int const volatile &, a defect that should arguably be fixed (but probably has no practical impact on our users).

So I don’t see a good argument for maintaining them through serialization.

Is this proposing asymmetric deserialization that strips out embedded cv qualfiers? If so I disagree - CV qualifiers are not meaningful to values but embedded cv qualifiers are meaningful to type compatibility of the types generated by deserialiation.

For example, we currently allow the following:
```
    using pair_t = std::pair<const int, int>;
    pair_t x = {1,2};
    const pair_t & r = x;
    upcxx::rpc(0,[](pair_t a1, pair_t a2, pair_t a3,
                  pair_t const &a4, pair_t const &a5, pair_t const &a6,
                  pair_t &&a7, pair_t &&a8, pair_t &&a9) {
      }, x, r, std::move(x), x, r, std::move(x), x, r, std::move(x)).wait();
```
This implies deserialized_type_t<std::pair<const int, int>> is std::pair<const int, int> (no stripping of the embedded const qualifier). I don't have a compelling use case argument for such a type, but it doesn't seem unreasonable and I see no motivation to change it to use asymmetric deserialzation (breaking backwards compat and increasing confusion, for no benefit I can see).
- 2020-05-24T21:32:08+00:00
Dan Bonachea
In our informal meeting today (2020-05-27) John indicated agreement with this general plan (Option 2) and encouraged Amir to pursue a PR.
- 2020-05-27T21:25:41+00:00
Amir Kamil reporter
We may need spec language in rpc and friends concerning the lifetime of the referent when returning by reference. The implementation does not immediately copy/move the referent, so it cannot be a stack variable. But I don’t know enough details about the implementation to be able to determine the required lifetime. We likely need @john bachan 's help on this.
- 2020-07-25T07:05:28+00:00
Dan Bonachea
We may need spec language in rpc and friends concerning the lifetime of the referent when returning by reference.

The spec language probably needs to say something to the effect of: "returning a value which includes a reference to an object requires the lifetime of that object to persist until serialization of the result".

I think this implies not only that references to stack locations are prohibited, but additionally that if the RPC callback returns a non-ready future<T&,U&&,...>, then the referent objects must exist past readying of the future AND a subsequent internal(?) progress that serializes the fulfilled result. This is not a clearly defined point, which I think means the user program cannot safely destroy the referent until after synchronization with the initiator implies the RPC acknowledgement has arrived.

In addition, we've already discussed that we need more than spec language here, in order to achieve full generality. Specifically, if the user returns a reference to an RPC input argument that was passed-in by reference, and the reference is returned either within a ready type or within a non-ready future, we think the implementation currently lacks what's necessary to get this right (see closely related implementation issue #367). The fix is basically we need the runtime to extend the lifetime of all deserialized RPC input arguments to match the behavior of view buffers, ie remaining valid until serialization of the RPC result. These implementation changes are necessary to correctly handle code such as the following:
```
upcxx::rpc(target, [](std::vector<int> &&a1) {
  future <> some_fut = do_something(std::move(a1)); // do something asynchronous
  // return a future that will be readied later, 
  // after readying we serialize the (possibly modified) argument and send it back
  return upcxx::when_all(some_fut, upcxx::make_future<std::vector<int> &&>(std::move(a1)));
}
```
- 2020-07-25T20:37:27+00:00
Amir Kamil reporter
Proposed implementation in implementation PR 237.

It would take non-trivial implementation work to support volatile-qualified types, for no practical benefit, so I do not think we should do so.
- 2020-07-26T04:39:19+00:00
Dan Bonachea
See also tangentially related Impl issue #397
- 2020-07-30T22:18:13+00:00
Amir Kamil reporter
Resolved by PR #40.
- 2020-08-13T04:32:54+00:00
Amir Kamil reporter
- changed status to resolved
- 2020-08-13T04:33:05+00:00
Log in to comment

Assignee: Amir Kamil

Type: bug

Priority: major

Status: resolved

Component: General

Milestone: 2020.9.0 release

Version: 2020.3.0

Votes: 0

Watchers: 1