Define Serializability Concepts and type queries

Issue #113 resolved
Dan Bonachea created an issue

Copied from discussion in pull request #2:

Dan wrote:

(In some places) we need to know a type is trivially Serializable, which I don't think is exactly the same as TriviallyCopyable (despite the prose currently appearing at the start of Ch 6). In particular, a struct containing a raw C++ pointer is_trivially_copyable (because C++ pointers are scalar types and hence TriviallyCopyable), but (once we have real Serialization) it's very likely a user would want to implement non-trivial Serialization for that type -- and if the mechanism for doing so is the one laid out in Ch 6 (1. Declare their type to be a friend of access 2. Implement the visitor function serialize), then I suspect the C++ compiler will still report that type as trivially copyable (std::is_trivially_copyable).
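A minimal illustration of the point above, using only the standard type traits: the struct below holds a raw pointer and has a serialize-style visitor member, yet it remains trivially copyable. The names node and serialize are illustrative only and no UPC++ machinery is involved; the static_asserts hold on any conforming compiler because pointers are scalar types and member functions do not affect trivial copyability.

```cpp
#include <type_traits>

// A struct holding a raw pointer plus a visitor member shaped like the
// Ch 6 serialize function (hypothetical; no UPC++ headers are used).
struct node {
  int value;
  node* next;  // raw pointer: meaningless if byte-copied to another rank
  template<typename Archive>
  void serialize(Archive& ar) { (void)ar; }
};

// Pointers are scalar types...
static_assert(std::is_scalar<node*>::value, "pointers are scalar types");
// ...and adding member functions does not change trivial copyability.
static_assert(std::is_trivially_copyable<node>::value,
              "node is still trivially copyable despite serialize()");
```

So std::is_trivially_copyable alone cannot tell whether the user intends node to be serialized non-trivially.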

See closely related issue #64 which discusses some of these issues. The PR propagates this subtle error further into the spec.

Amir wrote:

I don't have an objection to coming up with our own concept of TriviallySerializable. Proposal:

A type T is TriviallySerializable if one of the following holds:

  • T is TriviallyCopyable and either:
    • T is an arithmetic or enumeration type
    • T is a non-union class type and does not implement the UPC++ serialization interface
  • upcxx::is_trivially_serializable<T> is specialized to provide a member constant value that is true

We then define an is_trivially_serializable<T> template with a value member that is true if T is TriviallySerializable. (And maybe is_nontrivially_serializable<T> and is_serializable<T> as well.) We can then replace std::is_trivially_copyable with is_trivially_serializable in this PR.
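One possible shape for the proposed trait, as a sketch under stated assumptions: the namespace sketch stands in for upcxx, and "implements the UPC++ serialization interface" is approximated by probing for a public member template serialize(Ar&) with a hypothetical probe_archive type. A private serialize reached through a friend declaration would not be detected by this probe. Users opt a type in by specializing the template, mirroring the second bullet of the proposal.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical stand-in for namespace upcxx

// C++11-compatible void_t helper (std::void_t arrived in C++17).
template<typename...> struct make_void { typedef void type; };

// Hypothetical archive type, used only to probe for a serialize member.
struct probe_archive {};

// Assumption: "implements the serialization interface" == has a public
// member serialize(Ar&). Substitution failure selects the primary (false).
template<typename T, typename = void>
struct has_serialize : std::false_type {};

template<typename T>
struct has_serialize<T, typename make_void<
    decltype(std::declval<T&>().serialize(std::declval<probe_archive&>()))
  >::type> : std::true_type {};

// Default: TriviallyCopyable and no serialization interface.
// Users may specialize this template to provide value == true.
template<typename T>
struct is_trivially_serializable
  : std::integral_constant<bool,
      std::is_trivially_copyable<T>::value && !has_serialize<T>::value> {};

}  // namespace sketch

struct point { double x, y; };   // plain data: trivially serializable
struct node {                    // opts into real serialization
  int value; node* next;
  template<typename Ar> void serialize(Ar&) {}
};
struct handle {                  // has serialize, but user declares it trivial
  int id;
  template<typename Ar> void serialize(Ar&) {}
};
namespace sketch {
template<> struct is_trivially_serializable<handle> : std::true_type {};
}

static_assert(sketch::is_trivially_serializable<int>::value, "arithmetic");
static_assert(sketch::is_trivially_serializable<point>::value, "plain struct");
static_assert(!sketch::is_trivially_serializable<node>::value, "has serialize");
static_assert(!sketch::is_trivially_serializable<std::string>::value,
              "not trivially copyable");
static_assert(sketch::is_trivially_serializable<handle>::value, "specialized");
```

Note that node, the pointer-holding struct, is no longer reported as trivially serializable once it provides serialize, which is exactly the distinction std::is_trivially_copyable cannot make.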

Note that this does not prevent trivial serialization of TriviallyCopyable types that contain pointers. Unfortunately, C++ does not provide a mechanism for determining whether or not a type contains pointers as members. It may be worth discussing this issue in the spec or the programmer's guide.

Comments (8)

  1. Dan Bonachea reporter

    I like this proposal and think it expresses a concept currently missing from the Serialization chapter and used elsewhere in the spec (e.g. dist_id is described as "trivially Serializable", without a definition of what that means).

    Spec changes I'd like to see to implement Amir's proposal:

    1. Add formal English definition of TriviallySerializable (probably in Serialization chapter)
    2. Add type query declarations for both is_trivially_serializable<T> and is_serializable<T> (optionally also is_nontrivially_serializable<T>, which is computable from the other two)
    3. Update serializable_view PR to use TriviallySerializable where appropriate
    4. Search the spec for "serializable" and "trivially" and ensure each instance is utilizing the proper term

    Additionally, I propose the following:

    1. We should annotate API types with TriviallySerializable as appropriate (e.g. dist_id, team_id, global_ptr, ...?)
    2. Somehow we should mention that any variables lambda-captured-by-value in function objects passed to rpc destined for other ranks should be TriviallySerializable -- this is a necessary, but (unfortunately) not sufficient condition for correctness.
    3. RPC injection should actually check is_serializable<T> on each of its arguments, and ideally provide a compile error if any are false
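Point 3 could look roughly like the following sketch. Everything here is hypothetical: checked_rpc is not a UPC++ function, it simply invokes the callable locally, and is_serializable defaults to true with a user specialization marking a type as non-serializable (an assumption; the default definition is exactly what the rest of this thread debates). The check itself is a C++11 conjunction over the argument pack inside a static_assert.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical names throughout

// Assumed default: every type is serializable unless specialized to false.
template<typename T>
struct is_serializable : std::true_type {};

// C++11 conjunction over a parameter pack of boolean traits.
template<typename... B> struct all_of : std::true_type {};
template<typename B1, typename... Bs>
struct all_of<B1, Bs...>
  : std::integral_constant<bool, B1::value && all_of<Bs...>::value> {};

// Hypothetical injection wrapper: rejects non-serializable arguments at
// compile time, then (in this local sketch) just calls fn directly.
template<typename Fn, typename... Args>
auto checked_rpc(Fn fn, Args... args) -> decltype(fn(args...)) {
  static_assert(all_of<is_serializable<Args>...>::value,
                "every RPC argument must be Serializable");
  return fn(args...);
}

}  // namespace sketch

struct not_shippable { int* p; };
namespace sketch {
template<> struct is_serializable<not_shippable> : std::false_type {};
}

// Usage: passing a not_shippable argument to checked_rpc would not compile.
inline void checked_rpc_demo() {
  assert(sketch::checked_rpc([](int a, int b) { return a + b; }, 2, 3) == 5);
  static_assert(!sketch::all_of<sketch::is_serializable<int>,
                                sketch::is_serializable<not_shippable>>::value,
                "not_shippable is rejected");
}
```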
  2. john bachan

    Traits is_serializable and is_trivially_serializable are unfortunately not as useful as their corresponding CamelCase concepts. Specifically, your last point (3) is not possible without seriously compromising productivity. According to C++, it is implementation-defined whether a lambda reports as is_trivially_copyable, even if the lambda captures by-value only other is_trivially_copyable types. I have witnessed GCC 5.x report such lambdas as non-trivially-copyable. If we were to use copyability to infer serializability, your proposed rpc assertions would throw false positives on a large class of very useful types (lambdas). This means the only possible default definitions for is_serializable and is_trivially_serializable that don't reject these lambdas are the vacuously true ones. The utility of these traits is thus reduced to merely allowing the user to explicitly mark a type (via specialization) as not serializable.

  3. Dan Bonachea reporter

    @jdbachan : Re last point (3) : based on your data I agree we should not assert is_serializable on the function object argument to RPC, but I think we can still apply it to the RPC arguments being passed to the callback? In particular, don't we have to disallow lambda arguments to RPC callbacks anyhow due to address space randomization making them unusable in general at the target? Or do you have some magic to detect a particular RPC argument is a lambda and "fix" the code offset for the embedded function pointer?

  4. john bachan

    Lambdas can be passed as rpc args so long as you know their type. Doing so does not involve function pointers: the main lambda knows the type of the argument lambda, so it issues a direct call to its operator(). See example:

    #include <iostream>
    #include <upcxx/upcxx.hpp>

    int a = 1;
    auto foo = [a](int b) { std::cout<<"a="<<a<<" b="<<b<<"\n"; };
    upcxx::rpc(0,
      [](decltype(foo) foo1) {
        foo1(2); // prints: a=1 b=2
      },
      foo // on the wire this is just sizeof(foo), which should be equal to sizeof(int)
    ).wait();
    
    // in C++14 this gets nicer (separate example; a is declared afresh)
    int a = 1;
    upcxx::rpc(0,
      [](auto foo) {
        foo(2); // prints: a=1 b=2
      },
      [=](int b) { std::cout<<"a="<<a<<" b="<<b<<"\n"; }
    ).wait();
    

    I think this rules out asserting serializability anywhere.
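The dispatch argument can be checked locally without UPC++. In this sketch local_invoke is a hypothetical stand-in for upcxx::rpc that just calls fn(arg) in the same process; it shows that the callee's parameter type is the closure type itself, so the call to operator() is resolved at compile time and the argument object carries only its captures, not a function pointer.

```cpp
#include <cassert>
#include <utility>

// Hypothetical local stand-in for upcxx::rpc: no communication, it simply
// forwards arg to fn. The closure type of arg is known statically.
template<typename Fn, typename Arg>
auto local_invoke(Fn fn, Arg arg) -> decltype(fn(std::move(arg))) {
  return fn(std::move(arg));
}

inline int lambda_arg_demo() {
  int a = 1;
  auto foo = [a](int b) { return a * 10 + b; };  // captures only an int
  // The outer lambda names foo's closure type, so foo1(2) is a direct call.
  return local_invoke([](decltype(foo) foo1) { return foo1(2); }, foo);
}

inline void lambda_arg_check() { assert(lambda_arg_demo() == 12); }
```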

  5. Dan Bonachea reporter

    "I think this rules out asserting serializability anywhere."

    Agreed. And we should perhaps state that upcxx::is_serializable and upcxx::is_trivially_serializable may return an implementation-defined (or alternatively unspecified) value for lambda types.

    I think this also technically means that if someone created a serializable_view to contain lambdas, the iterator type they get at the target is similarly implementation-defined or unspecified. I'm fine with that given it seems like a perverse corner case.

  6. Amir Kamil

    This is what we agreed on in the 1/24 meeting:

    1. Define that a type T is TriviallySerializable if one of the following holds:

      • T is TriviallyCopyable and does not implement the UPC++ serialization interface
      • upcxx::is_trivially_serializable<T> is specialized to provide a member constant value that is true
    2. Provide an is_trivially_serializable<T> trait that is exact.

    3. Clarify that it is implementation-dependent whether lambda objects (that do not capture non-TriviallyCopyable types) are TriviallyCopyable and therefore TriviallySerializable.

    4. Use the is_trivially_serializable<T> trait to determine which version of serializable_view<T> is selected.

    5. Require that the types used with RMA are TriviallySerializable.
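Items 4 and 5 can be sketched with ordinary template machinery. All names are hypothetical: the trait is a reduced stand-in (TriviallyCopyable by default, user-specializable, with the serialize-member check omitted), view_impl only records which representation is selected, and rput_sketch is a local memcpy rather than a real one-sided put.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

namespace sketch {  // hypothetical names throughout

// Reduced stand-in for the agreed trait: TriviallyCopyable by default,
// user-specializable; detection of a serialize member is omitted here.
template<typename T>
struct is_trivially_serializable : std::is_trivially_copyable<T> {};

// Item 4: pick a view representation from the trait.
template<typename T, bool = is_trivially_serializable<T>::value>
struct view_impl {            // trivially serializable: byte-copy path
  static constexpr bool byte_copy = true;
};
template<typename T>
struct view_impl<T, false> {  // needs real serialization
  static constexpr bool byte_copy = false;
};

// Item 5: a put that refuses non-TriviallySerializable element types.
// Both buffers are local in this sketch.
template<typename T>
void rput_sketch(const T* src, T* dst, std::size_t n) {
  static_assert(is_trivially_serializable<T>::value,
                "RMA requires TriviallySerializable element types");
  std::memcpy(dst, src, n * sizeof(T));
}

}  // namespace sketch

static_assert(sketch::view_impl<int>::byte_copy, "int uses byte copy");
static_assert(!sketch::view_impl<std::string>::byte_copy,
              "std::string needs serialization");

inline void rma_demo() {
  int src[3] = {1, 2, 3};
  int dst[3] = {0, 0, 0};
  sketch::rput_sketch(src, dst, 3);  // rput_sketch<std::string> would not compile
  assert(dst[0] == 1 && dst[1] == 2 && dst[2] == 3);
}
```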

    Open questions:

    1. Should VIS and collectives require TriviallySerializable types?

      • Proposal: yes for now, can be loosened later if needed.
    2. Should we explicitly state that RPC makes byte copies of lambda objects even if they are not TriviallySerializable?

  7. Dan Bonachea reporter

    "Should VIS and collectives require TriviallySerializable types?"

    I'd argue strongly that this requirement on RMA needs to apply equally to VIS, because VIS is just an explicitly coalesced variant of RMA. The implementation motivation is even stronger for VIS because GASNet's VIS will always be byte-copy and we don't want the hassle of implementing a serializing version of VIS in the UPC++ runtime.

    For collectives I can see how real serialization might be handy (e.g. to broadcast a std::string or std::vector at startup). However note that GASNet's collectives will always be byte-copy and will probably remain restricted to fixed-contribution, which means they probably cannot be used to implement serializing collectives (because in general the size of the serialized representation may not be known until after serialization, and could easily be rank-dependent).
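To make the size argument concrete, here is a sketch using a hypothetical length-prefixed encoding of std::string (serialize_string is an illustrative name, not a UPC++ or GASNet function). The encoded size depends on the value each rank holds, not just on the type, so two ranks contributing std::string values would in general hand a fixed-contribution collective different byte counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical length-prefixed encoding: [size_t length][bytes...].
inline std::vector<unsigned char> serialize_string(const std::string& s) {
  std::size_t len = s.size();
  std::vector<unsigned char> buf(sizeof(len) + len);
  std::memcpy(buf.data(), &len, sizeof(len));
  if (len != 0) std::memcpy(buf.data() + sizeof(len), s.data(), len);
  return buf;
}

// Two "ranks" holding different strings produce different encoded sizes.
inline void size_demo() {
  std::size_t rank0 = serialize_string("hi").size();
  std::size_t rank1 = serialize_string("hello").size();
  assert(rank0 == sizeof(std::size_t) + 2);
  assert(rank1 == sizeof(std::size_t) + 5);
  assert(rank0 != rank1);
}
```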
