Generalized completion

Issue #76 resolved
Former user created an issue

Pseudo-draft of the generalized completion API.

rput(..., <default-value>) // global completion returns future
rput(..., global_cx(promise&)) // returns void, global into promise
rput(..., global_cx(persona&, funcobj)) // returns void, global into continuation

// generalize rput_then_rpc, could also be used to make
// rget_then_rpc
rput(..., source_cx_as_future | remote_cx(func, args...)) // return future
rput(..., source_cx(promise&) | remote_cx(func, args...)) // void
rput(..., source_cx(persona&, func1) | remote_cx(func2, args...)) // void

// and then for the power user
rput(..., source_cx(promise&) |
          remote_cx(func,args...) |
          global_cx_as_future)

// if we're really tricky we can support "naked" promise
// as implicitly global_cx
rput(..., promise&)

// also cool trick would be to support multiple futures ordered
// in correspondence with their parsed position (relative to |)
std::pair<future,future> rput(..., global_cx_as_future | source_cx_as_future);
// the futures in the pairs are flipped
std::pair<future,future> rput(..., source_cx_as_future | global_cx_as_future);

// sketch implementation
struct nil_cx {};
struct future_cx {};
template<typename ...T>
struct promise_cx { promise<T...> &prom; };
template<typename Func>
struct continuation_cx { persona &pers; Func func; };
template<typename SerializableFunc>
struct rpc_cx { SerializableFunc func; };

template<typename SourceCx, // can be any of the ***_cx types
         typename RemoteCx, // can only be rpc_cx or nil_cx
         typename GlobalCx // can be any ***_cx except rpc_cx
         >
struct completions {
  SourceCx source;
  RemoteCx remote;
  GlobalCx global;
};

constexpr completions<future_cx,nil_cx,nil_cx> source_cx_as_future{};
constexpr completions<nil_cx,nil_cx,future_cx> global_cx_as_future{};

// construct various global-only completions
// likewise, there would be factories for source_cx and remote_cx
template<typename ...T>
completions<nil_cx,nil_cx,promise_cx<T...>>
  global_cx(promise<T...>&);

template<typename Func>
completions<nil_cx,nil_cx,continuation_cx<Func>>
  global_cx(persona&, Func);

// some metaprogram to pick the non-nil_cx type from Cx1 and Cx2
// or nil_cx if they're both nil_cx. Error if they're both non-nil
template<typename Cx1, typename Cx2>
using non_nil_cx_of_t = ...;

template<typename L1, typename R1, typename G1,
         typename L2, typename R2, typename G2>
completions<
 non_nil_cx_of_t<L1,L2>,
 non_nil_cx_of_t<R1,R2>,
 non_nil_cx_of_t<G1,G2>>
operator|(completions<L1,R1,G1> a, completions<L2,R2,G2> b) {
  return completions{/*select non-nil fields from a and b*/};
}

// this does not allow for the implicit promise& -> global_cx trick
template<typename T,
         typename SourceCx = nil_cx,
         typename RemoteCx = nil_cx,
         typename GlobalCx = future_cx>
/*return type = template magic*/
rput(global_ptr<T> dest, T const *src, size_t n,
     completions<SourceCx,RemoteCx,GlobalCx> cxs = global_cx_as_future);
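
One way the elided non_nil_cx_of_t metaprogram and the operator| body could be filled in is sketched below. It is only an illustration under simplifying assumptions: the *_cx tags are reduced to empty structs, and the pick helper is a hypothetical name that does not appear in the draft.

```cpp
#include <type_traits>

// Simplified stand-ins for the draft's completion tags (assumption: no payload).
struct nil_cx {};
struct future_cx {};
struct rpc_cx {};

template<typename SourceCx, typename RemoteCx, typename GlobalCx>
struct completions {
  SourceCx source;
  RemoteCx remote;
  GlobalCx global;
};

// Pick the non-nil type of the pair; two nil_cx give nil_cx.
// Two non-nil types trip the static_assert.
template<typename Cx1, typename Cx2>
struct non_nil_cx_of {
  static_assert(std::is_same<Cx1, nil_cx>::value ||
                std::is_same<Cx2, nil_cx>::value,
                "the same completion slot was specified twice");
  using type = typename std::conditional<
      std::is_same<Cx1, nil_cx>::value, Cx2, Cx1>::type;
};
template<typename Cx1, typename Cx2>
using non_nil_cx_of_t = typename non_nil_cx_of<Cx1, Cx2>::type;

// Value-level counterpart (hypothetical helper): return the non-nil argument.
inline nil_cx pick(nil_cx, nil_cx) { return {}; }
template<typename Cx> Cx pick(Cx a, nil_cx) { return a; }
template<typename Cx> Cx pick(nil_cx, Cx b) { return b; }

template<typename S1, typename R1, typename G1,
         typename S2, typename R2, typename G2>
completions<non_nil_cx_of_t<S1,S2>,
            non_nil_cx_of_t<R1,R2>,
            non_nil_cx_of_t<G1,G2>>
operator|(completions<S1,R1,G1> a, completions<S2,R2,G2> b) {
  return {pick(a.source, b.source),
          pick(a.remote, b.remote),
          pick(a.global, b.global)};
}
```

With these definitions, combining a source-only and a global-only completion yields a completions type with both slots filled and the remote slot left as nil_cx.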

Comments (12)

  1. Amir Kamil

    I've been poking at this and fleshing out the metaprogramming, and I think this approach is feasible.

    I suggest separating out empty promise/continuation completions from the ones that take a type. I think that will facilitate both specifying what kinds of completions are valid for an operation as well as error detection and reporting. So something like the following:

    // completion types
    
    template<bool Here, bool Remote> 
    struct valid_cx {
      static const bool valid_here = Here;
      static const bool valid_remote = Remote;
    };
    
    struct nil_cx : valid_cx<true,true> {};
    struct future_cx : valid_cx<true,false> {};
    struct promise_cx : valid_cx<true,false> {
      promise<> &prom;
    };
    template<typename T>
    struct typed_promise_cx : valid_cx<true,false> {
      promise<T> &prom;
    };
    template<typename Func>
    struct continuation_cx : valid_cx<true,false> {
      persona &pers;
      Func func;
    };
    template<typename Func, typename T>
    struct typed_continuation_cx : valid_cx<true,false> {
      persona &pers;
      Func func;
    };
    template<typename SerializableFunc, typename ...SerializableArgs>
    struct rpc_cx : valid_cx<false,true> {
      SerializableFunc func;
      std::tuple<SerializableArgs...> args;
    };
    
    // completion wrapper
    
    template<typename SourceCx, // can be any ***_cx type except rpc_cx
             typename RemoteCx, // can only be rpc_cx or nil_cx
             typename GlobalCx> // can be any ***_cx type except rpc_cx
    struct completions {
      static_assert(SourceCx::valid_here, "SourceCx cannot be rpc_cx");
      static_assert(RemoteCx::valid_remote, "RemoteCx must be rpc_cx or nil_cx");
      static_assert(GlobalCx::valid_here, "GlobalCx cannot be rpc_cx");
      SourceCx source;
      RemoteCx remote;
      GlobalCx global;
    };
    

    I've included some basic error checking in the above on valid completions, to illustrate what we can do with static asserts.

    I'll leave out the rest of the metaprogramming until I have a chance to test it.

    A peripheral question we need to answer is what kinds of completion to support for each of the communication ops. The current spec lays out the following:

    • rput: global completion, remote completion (via rput_then_rpc)
    • rget: global completion
    • rpc: global completion, but only future and promise (no continuation)
    • atomics: future-based global completion
    • collectives: future-based global completion
    • VIS: same as rput/rget

    Dan suggested we avoid exposing completions that are unlikely to be performant. Similarly, I think we should avoid overgeneralizing what we provide; we can always expose more completions later as the need arises. So here's what I suggest:

    • rput: source, remote, and global completion
    • rget: global completion
    • rpc: see discussion below
    • atomic put: remote and global completion
    • atomic get, fetch/add: global completion
    • collectives: global completion
    • VIS: same as rput/rget

    For rpc, the current spec does not support continuation-based completion. My guess is that this was an oversight. That being said, I don't recommend melding generalized completion with rpc. The problem is that we need to encode the return type of the rpc, and that this includes some future lifting (e.g. future<future<T>> --> future<T>) that doesn't appear elsewhere. It also translates void return to empty future/promise. Given the amount of description required in the spec for these features, I think it makes sense to specify the options as overloads rather than through generalized completion.

  2. Dan Bonachea

    Rooted non-blocking collectives that handle data (ie currently broadcast) also need source completion, at least as currently formulated: see Issue #69.

    However this is somewhat an artifact of this freaky splitting of broadcast into root and non-root calls for the same collective operation, so perhaps this is better solved by moving to a more traditional unified interface for rooted collectives, where global completion for nodes that only send is synonymous with source completion.

    Incidentally, collectives are another strong reason to avoid the label "global completion" when we really mean "the operation is complete as far as this persona is concerned". "global completion" applied to a collective suggests something akin to UPC_OUT_ALLSYNC, which I'm pretty sure is not what we want to expose.

  3. Amir Kamil

    Some suggestions for alternate naming for the types of completion:

    • source: initiation, send
    • remote: receive
    • global: final, total, result

    For rooted collectives, I would strongly prefer a unified interface, both for ease of programming and to simplify semantics (including ordering and alignment of collectives). This would likely require us to take the value to be sent by pointer.

  4. Amir Kamil

    Here is a proposal for a completion API. It differs somewhat from what John has implemented, though the interface is similar.

    My main goals in this design were as follows:

    • Extensibility: for example, if we decide to expose metadata completion for VIS, it should be simple to add this without breaking user code.

    • Flexibility: the API allows multiple actions to be specified for a single event. For example, a user can queue several LPCs on different personas, all conditioned on source completion. The template that encodes completions is variadic, allowing any number of actions on any number of events. This also helps with the extensibility goal.

    • Abstraction: most of the types used to handle completion should be abstracted from the user.

    Completion events:

    • Source: All data/metadata provided by pointer to the op has been consumed and may be modified. Can be signaled by future, promise, or LPC.

    • Remote: The op has deposited its data on the remote rank, which can now consume it. Can be signaled by RPC.

    • Operation: The op has completed from the perspective of the initiator. Can be signaled by future, promise, or LPC.

    Interface

    Here is the user interface for obtaining a completion. I use CType to indicate an opaque completion type, which may be a different type in different contexts.

    // Source completion
    struct source_cx {
      static CType as_future();
    
      template<typename ...T>
      static CType as_promise(promise<T...> &pro);
    
      template<typename Func>
      static CType as_lpc(persona &target, Func func);
    };
    
    // Remote completion
    struct remote_cx {
      template<typename Func, typename ...Args>
      static CType as_rpc(Func func, Args&&... args);
    };
    
    // Operation completion
    struct operation_cx {
      static CType as_future();
    
      template<typename ...T>
      static CType as_promise(promise<T...> &pro);
    
      template<typename Func>
      static CType as_lpc(persona &target, Func func);
    };
    

    Completions can be combined with the | operator, and ordering of futures matches the ordering of operands to |:

    template<typename CTypeA, typename CTypeB>
    CType operator|(CTypeA a, CTypeB b);
    

    A communication op is specified as follows:

    template<typename T,
             typename Completions=decltype(operation_cx::as_future())>
    RType rget(global_ptr<T>, Completions cxs = Completions{});
    

    The signature indicates that the default completion is operation completion as a future. RType is void if Completions does not contain a future-based completion, a single future if Completions contains a single future-based completion, and a tuple of futures if Completions contains multiple future-based completions, with the order of the result matching the order in Completions.

    The API reference should list what completions are available, and the data type of the completion result in the case of source or operation, e.g.:

    Completions:

    • Source: no value
    • Remote
    • Operation: value of type T

    The completions chapter should explain that a value-less completion produces a future<>, takes a promise<>, or enqueues a zero-argument LPC. A completion that produces values of types T... produces a future<T...>, takes a promise<T...>, or enqueues an LPC that can be invoked on a sequence of values of types T....

    Here is an example of using rget() with a very complicated completion:

    int foo() {
      return 0;
    }
    
    int bar(double x) {
      return x;
    }
    
    void baz(double(&)[3]) {
    }
    
    int main() {
      // ...
      global_ptr<int> gp1 = /* some global pointer */;
      promise<int> pro1;
      promise<> pro2;
      promise<double> pro3;
      persona &per1 = /* some persona */;
      auto cxs = (
                  // source_cx::as_promise(pro1) |
                  // source_cx::as_lpc(per1, bar) |
                  // operation_cx::as_promise(pro2) |
                  // operation_cx::as_promise(pro3) |
                  // operation_cx::as_lpc(per1, foo) |
                  // operation_cx::as_lpc(per1, baz) |
                  operation_cx::as_promise(pro1) |
                  source_cx::as_future() |
                  operation_cx::as_future() |
                  source_cx::as_future() |
                  operation_cx::as_future() |
                  source_cx::as_lpc(per1, foo) |
                  source_cx::as_lpc(per1, foo) |
                  operation_cx::as_lpc(per1, bar) |
                  remote_cx::as_rpc(bar, 3)
                  );
      auto result = rget(gp1, cxs);
      // ...
    }
    

    The commented-out completions are invalid due to type mismatch between the given and expected promises/LPCs. Ideally, we should produce a reasonable compile-time error for such a mismatch.

    The example shows that the same completion event can signal multiple things. The futures are ordered as the operands to |. Since source completion produces a value-less future but operation completion produces a future<T> where T is int, the type of result is std::tuple<future<>, future<int>, future<>, future<int>>.

    Implementation

    The following is a strawman implementation of this API. It is primarily focused on the template metaprogramming, with some error checking. It does not actually implement the signaling, and the RPC part of it is not hooked up to serialization (and probably does not handle rvalue references properly; I'll defer to John on both). And most of it should be placed in the upcxx::detail namespace.

    Events

    A scoped enumeration represents completion events:

    enum class completion_event {
      source,
      remote,
      operation
    };
    

    Signaling Actions

    Actions include signaling by future, promise, LPC, or RPC. An action type is parameterized by completion event as well as any data types needed by the action itself:

    // Future completion
    template<completion_event Event>
    struct future_cx {};
    
    // Promise completion
    template<completion_event Event, typename ...T>
    struct promise_cx {
      promise<T...> &pro_;
    };
    
    // LPC completion
    template<completion_event Event, typename Func>
    struct lpc_cx {
      persona &target_;
      Func func_;
      lpc_cx(persona &target, Func func)
        : target_(target), func_(std::move(func)) {}
    };
    
    // RPC completion. This needs to be fixed to appropriately handle serialization.
    template<completion_event Event, typename Func, typename ...Args>
    struct rpc_cx {
      Func func_;
      std::tuple<Args...> args_;
      rpc_cx(Func func, Args... args)
        : func_(std::move(func)), args_{std::move(args)...} {}
    };
    

    Completions

    The completions variadic template encodes any number of completion actions. For now, the data is stored in a tuple:

    template<typename ...Cxs>
    struct completions {
      std::tuple<Cxs...> cxs;
    };
    

    The | operator merely concatenates parameter packs and the associated tuples:

    template<typename ...ACxs, typename ...BCxs>
    completions<ACxs..., BCxs...> operator|(completions<ACxs...> a,
                                            completions<BCxs...> b) {
      return {std::tuple_cat(std::move(a.cxs), std::move(b.cxs))};
    }
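
    The concatenation can be checked in isolation. The sketch below repeats the two definitions above and uses two hypothetical empty action types (source_future, operation_future) in place of the real *_cx templates, to show that operand order is preserved in the resulting parameter pack.

    ```cpp
    #include <tuple>
    #include <type_traits>
    #include <utility>

    // Hypothetical stand-ins for completion actions; only their identity matters.
    struct source_future {};
    struct operation_future {};

    template<typename ...Cxs>
    struct completions {
      std::tuple<Cxs...> cxs;
    };

    template<typename ...ACxs, typename ...BCxs>
    completions<ACxs..., BCxs...> operator|(completions<ACxs...> a,
                                            completions<BCxs...> b) {
      return {std::tuple_cat(std::move(a.cxs), std::move(b.cxs))};
    }

    using A = completions<source_future>;
    using B = completions<operation_future>;

    // Left-to-right chaining keeps the operands in the order they were written.
    static_assert(std::is_same<decltype(A{} | B{} | A{}),
                               completions<source_future,
                                           operation_future,
                                           source_future>>::value,
                  "operator| concatenates in operand order");
    ```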
    

    Obtaining a Completion

    Definitions for the source_cx, remote_cx, and operation_cx classes are below. Given the similarities between source_cx and operation_cx, their implementation is offloaded to a base template. (We could define them as type aliases instead, but likely at the cost of slightly worse error messages.)

    // Base template for completions at initiator
    template<completion_event Event>
    struct here_cx {
      using as_future_t = completions<future_cx<Event>>;
    
      static constexpr as_future_t as_future() {
        return {};
      }
    
      template<typename ...T>
      using as_promise_t = completions<promise_cx<Event, T...>>;
    
      template<typename ...T>
      static as_promise_t<T...> as_promise(promise<T...> &pro) {
        return {std::make_tuple(promise_cx<Event, T...>{pro})};
      }
    
      template<typename Func>
      using as_lpc_t = completions<lpc_cx<Event, Func>>;
    
      template<typename Func>
      static as_lpc_t<Func> as_lpc(persona &target, Func func) {
        return {std::make_tuple(lpc_cx<Event, Func>{target, func})};
      }
    };
    
    // Source and operation completion
    struct source_cx : here_cx<completion_event::source> {};
    struct operation_cx : here_cx<completion_event::operation> {};
    
    // Remote completion
    struct remote_cx {
      static constexpr completion_event Event = completion_event::remote;
    
      template<typename Func, typename ...Args>
      using as_rpc_t = completions<rpc_cx<Event, Func, Args...>>;
    
      template<typename Func, typename ...Args>
      static as_rpc_t<Func, Args...> as_rpc(Func func, Args&&... args) {
        return
          {std::make_tuple(rpc_cx<Event, Func,
                                  Args...>{func, std::forward<Args>(args)...})};
      }
    };
    

    Computing a Return Type

    A communication op computes its return type based on the Completions provided by the user and the kinds of completions the op provides. A single completion is specified by the completion event and the types of the values it produces, which may be empty:

    template<completion_event Event, typename ...T>
    struct future_return {
      using type = future<T...>;
    };
    

    The cx_return_type template then does the computation. For example, rget() would be declared as follows:

    template<typename T,
             typename Completions=decltype(operation_cx::as_future())>
    cx_return_type<Completions,
                   future_return<completion_event::source>,
                   future_return<completion_event::operation, T>> 
      rget(global_ptr<T>, Completions cxs = Completions{});
    

    The op supports a future return on source completion, which doesn't produce any value. It also supports future return on operation completion, which produces a T.

    In order to actually compute the type, we need to be able to combine void, futures, and tuples appropriately. void combined with anything produces the latter, combining two futures produces a tuple, and combining two tuples or a tuple and a future produces a combined tuple:

    template<typename ...T>
    struct future_tuple_cat;
    
    template<typename T>
    struct future_tuple_cat<T, void> {
      using type = T;
    };
    
    template<typename T>
    struct future_tuple_cat<void, T> {
      using type = T;
    };
    
    template<>
    struct future_tuple_cat<void, void> {
      using type = void;
    };
    
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<future<T1...>, future<T2...>> {
      using type = std::tuple<future<T1...>, future<T2...>>;
    };
    
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<future<T1...>, std::tuple<T2...>> {
      using type = std::tuple<future<T1...>, T2...>;
    };
    
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<std::tuple<T1...>, future<T2...>> {
      using type = std::tuple<T1..., future<T2...>>;
    };
    
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<std::tuple<T1...>, std::tuple<T2...>> {
      using type = std::tuple<T1..., T2...>;
    };
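
    To make the combination rules concrete, the sketch below repeats the specializations above in condensed form and checks one case per rule. The empty future template is a stand-in for the real future type, and cat_t is a hypothetical shorthand introduced only for this example.

    ```cpp
    #include <tuple>
    #include <type_traits>

    // Stand-in for the real future type; only the type identity is used here.
    template<typename ...T> struct future {};

    template<typename ...T> struct future_tuple_cat;
    template<typename T> struct future_tuple_cat<T, void> { using type = T; };
    template<typename T> struct future_tuple_cat<void, T> { using type = T; };
    template<> struct future_tuple_cat<void, void> { using type = void; };
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<future<T1...>, future<T2...>> {
      using type = std::tuple<future<T1...>, future<T2...>>;
    };
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<future<T1...>, std::tuple<T2...>> {
      using type = std::tuple<future<T1...>, T2...>;
    };
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<std::tuple<T1...>, future<T2...>> {
      using type = std::tuple<T1..., future<T2...>>;
    };
    template<typename ...T1, typename ...T2>
    struct future_tuple_cat<std::tuple<T1...>, std::tuple<T2...>> {
      using type = std::tuple<T1..., T2...>;
    };

    // Hypothetical shorthand for the examples below.
    template<typename A, typename B>
    using cat_t = typename future_tuple_cat<A, B>::type;

    // void combined with anything yields the other operand.
    static_assert(std::is_same<cat_t<void, future<>>, future<>>::value, "");
    // Two futures become a tuple of futures.
    static_assert(std::is_same<cat_t<future<>, future<int>>,
                               std::tuple<future<>, future<int>>>::value, "");
    // A future prepended to a tuple extends the tuple.
    static_assert(std::is_same<cat_t<future<>, std::tuple<future<int>, future<>>>,
                               std::tuple<future<>, future<int>, future<>>>::value, "");
    ```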
    

    To turn a completion into its part of the return type: if it is not a future, it contributes nothing (i.e. void). If it is a future, we have to walk through the set of future-based completions that the op supports in order to match it with its return type. For example, if the user wants source completion as a future, in the case of rget() we have future_return<completion_event::source>, which means that source completion should result in a future<>. Here's the logic for this walking and matching:

    // Unspecialized case: not a future_cx --> void
    template<typename Cx, typename ...Frs>
    struct match {
      using type = void;
    };
    
    // Specialized: future_cx, but its event does not match the first future_return's
    // --> recurse on rest
    template<completion_event Event,
             typename Fr1,
             typename ...FrRest>
    struct match<future_cx<Event>, Fr1, FrRest...> {
      using type = typename match<future_cx<Event>, FrRest...>::type;
    };
    
    // Specialized; future_cx whose event matches that of the first future_return
    // --> pull the types out of the future_return to compute the future type
    template<completion_event Event,
             typename ...Fr1T,
             typename ...FrRest>
    struct match<future_cx<Event>, future_return<Event, Fr1T...>, FrRest...> {
      using type = future<Fr1T...>;
    };
    

    (NOTE [MOSTLY] TO SELF: add error checking for when a requested completion does not match a provided completion event. This should probably be lifted to an independent check, as with check_completions below.)
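
    One possible shape for that independent check is sketched below. The trait name future_event_supported is hypothetical and not part of the proposal, and lpc_cx is reduced to an empty placeholder. The trait mirrors the structure of match, but reports false when a future_cx runs out of future_returns to inspect, where match would silently produce void.

    ```cpp
    #include <type_traits>

    enum class completion_event { source, remote, operation };

    template<completion_event Event> struct future_cx {};
    template<completion_event Event, typename ...T> struct future_return {};
    // Placeholder for any non-future action.
    template<completion_event Event, typename Func> struct lpc_cx {};

    // Unspecialized: not a future_cx, so there is nothing to check.
    template<typename Cx, typename ...Frs>
    struct future_event_supported : std::true_type {};

    // future_cx with no future_return left: the op does not offer this event.
    template<completion_event Event>
    struct future_event_supported<future_cx<Event>> : std::false_type {};

    // Event differs from the first future_return's: recurse on the rest.
    template<completion_event Event, typename Fr1, typename ...FrRest>
    struct future_event_supported<future_cx<Event>, Fr1, FrRest...>
      : future_event_supported<future_cx<Event>, FrRest...> {};

    // Event matches the first future_return's: supported.
    template<completion_event Event, typename ...Fr1T, typename ...FrRest>
    struct future_event_supported<future_cx<Event>,
                                  future_return<Event, Fr1T...>, FrRest...>
      : std::true_type {};
    ```

    A communication op could then static_assert on this trait for each completion before computing its return type.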

    Finally, we need an outer loop to scan all the completions, combining the return types for each:

    // Outer template variadic over future_returns
    template<typename ...Frs>
    struct future_returns {
      // Inner template variadic over completions
      // unspecialized
      template<typename ...T>
      struct scan;
    
      // specialized recursive case
      template<typename Cx1, typename ...Cxs>
      struct scan<Cx1, Cxs...> {
        using type =
          typename future_tuple_cat<typename match<Cx1, Frs...>::type,
                                    typename scan<Cxs...>::type>::type;
      };
    
      // specialized base case for a single completion
      template<typename Cx1>
      struct scan<Cx1> {
        using type = typename match<Cx1, Frs...>::type;
      };
    
      // Pull completions out of the parameter list for the completions template
      // and scan them
      template<typename>
      struct scan_completions;
    
      template<typename ...Cxs>
      struct scan_completions<completions<Cxs...>> {
        using type = typename scan<Cxs...>::type;
      };
    };
    
    // Actual interface for computing a return type
    template<typename Completions, typename ...Frs>
    using cx_return_type =
      typename future_returns<Frs...>::template scan_completions<Completions>::type;
    

    Error Checking

    The following implements error checking to make sure a promise or LPC matches the type of the associated completion event. This should be used with a static_assert in the communication op, so that a nicer error message is produced as opposed to a substitution failure deep in template goop. For example:

    template<typename T,
             typename Completions=decltype(operation_cx::as_future())>
    cx_return_type<Completions,
                   future_return<completion_event::source>,
                   future_return<completion_event::operation, T>> 
      rget(global_ptr<T>, Completions cxs = Completions{}) {
      static_assert(check_completions<Completions,
                                      completion_event::source>(),
                    "mismatched promise or lpc type for source completion");
      static_assert(check_completions<Completions,
                                      completion_event::operation, T>(),
                    "mismatched promise or lpc type for operation completion");
      // ...
    }
    

    A separate static_assert is required for each completion event. (I'd prefer to use a C++14 variable template rather than a constexpr function, but oh well.)

    The overall structure is similar to computing a return type, but since we handle only one completion at a time, there is only one loop.

    template<completion_event Event, typename ...T>
    struct check_completions_impl {
      // Loop over completions
      // unspecialized base case: true
      template<typename ...Cxs>
      struct check : std::true_type {};
    
      // specialized recursive case
      template<typename Cx1, typename ...Cxs>
      struct check<Cx1, Cxs...> {
        static constexpr bool value =
          check_completion<Event, Cx1, T...>::value &
          check<Cxs...>::value;
      };
    
      // Pull completions out of the parameter list for the completions template
      // and scan them
      template<typename>
      struct scan_completions;
    
      template<typename ...Cxs>
      struct scan_completions<completions<Cxs...>> {
        static constexpr bool value = check<Cxs...>::value;
      };
    };
    
    // Actual interface for checking
    template<typename Completions, completion_event Event, typename ...T>
    constexpr bool check_completions() {
      return check_completions_impl<Event, T...>::
        template scan_completions<Completions>::value;
    }
    

    To check an individual completion, the default is that it is OK:

    // Unspecialized: everything is OK
    template<completion_event Event, typename Cx, typename ...T>
    struct check_completion : std::true_type {};
    

    For a promise, if its event matches the given one, then its parameter types must match the given types:

    // Partial specialization for mismatched types: not OK
    template<completion_event Event, typename ...T, typename ...PT>
    struct check_completion<Event, promise_cx<Event, PT...>, T...>
      : std::false_type {};
    
    // Partial specialization for matched types: OK
    template<completion_event Event, typename ...T>
    struct check_completion<Event, promise_cx<Event, T...>, T...> :
      std::true_type {};
    

    For an LPC, we need to know whether it can be invoked on the given types:

    // Placeholder type, where the parameter is used with SFINAE
    template<typename>
    struct placeholder : std::true_type {};
    
    template<typename Func>
    struct is_invokable {
      // Substitutes on successful invocation
      template<typename ...Args>
      static auto invoke(int) ->
        placeholder<typename std::result_of<Func(Args...)>::type>;
      // Varargs overload for unsuccessful substitution of above
      template<typename ...Args>
      static std::false_type invoke(...);
      // true/false based on whether substitution succeeded
      template<typename ...Args>
      using with = decltype(invoke<Args...>(0));
    };
    
    // Partial specialization to check LPC
    template<completion_event Event, typename Func, typename ...T>
    struct check_completion<Event, lpc_cx<Event, Func>, T...> {
      static constexpr bool value =
        is_invokable<Func>::template with<T...>::value;
    };
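
    Since std::result_of was deprecated in C++17 and removed in C++20, a variant of is_invokable that avoids it may be useful. The sketch below is an alternative formulation, not the code above: it detects invocability directly from the call expression, so it covers function pointers and function objects but not pointers to members.

    ```cpp
    #include <type_traits>
    #include <utility>

    template<typename Func>
    struct is_invokable {
      // Selected when Func can be called with Args...; the void() cast discards
      // the call result, so void-returning callables are accepted as well.
      template<typename ...Args>
      static auto invoke(int) ->
        decltype(void(std::declval<Func>()(std::declval<Args>()...)),
                 std::true_type{});
      // Varargs fallback when the call expression above is ill-formed.
      template<typename ...Args>
      static std::false_type invoke(...);
      // true/false based on whether substitution succeeded
      template<typename ...Args>
      using with = decltype(invoke<Args...>(0));
    };

    // Function-pointer types of foo, bar, and baz from the rget() example.
    using foo_t = int (*)();
    using bar_t = int (*)(double);
    using baz_t = void (*)(double (&)[3]);

    // foo takes no arguments, so it suits a value-less (source) completion.
    static_assert(is_invokable<foo_t>::with<>::value, "");
    // bar accepts the int produced by operation completion (int converts to double).
    static_assert(is_invokable<bar_t>::with<int>::value, "");
    // bar cannot be invoked with no arguments, and baz cannot take an int.
    static_assert(!is_invokable<bar_t>::with<>::value, "");
    static_assert(!is_invokable<baz_t>::with<int>::value, "");
    ```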
    

    Here's an example of error detection, where source_cx::as_promise(pro1) is uncommented in the completion construction above (so that a non-empty promise is used for the empty source completion). With GCC:

    completion2.cpp: In instantiation of 'cx_return_type<Completions, future_return<(completion_event)0>, future_return<(completion_event)2, T> > rget(global_ptr<T>, Completions) [with T = int; Completions = completions<promise_cx<(completion_event)0, int>, promise_cx<(completion_event)2, int>, future_cx<(completion_event)0>, future_cx<(completion_event)2>, future_cx<(completion_event)0>, future_cx<(completion_event)2>, lpc_cx<(completion_event)0, int (*)()>, lpc_cx<(completion_event)0, int (*)()>, lpc_cx<(completion_event)2, int (*)(double)>, rpc_cx<(completion_event)1, int (*)(double), int> >; cx_return_type<Completions, future_return<(completion_event)0>, future_return<(completion_event)2, T> > = std::tuple<future<>, future<int>, future<>, future<int> >]':
    completion2.cpp:312:28:   required from here
    completion2.cpp:269:3: error: static assertion failed: mismatched promise or lpc type for source completion
       static_assert(check_completions<Completions,
       ^
    

    And with Clang:

    completion2.cpp:269:3: error: static_assert failed "mismatched promise or lpc
          type for source completion"
      static_assert(check_completions<Completions,
      ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    completion2.cpp:312:15: note: in instantiation of function template
          specialization 'rget<int, completions<promise_cx<completion_event::source,
          int>, promise_cx<completion_event::operation, int>,
          future_cx<completion_event::source>,
          future_cx<completion_event::operation>,
          future_cx<completion_event::source>,
          future_cx<completion_event::operation>, lpc_cx<completion_event::source,
          int (*)()>, lpc_cx<completion_event::source, int (*)()>,
          lpc_cx<completion_event::operation, int (*)(double)>,
          rpc_cx<completion_event::remote, int (*)(double), int> > >' requested here
      /*int x =*/ rget(gp1, cxs);
                  ^
    1 error generated.
    

    Extensibility

    In order to add a new completion event (e.g. metadata completion), the following is all that is required:

    • Add a member to the completion_event enum
    • Define a *_cx class. For metadata completion, we would just use here_cx<completion_event::metadata>.

    And that is all. Existing user code would still work.

    Conclusion

    If we adopt the proposal above, the spec proper would only discuss the pieces in the Interface section above. Most of the Implementation would be elided from the spec, except that we may include the template metaprogramming for computing a return type in the appendix.

    Thoughts on this?

  5. Dan Bonachea

    Thanks for the outstanding write up! Overall I like some of the improvements this interface provides. I'll let John comment on implementation issues.

    I have semantic questions regarding the set of allowable completion events for a given op:

    I thought we'd agreed (your first comment) not to expose source completion for rgets, which you refer to several times in this proposal. I'm not even sure what semantics that event would imply. Was this an oversight, or do you intend to provide this as trivially equivalent to operation completion?

    "Remote completion" of an rget is also a bit questionable and would also need to be clarified, ie exact memory ordering semantics wrt the reads and writes of the payload transfer. Your definition of both these completion events seems to be slanted specifically towards an rput operation.

    I think we should avoid exposing completion events for a given op that we don't think are semantically meaningful or would be inseparable from another event in any performant implementation of that particular operation. Ie we specify the subset of completion events available to each operation and reject attempts to ask for meaningless/prohibited ones (eg source completion for contiguous get, metadata completion for a contiguous value put, remote completion for a collective, etc). Can such restrictions be enforced at compile time in this model?

  6. Former user Account Deleted reporter

    This is great. Variadic completion type and permitting multiple actions per event simplify the implementation, and the extensibility support is excellent.

    We could consider changing completion_event from an enum to just empty undefined classes. So struct source_cx_event; struct remote_cx_event; // etc. This has the benefit of being even more extensible since anyone can add new classes, whereas enums have to be extended at their definition.
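
    A minimal sketch of what the tag-class variant might look like, assuming the action and factory templates are simply re-parameterized on a type instead of an enumerator. The metadata_cx_event and metadata_cx names are hypothetical, and completions is reduced to an empty template for brevity.

    ```cpp
    #include <type_traits>

    // Events as declared-but-undefined tag classes.
    struct source_cx_event;
    struct remote_cx_event;
    struct operation_cx_event;

    // Actions take the event as a type parameter.
    template<typename Event> struct future_cx {};

    // Reduced to an empty template; the tuple storage is omitted in this sketch.
    template<typename ...Cxs> struct completions {};

    template<typename Event>
    struct here_cx {
      static constexpr completions<future_cx<Event>> as_future() { return {}; }
    };

    struct source_cx : here_cx<source_cx_event> {};
    struct operation_cx : here_cx<operation_cx_event> {};

    // A new event can be added anywhere, without touching an existing enum.
    struct metadata_cx_event;
    struct metadata_cx : here_cx<metadata_cx_event> {};

    static_assert(std::is_same<decltype(metadata_cx::as_future()),
                               completions<future_cx<metadata_cx_event>>>::value,
                  "the new event flows through the existing templates");
    ```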

    At first I didn't like the source_cx::as_future use of :: instead of just flat underscores. But the brevity of the extensibility section sells it nicely.

  7. Amir Kamil

    Thanks for your feedback!

    I agree with Dan that we should only expose completion events that make sense. My use of rget was only for illustration, and I picked it since value-based rget presents the most complicated typing issues. I think rget should only support remote and operation, and lpc/rpc should only support operation.

    We may not be able to define what completion events mean at a global level; we may have to do so on a per-operation basis. In particular, I don't think collectives really fit into my vague definitions of events, since there isn't a single caller and a single remote.

    Specifying rpc with generalized completion is tricky because it is variadic. We may need to provide two overloads, one that's the default future-based and the other that takes in non-defaulted completions.

    If we define a nil completion, e.g.

    using nil_cx = completions<>;
    

    we could fold lpc_ff/rpc_ff into lpc/rpc:

    rpc(rank, nil_cx{}, func, args...);
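
    One way the two overloads might coexist is sketched below. This is only an illustration under stated assumptions: rpc_sketch, future_stub, is_completions and add_one are hypothetical names, the functions run the callable locally instead of shipping it to a rank, and the default overload is disabled via enable_if whenever its first argument is a completions object.

```cpp
#include <type_traits>
#include <utility>

template<typename ...Cxs>
struct completions {};
using nil_cx = completions<>;

// Trait: is T some instantiation of completions<...>?
template<typename T>
struct is_completions : std::false_type {};
template<typename ...Cxs>
struct is_completions<completions<Cxs...>> : std::true_type {};

struct future_stub {};  // hypothetical stand-in for a future type

// Default, future-based overload. Removed from overload resolution when
// the first argument after the rank is a completions object.
template<typename Func, typename ...Args,
         typename = typename std::enable_if<
           !is_completions<typename std::decay<Func>::type>::value>::type>
future_stub rpc_sketch(int /*rank*/, Func &&func, Args &&...args) {
  std::forward<Func>(func)(std::forward<Args>(args)...);  // local stand-in
  return future_stub{};
}

// Overload taking explicit completions; with nil_cx nothing is returned,
// which is the lpc_ff/rpc_ff behaviour.
template<typename Func, typename ...Args>
void rpc_sketch(int /*rank*/, nil_cx, Func &&func, Args &&...args) {
  std::forward<Func>(func)(std::forward<Args>(args)...);  // local stand-in
}

inline int add_one(int x) { return x + 1; }  // hypothetical callable
```

    With these definitions, rpc_sketch(0, add_one, 1) selects the future-returning overload and rpc_sketch(0, nil_cx{}, add_one, 1) selects the void one.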
    

    Here's code that checks that the supplied completions only specify events that are supported:

    template<typename Cx, completion_event ...Events>
    struct check_event : std::false_type {};
    
    template<typename Cx, completion_event E1, completion_event ...ERest>
    struct check_event<Cx, E1, ERest...> : check_event<Cx, ERest...> {};
    
    template<template <completion_event, typename ...> class CxType, // template template!
             typename ...CxArgs, completion_event Event,
             completion_event ...ERest>
    struct check_event<CxType<Event, CxArgs...>, Event, ERest...>
      : std::true_type {};
    
    template<completion_event ...Events>
    struct check_events_impl {
      template<typename ...Cxs>
      struct check : std::true_type {};
    
      template<typename Cx1, typename ...Cxs>
      struct check<Cx1, Cxs...> {
        static constexpr bool value =
          check_event<Cx1, Events...>::value &&
          check<Cxs...>::value;
      };
    
      template<typename>
      struct scan_completions;
    
      template<typename ...Cxs>
      struct scan_completions<completions<Cxs...>> {
        static constexpr bool value = check<Cxs...>::value;
      };
    };
    
    template<typename Completions, completion_event ...Events>
    constexpr bool check_events() {
      return check_events_impl<Events...>::
        template scan_completions<Completions>::value;
    }
    

    A check could then be as follows:

    static_assert(check_events<Completions,
                               completion_event::remote,
                               completion_event::operation>(),
                  "unsupported completion event requested");
    

    I have no objection to changing completion events from an enum to classes. The actual definitions are not exposed in the interface. It would just require changing completion_event to typename in the template parameters and replacing the completion_event::* with *_cx_event. I've successfully tested a version that does this.
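
    For illustration, the class-based variant might look roughly like the following. This is a condensed sketch rather than the tested version mentioned above: future_cx and completions are hypothetical stand-ins, and the nested check_events_impl is flattened into a single check_events trait.

```cpp
#include <type_traits>

// Event tags as classes instead of enumerators.
struct source_cx_event;
struct remote_cx_event;
struct operation_cx_event;

// Hypothetical stand-ins for the completion types.
template<typename Event, typename ...T>
struct future_cx {};
template<typename ...Cxs>
struct completions {};

// check_event: true iff the event of Cx appears in Events...
template<typename Cx, typename ...Events>
struct check_event : std::false_type {};

// No match on the first event: keep scanning the rest.
template<typename Cx, typename E1, typename ...ERest>
struct check_event<Cx, E1, ERest...> : check_event<Cx, ERest...> {};

// The completion's own event is the first one in the list: match.
template<template<typename, typename...> class CxType,
         typename ...CxArgs, typename Event, typename ...ERest>
struct check_event<CxType<Event, CxArgs...>, Event, ERest...>
  : std::true_type {};

// check_events: true iff every completion in the list passes check_event.
template<typename Completions, typename ...Events>
struct check_events;

template<typename ...Events>
struct check_events<completions<>, Events...> : std::true_type {};

template<typename Cx1, typename ...Cxs, typename ...Events>
struct check_events<completions<Cx1, Cxs...>, Events...>
  : std::integral_constant<bool,
      check_event<Cx1, Events...>::value &&
      check_events<completions<Cxs...>, Events...>::value> {};
```

    An operation that supports only remote and operation completion would then assert check_events<Completions, remote_cx_event, operation_cx_event>::value.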

  8. Amir Kamil

    I've pushed the code here to the upcxx repo under the new_completion branch. I integrated argument binding for rpc. I did not (and don't plan to) integrate it with rput/rget, as I don't think I understand the guts well enough to do so.

  9. Dan Bonachea

    Some additional comments after further thought:

    1. We should ideally provide static checking to assert that the completion for any RMA initiation includes either remote completion or operation completion (or both). In particular, it's always an error to issue an RMA with only source or nil completion, because that's equivalent to permanently discarding the target memory (and additionally makes it impossible to safely finalize the library).
    2. Similarly, completions that return a future would ideally get compiler annotations to warn if the return value is discarded (ie GNU __attribute__((__warn_unused_result__))), because asking for a future and then discarding it is usually (always?) a mistake, and one that we should warn about when possible.
    3. Finally, I think we probably want a concise syntax to say "stall this put/rpc injection for source completion" (ie don't return from the initiation until the source memory is serialized or copied). This is partly for conciseness of expression, but more importantly it serves as a performance hint - eg if the caller intends to stall for source completion without overlapping anything else, they might as well tell us up front so we can heuristically prioritize source completion (eg serializing or copying the source to a bounce buffer ASAP) and possibly reduce the length of that stall.
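
    Points 1 and 2 could be sketched along these lines. Everything here is an assumption made for illustration: future_cx, has_event, rma_is_synchronizable, future_stub, rput_sketch and the SKETCH_WARN_UNUSED macro are hypothetical names, and the function body does no communication.

```cpp
#include <type_traits>

struct source_cx_event;
struct remote_cx_event;
struct operation_cx_event;

// Hypothetical completion type exposing its event.
template<typename Event>
struct future_cx { using event_t = Event; };
template<typename ...Cxs>
struct completions {};

// has_event: true iff any completion in the list signals event E.
template<typename E, typename Completions>
struct has_event;
template<typename E>
struct has_event<E, completions<>> : std::false_type {};
template<typename E, typename Cx1, typename ...Cxs>
struct has_event<E, completions<Cx1, Cxs...>>
  : std::integral_constant<bool,
      std::is_same<E, typename Cx1::event_t>::value ||
      has_event<E, completions<Cxs...>>::value> {};

// Point 1: an RMA must request remote or operation completion (or both).
template<typename Completions>
struct rma_is_synchronizable
  : std::integral_constant<bool,
      has_event<remote_cx_event, Completions>::value ||
      has_event<operation_cx_event, Completions>::value> {};

// Point 2: warn when a returned future is discarded (GNU-style attribute).
#if defined(__GNUC__)
#  define SKETCH_WARN_UNUSED __attribute__((__warn_unused_result__))
#else
#  define SKETCH_WARN_UNUSED
#endif

struct future_stub {};  // hypothetical stand-in for a future type

template<typename Completions>
SKETCH_WARN_UNUSED future_stub rput_sketch(Completions) {
  static_assert(rma_is_synchronizable<Completions>::value,
                "RMA needs remote or operation completion");
  return future_stub{};
}
```

    Note that this is an "at least one of" check, complementary to the "only these events" check shown earlier in the thread.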

    Requesting synchronous source completion could potentially be presented as a fourth completion mechanism, eg source_cx::as_blocking(). The same mechanism could possibly also be used to provide the blocking RMA requested in issue #28 (eg operation_cx::as_blocking()). There might even be a motivation to provide an additional blocking mechanism ::as_waiting(), where the former blocks while advancing only internal progress (potentially dangerous if not used with care), and the latter blocks while advancing user progress (and might run callbacks, but should never create parallel deadlock).
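
    As a rough sketch of how such factories might sit next to as_future, assuming a hypothetical blocking_cx type tagged with the progress level it advances while stalled (cx_factory, blocking_cx and the progress_level enum below are illustrative only):

```cpp
#include <type_traits>

struct source_cx_event;
struct operation_cx_event;

// Which kind of progress a blocking completion advances while it stalls.
enum class progress_level { internal, user };

// Hypothetical completion type: block at injection until Event has occurred.
template<typename Event, progress_level Level>
struct blocking_cx {
  using event_t = Event;
  static constexpr progress_level level = Level;
};

template<typename ...Cxs>
struct completions {};

// Hypothetical per-event factory, mirroring the source_cx::as_future style.
template<typename Event>
struct cx_factory {
  // Blocks while advancing only internal progress.
  static constexpr completions<blocking_cx<Event, progress_level::internal>>
  as_blocking() { return {}; }

  // Blocks while advancing user progress (may run callbacks).
  static constexpr completions<blocking_cx<Event, progress_level::user>>
  as_waiting() { return {}; }
};

using source_cx = cx_factory<source_cx_event>;
using operation_cx = cx_factory<operation_cx_event>;
```

    Since the blocking mode is encoded in the type, an implementation could choose its source-completion strategy at compile time.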
