Remove use of r-value reference for value argument to value collectives

Issue #155 resolved
Max Grossman created an issue

In the current revision of the specification, upcxx::reduce_all returns RType. This type does not seem to be defined anywhere for any of the variants of upcxx::reduce_all, but should be upcxx::future<T> (I believe?).

Additionally, the current behavior of upcxx::reduce_all is to return the final result of the reduction in that returned future but this behavior is not specified in the specification text. This has led to user confusion multiple times because the value to reduce is passed in by reference, which can seem to imply that the results are written back to that location. Hence, users wait on but do not retrieve the reduced value from the returned future and then accidentally use the local input as the reduced result.

Comments (7)

  1. Dan Bonachea

    RType is used throughout the spec and is defined in section 7.2.2 (as referenced by the index for "RType"). Part of what that section explains is that for the default future-based completion, the future will contain whatever values are "produced" by the operation completion (as specified in each operation).

    reduce_all has both value and bulk variants, with different output behavior:

    1. In the bulk variants, the semantics (para 31-32) explain output is in the explicit destination buffer and the operation completion section says these variants produce no completion value, hence the default RType return for bulk reduce is an empty future<>.
    2. For the value variants, the input is via r-value reference (&&) which is the C++ way of saying "this is an in-only parameter" (not to be confused with & which means "in+out"). An C++ programmer should already be expecting no output there (as the caller has relinquished ownership of the input value, so it wouldn't make sense to look at it later for an output). The operation completion section define these variants to produce a value of type T, hence the default RType return for value reduce is future<T>.

    It's notable that the fact this rvalue-reference input argument type involves a template argument might hide some useful compile errors in this regard, because passing, say an int l-value as the input argument and relying on template inference , eg:

    int x = 42;
    upcxx::future<int> f = upcxx::reduce_all(x, upcxx::op_fast_add);
    

    will actually infer the template argument as T=int& and pass an r-value reference to that (ie int&&&). I think it's likely alot of current real uses of value reduce (and value broadcast) are actually falling into this confusing corner case without realizing it.

    @Amir Kamil and @john bachan: why did we use an r-value reference T&& for the value argument to reduce/broadcast instead of just simple pass-by-value T (as we currently use for value rput)? I realize copy-in costs are a concern, but it seems like the same concern applies equally to both value RMA and value collectives and we should probably be consistent. I suspect the historical reason is the additional global_ptr<T> argument to rput constrains the type inference to exclude T=int&, making T&& an unwieldy value input type in practice for RMA. However relying on this surprising int&&& inference behavior for common uses of value collectives seems like a poor design choice that also might encourage confusion in less experienced C++ programmers.

    @Max Grossman Aside from adding better examples in the programming guide, do you have a concrete suggestion for improving the current spec text for reduce?

  2. john bachan

    The biggest reason I can imagine for my former self choosing this universal T&& type is to most efficiently support the various cases of copying needed for deferred serialization of non-trivial types. There are four cases coming from two binary dimensions: dim1 = user gives us const-lvalue or rvalue, dim2 = the runtime decides to serialize now or deferred. When the runtime serializes now, both cases of dim1 are handled the same, just serialize the referred-to value into a buffer and return. But when serialization is deferred, then a copy must be made and stashed internally. A const lvalue means the user wants us to make the copy so they can go on using their value after the injection. An rvalue means the user has decided they no longer need their value and we can just steal it eliding any need to copy. Universal references give the user the flexibility to pass in the reference that is most convenient for them, while allowing the runtime to perform no copies in either case when pressure is low.

    Since non-trivial types were scrubbed from the spec'd collectives the universal reference makes a lot less sense since it breaks with convention from pre-11 C++. I would argue to switch to naked T values since those don't have any risk of lifetime issues that references do, they get copied into the returned future<T> anyway later, and the vector forms exist specifically to elide copying overhead.

  3. Dan Bonachea

    I concur with John - the value argument type to the value collectives should be changed to a simple by-value T.

    I believe we've already resolved that any future introduction of collectives for non-trivially serializable T would use a new function name (in order to make the lack of offload capability syntactically explicit), so we shouldn't constrain the signatures of the current trivial value collectives to accommodate that potential future extension.

    I believe this simply means changing the signature of the following three functions to instead take T value:

    reduce_one(T &&value,...);
    reduce_all(T &&value,...);
    broadcast(T &&value,...);
    
  4. Log in to comment