Outgoing progress semantic guarantees too weak for practical use

Issue #79 resolved
Dan Bonachea created an issue

wait() should probably be defined to include a flush(), or something similar.

Otherwise wait() on a future created by message injection could deadlock in the presence of outgoing message aggregation that is not advanced by internal progress.

In particular, the "C++14-compliant definition of wait" given in section 5.4 is probably insufficient.

Requiring wait to include a full flush is sufficient to solve this problem, but not necessary and probably overkill (eg especially if the future argument is already ready). We just need to ensure that wait will eventually drain any internal queues on this rank that may be blocking fulfillment of its arguments.

Comments (7)

  1. Dan Bonachea reporter

    Thinking more about this, I suspect it may be a more general problem with any blocking call, if we permit an implementation that allows outgoing messages to be aggregated and stalled locally over an unbounded number of internal-progress calls.

    Consider:

    int done = 0; // global variable flag
    ...
    if (rank_me() == 0) { 
      rpc_ff(1, // send an rpc_ff to rank 1
              []() { done = 1; /* signal RPC arrival */ });
    } else { // assume 2 ranks, ie rank == 1
      do { progress(); } while (!done);  // wait for RPC arrival
    }
    barrier();
    

    If this rpc_ff is permitted to stall inside outgoing buffers on rank 0 until the next flush operation, I believe this code could technically deadlock, because barrier does not currently guarantee a flush. This is despite the fact this program is fully "attentive" to UPC++.

    This particular case could be solved by adding an implicit flush or eventual flush-like behavior in barrier(). However the same problem arises if the barrier above is replaced by team::split(), or any other blocking collective. Similarly if it's replaced by MPI_Barrier(). Similarly if it's replaced by an acknowledgement rpc_ff with a progress spin-loop, eg:

    int done = 0; // global variable flag
    ...
    if (rank_me() == 0) { 
      rpc_ff(1, // send an rpc_ff to rank 1
              []() { 
                 done = 1; /* signal RPC arrival to 1 */
                 rpc_ff(0, // send explicit acknowledgement to rank 0
                           []() { done = 1; }) /* signal acknowledgement arrival to 0 */
              });
      do { progress(); } while (!done);  // wait for acknowledgement
    } else { // assume 2 ranks, ie rank == 1
      do { progress(); } while (!done);  // wait for RPC arrival
    }
    

    I think the upshot of allowing unbounded outgoing message aggregation is that in general users may need a flush() call before entering any blocking call or any spin-wait loop to avoid deadlock. Do we really want users sprinkling flush() calls all over their application, even in code that is "attentive" to UPC++? If so, we definitely need to clarify that requirement, probably updating this paragraph in 9.3 (emphasis added):

    In addition to operations requiring internal progress, some operations won’t initiate until additional operations are announced. A good example is an implementation that performs an outgoing message-bundling optimization: a single bundled super-message might be produced and sent only when enough messages to the same destination have been observed (so that their combined sizes exceed a threshold for instance). In such a case, no amount of internal progress will trigger the threshold, so the flush operation is required to tell the UPC++ runtime that any delayed operations should be initiated. As a convenience, the flush function also induces a discharge, so a well-behaved UPC++ application is encouraged to call flush before any long lapse of attentiveness to progress.

    An alternate solution to all these problems would be to strengthen the semantic of internal progress to guarantee that calls to internal progress will eventually attempt to inject any outgoing operations (contrary to what is stated above). The definition of "eventually" could be left as an implementation tuning parameter, provided it is finite in both time and call count.

  2. Former user Account Deleted

    It was never the intention that the implementation could implicitly decide to stall things indefinitely until flushed. Indefinitely aggregating operations (bulk_rpc) were meant to be explicit, but we never spec'd any. Hopefully all comm-ops in the spec state that the operation is guaranteed to reach completion eventually with enough internal progress. Since operations requiring flush don't exist, it might be best to drop this function completely to not create confusion.

  3. Dan Bonachea reporter

    Since operations requiring flush don't exist, it might be best to drop this function completely to not create confusion.

    I'm also in favor of deleting the flush function and the two paragraphs quoted in comment 1 above from the spec. They just raise too many questions and concerns.

    If and when we add interfaces to perform explicitly aggregated communication, we can add the corresponding flush calls along with them.

  4. Log in to comment