Outgoing progress semantic guarantees too weak for practical use

Dan Bonachea reporter

Comments (7)

Dan Bonachea reporter
Thinking more about this, I suspect it may be a more general problem with any blocking call, if we permit an implementation that allows outgoing messages to be aggregated and stalled locally over an unbounded number of internal-progress calls.

Consider:
```
int done = 0; // global variable flag
...
if (rank_me() == 0) { 
  rpc_ff(1, // send an rpc_ff to rank 1
          []() { done = 1; /* signal RPC arrival */ });
} else { // assume 2 ranks, i.e. rank == 1
  do { progress(); } while (!done);  // wait for RPC arrival
}
barrier();
```
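To make the failure mode concrete without a UPC++ installation, here is a minimal single-process sketch. It assumes a toy two-rank network in which every name (`ToyNet`, `run_example`, `inject_on_progress`, `flush_before_wait`) is hypothetical and not part of UPC++: `rpc_ff` only queues into a local outgoing buffer, `progress` executes delivered RPCs and injects buffered ones only if the policy flag allows it, and `flush` injects unconditionally. The loop interleaves rank 0 sitting in an attentive but non-flushing barrier with rank 1's spin-wait, bounded by `max_steps` so that the stall shows up as a `false` result rather than a hang.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Toy two-rank network in a single process (hypothetical; not the UPC++ API).
// rpc_ff only buffers locally; a message reaches its target once it is injected.
struct ToyNet {
  explicit ToyNet(bool inject) : inject_on_progress(inject) {}

  bool inject_on_progress;  // false models "stall in outgoing buffers until flush"
  std::deque<std::pair<int, std::function<void()>>> outgoing[2];  // (target, body)
  std::deque<std::function<void()>> incoming[2];                  // delivered RPCs

  void rpc_ff(int from, int to, std::function<void()> fn) {
    outgoing[from].emplace_back(to, std::move(fn));
  }

  // Inject everything buffered on `rank`; delivery is instantaneous in this model.
  void flush(int rank) {
    while (!outgoing[rank].empty()) {
      auto msg = std::move(outgoing[rank].front());
      outgoing[rank].pop_front();
      incoming[msg.first].push_back(std::move(msg.second));
    }
  }

  // Internal progress: run delivered RPCs; inject buffered ones only if allowed.
  void progress(int rank) {
    if (inject_on_progress) flush(rank);
    while (!incoming[rank].empty()) {
      auto fn = std::move(incoming[rank].front());
      incoming[rank].pop_front();
      fn();
    }
  }
};

// Rank 0 sends the RPC, then waits in a "barrier" that only calls progress();
// rank 1 spins on `done`. Returns true if rank 1 observes the RPC within max_steps.
bool run_example(bool inject_on_progress, bool flush_before_wait, int max_steps = 1000) {
  ToyNet net(inject_on_progress);
  bool done = false;  // rank 1's flag
  net.rpc_ff(0, 1, [&done] { done = true; });
  if (flush_before_wait) net.flush(0);  // the explicit flush() workaround
  for (int step = 0; step < max_steps && !done; ++step) {
    net.progress(0);  // rank 0 inside barrier(): attentive, but never flushes
    net.progress(1);  // rank 1: do { progress(); } while (!done);
  }
  return done;
}
```

Under the stall-until-flush policy, `run_example(false, false)` returns `false` no matter how large `max_steps` is; either an explicit flush before waiting (`run_example(false, true)`) or a progress engine that injects buffered messages (`run_example(true, false)`) makes it return `true`.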
If this rpc_ff is permitted to stall inside outgoing buffers on rank 0 until the next flush operation, I believe this code could technically deadlock, because barrier() does not currently guarantee a flush. This is despite the fact that this program is fully "attentive" to UPC++.

This particular case could be solved by adding an implicit flush, or eventual flush-like behavior, to barrier(). However, the same problem arises if the barrier above is replaced by team::split() or any other blocking collective, by MPI_Barrier(), or by an acknowledgement rpc_ff with a progress spin-loop, e.g.:
```
int done = 0; // global variable flag
...
if (rank_me() == 0) { 
  rpc_ff(1, // send an rpc_ff to rank 1
          []() { 
             done = 1; /* signal RPC arrival to 1 */
             rpc_ff(0, // send explicit acknowledgement to rank 0
                       []() { done = 1; }); /* signal acknowledgement arrival to 0 */
          });
  do { progress(); } while (!done);  // wait for acknowledgement
} else { // assume 2 ranks, i.e. rank == 1
  do { progress(); } while (!done);  // wait for RPC arrival
}
```
I think the upshot of allowing unbounded outgoing message aggregation is that, in general, users may need to call flush() before entering any blocking call or spin-wait loop to avoid deadlock. Do we really want users sprinkling flush() calls all over their application, even in code that is "attentive" to UPC++? If so, we definitely need to clarify that requirement, probably by updating this paragraph in section 9.3 (emphasis added):

> In addition to operations requiring internal progress, some operations won’t initiate until additional operations are announced. A good example is an implementation that performs an outgoing message-bundling optimization: a single bundled super-message might be produced and sent only when enough messages to the same destination have been observed (so that their combined sizes exceed a threshold for instance). In such a case, no amount of internal progress will trigger the threshold, so the flush operation is required to tell the UPC++ runtime that any delayed operations should be initiated. As a convenience, the flush function also induces a discharge, so a well-behaved UPC++ application is encouraged to call flush before any long lapse of attentiveness to progress.
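The bundling scheme in the quoted paragraph can be modeled in a few lines. This is a hypothetical sketch, not UPC++ internals: `Bundler`, `send`, `threshold`, and the other names are invented, payload sizes stand in for message sizes, and "sending" a bundle only records it. It illustrates why no number of `progress()` calls helps a destination whose pending bytes never cross the threshold; only `flush()` releases them.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of outgoing message bundling (not UPC++ internals).
// Messages to one destination accumulate until their combined size exceeds
// `threshold` bytes; only then is a single "super-message" emitted.
class Bundler {
 public:
  explicit Bundler(std::size_t threshold) : threshold_(threshold) {}

  // Queue one message; returns true if this call pushed the bundle over the threshold.
  bool send(int dest, const std::string& payload) {
    Pending& p = pending_[dest];
    p.bytes += payload.size();
    p.msgs.push_back(payload);
    if (p.bytes > threshold_) {
      emit(dest, p);
      return true;
    }
    return false;
  }

  // Internal progress: under this policy it never releases a below-threshold bundle.
  void progress() {}

  // Emit every non-empty pending bundle regardless of its size.
  void flush() {
    for (auto& kv : pending_)
      if (!kv.second.msgs.empty()) emit(kv.first, kv.second);
  }

  std::size_t bundles_sent() const { return sent_.size(); }

  std::size_t pending_messages() const {
    std::size_t n = 0;
    for (const auto& kv : pending_) n += kv.second.msgs.size();
    return n;
  }

 private:
  struct Pending {
    std::size_t bytes = 0;
    std::vector<std::string> msgs;
  };

  // Record one super-message as (destination, message count) and reset the bundle.
  void emit(int dest, Pending& p) {
    sent_.emplace_back(dest, p.msgs.size());
    p.bytes = 0;
    p.msgs.clear();
  }

  std::size_t threshold_;
  std::map<int, Pending> pending_;
  std::vector<std::pair<int, std::size_t>> sent_;
};
```

With a 100-byte threshold, two 40-byte messages to rank 1 stay pending across any number of `progress()` calls until `flush()` emits them as one bundle, while a single 150-byte message to rank 2 is sent immediately.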

An alternative solution to all these problems would be to strengthen the semantics of internal progress to guarantee that calls to internal progress will eventually attempt to inject any outgoing operations (contrary to what is stated above). The definition of "eventually" could be left as an implementation tuning parameter, provided it is finite in both time and call count.
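The proposed "eventual injection" guarantee, bounded in both time and call count, can be sketched as a policy object. Everything here is hypothetical (`EventualInjector`, `max_calls`, and `max_age` are illustrative names, and clearing the buffer stands in for handing a bundle to the network). The point is only that aggregation may continue for a while, but a finite number of progress calls or a finite delay always forces injection, so no flush is needed for correctness.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical policy sketch (not UPC++): internal progress may keep aggregating,
// but must inject buffered messages once the oldest one has waited `max_calls`
// progress calls or `max_age` time, whichever comes first.
class EventualInjector {
 public:
  using Clock = std::chrono::steady_clock;

  EventualInjector(unsigned max_calls, Clock::duration max_age)
      : max_calls_(max_calls), max_age_(max_age) {}

  void send(const std::string& msg) {
    if (buffer_.empty()) {  // start the clocks for a new batch
      calls_since_oldest_ = 0;
      oldest_ = Clock::now();
    }
    buffer_.push_back(msg);
  }

  // One internal-progress call. Returns the number of messages it injected.
  std::size_t progress() {
    if (buffer_.empty()) return 0;
    ++calls_since_oldest_;
    bool too_many_calls = calls_since_oldest_ >= max_calls_;
    bool too_old = Clock::now() - oldest_ >= max_age_;
    if (!too_many_calls && !too_old) return 0;  // keep aggregating for now
    std::size_t n = buffer_.size();
    injected_ += n;
    buffer_.clear();  // stands in for handing the bundle to the network
    return n;
  }

  std::size_t injected() const { return injected_; }
  std::size_t buffered() const { return buffer_.size(); }

 private:
  unsigned max_calls_;
  Clock::duration max_age_;
  unsigned calls_since_oldest_ = 0;
  Clock::time_point oldest_{};
  std::deque<std::string> buffer_;
  std::size_t injected_ = 0;
};
```

With `max_calls == 3` and a large `max_age`, two buffered messages survive two progress calls and are injected on the third; with a zero `max_age`, the first progress call injects. Under such a policy, the progress calls made inside rank 0's barrier in the first example would eventually inject its RPC on their own.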
- 2017-07-16T02:24:01+00:00
Dan Bonachea reporter
- changed title to Outgoing progress semantic guarantees too weak for practical use
- marked as major
- 2017-08-25T23:57:11+00:00
Dan Bonachea reporter
- changed component to Progress
- 2017-09-14T04:46:57+00:00
Former user Account Deleted
It was never the intention that the implementation could implicitly decide to stall things indefinitely until flushed. Indefinitely aggregating operations (bulk_rpc) were meant to be explicit, but we never spec'd any. Hopefully all comm-ops in the spec state that the operation is guaranteed to reach completion eventually with enough internal progress. Since operations requiring flush don't exist, it might be best to drop this function completely to not create confusion.
- 2017-09-15T18:44:18+00:00
Dan Bonachea reporter
> Since operations requiring flush don't exist, it might be best to drop this function completely to not create confusion.

I'm also in favor of deleting the flush function and the two paragraphs quoted in comment 1 above from the spec. They just raise too many questions and concerns.

If and when we add interfaces to perform explicitly aggregated communication, we can add the corresponding flush calls along with them.
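One way an explicit interface could scope flush to only the operations that need it is sketched here. `Aggregator`, `Wire`, and `rpc` are hypothetical names, not a proposed UPC++ API, and the `Wire` map merely records delivered RPC bodies per destination. Making the handle non-copyable and flushing in its destructor means buffered operations cannot be stranded when it goes out of scope, while ordinary communication elsewhere never needs a flush.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the network: RPC bodies handed over per destination.
using Wire = std::map<int, std::vector<std::function<void()>>>;

// Hypothetical explicit aggregation handle (not a UPC++ API). Only RPCs issued
// through it are held back, so only they ever need a flush.
class Aggregator {
 public:
  explicit Aggregator(Wire& wire) : wire_(wire) {}
  Aggregator(const Aggregator&) = delete;
  Aggregator& operator=(const Aggregator&) = delete;
  ~Aggregator() { flush(); }  // buffered RPCs are never stranded at end of scope

  // Buffer an RPC for `dest`; nothing reaches the wire yet.
  void rpc(int dest, std::function<void()> fn) {
    held_[dest].push_back(std::move(fn));
  }

  // Hand every held RPC to the wire. This flush covers only this object's RPCs.
  void flush() {
    for (auto& kv : held_)
      for (auto& fn : kv.second) wire_[kv.first].push_back(std::move(fn));
    held_.clear();
  }

 private:
  Wire& wire_;
  std::map<int, std::vector<std::function<void()>>> held_;
};
```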
- 2017-09-15T18:58:19+00:00
Dan Bonachea reporter
- changed milestone to 2017.09.30 release
- assigned issue to Dan Bonachea
- 2017-09-15T19:08:03+00:00
Dan Bonachea reporter
- changed status to resolved
fix issue #79: Remove flush

The upcxx::flush() operation is removed, along with related spec prose.

→ <<cset 99eca3b85698>>
- 2017-09-17T08:30:27+00:00