Clarify requirements on master progress (e.g. for OpenMP)

Issue #101 resolved
Dan Bonachea created an issue

The spec is currently ambiguous about when (if ever) it is safe for the master persona to sleep indefinitely.

E.g., the rpc_ff spec:

So long as the sending persona continues to make internal-level progress it is guaranteed that the message will eventually arrive at the recipient. See §9.5.3 progress_required for an understanding of how much internal-progress is necessary.

But the progress_required spec somewhat contradicts this:

A return value of false means that none of the non-master personas indicated by ps requires further internal-progress, but the master persona may or may not require further internal-progress

Master progress is unambiguously required to process incoming RPCs.

However, we need to explicitly clarify whether master progress is required to ensure progress of outgoing operations initiated by non-master personas, or of acknowledgements to those operations. Our current implementation (after a recent fix) does not require master progress for these cases, but we may want to leave the door open to implementation strategies that would require it.

At the 9/27 meeting we resolved to defer this clarification to the next release.

Comments (6)

  1. Dan Bonachea reporter

    This issue was discussed in the 1/10/18 meeting, and we resolved to implement a decision by the September release.

    John advocates making master-is-attentive a requirement, as doing so would allow an implementation to funnel work to master (for instance). He argues the current state doesn’t “buy” the user anything; however, I'm not sure that's true, and I'm especially worried about the interoperability implications of adding this requirement.

    In particular, consider an OpenMP code that enters a parallel region where each thread issues an rpc(peer,...).wait() and then hits the thread-sleep barrier at the end of the parallel region (for simplicity, assume the peer is running different code that is known to be attentive to master progress). The code on this node could run afoul of a restriction requiring master progress for outgoing RPCs, because the OpenMP thread that happens to hold the master persona could finish its job and hit the sleep before the other threads have injected their operations and/or received their acknowledgements, causing deadlock. The stronger requirement would force this code to set aside an independent thread to hold and progress the master outside the parallel region. This is similar to the problem we actually encountered in the OpenMP example in guide issue #1 that started this discussion, with the proposed fix described in guide pull request #5.
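
    To make the hazard concrete, here is a toy model of the scenario above. It is only a sketch using plain std::thread: ToyRuntime, master_progress and simulate are hypothetical names, not UPC++ API, and the model simply assumes the stronger rule (an outgoing operation is acknowledged only when some thread polls master). A timeout stands in for what would be an indefinite hang in the real code.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Toy model only: none of these names are UPC++ API.
struct ToyRuntime {
    std::atomic<int> injected{0};  // operations issued by worker threads
    std::atomic<int> acked{0};     // operations acknowledged so far

    // Stand-in for master-level progress under the assumed stronger rule:
    // acknowledge every operation injected so far.
    void master_progress() { acked.store(injected.load()); }
};

// Each worker issues one operation and waits (up to `timeout`) for its
// acknowledgement. Returns how many workers saw their operation complete.
int simulate(int workers, bool dedicated_master,
             std::chrono::milliseconds timeout) {
    ToyRuntime rt;
    std::atomic<int> completed{0};
    std::atomic<bool> stop{false};

    // With a dedicated master, one extra thread keeps polling for the whole
    // region. Without it, we model the master holder as already asleep.
    std::thread master;
    if (dedicated_master) {
        master = std::thread([&] {
            while (!stop.load()) {
                rt.master_progress();
                std::this_thread::yield();
            }
        });
    }

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            // Inject one operation and remember its position in the queue.
            const int ticket = rt.injected.fetch_add(1) + 1;
            const auto deadline = std::chrono::steady_clock::now() + timeout;
            while (std::chrono::steady_clock::now() < deadline) {
                if (rt.acked.load() >= ticket) {
                    completed.fetch_add(1);
                    return;
                }
                std::this_thread::yield();
            }
            // Timed out: the real code would block here forever.
        });
    }

    for (auto& t : pool) t.join();
    stop.store(true);
    if (master.joinable()) master.join();
    return completed.load();
}
```

    In this model, simulate(4, true, ...) lets all four workers complete, while simulate(4, false, ...) completes none of them, which mirrors why the stronger requirement would push such codes toward a separate thread that holds and progresses master.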

  2. Dan Bonachea reporter
    • removed responsible
    • marked as critical

    This issue was triaged at the 2018-06-13 Pagoda meeting and assigned a new milestone/priority.

    Specifying this was given critical priority because it heavily impacts correctness and interoperability with OpenMP.
