Clarify that lpc/rpc never execute synchronously inside injection

Issue #105 resolved
Dan Bonachea created an issue

A question arose in this week's meeting about whether "loopback" lpc/rpc operations are permitted to execute synchronously inside injection operations when that injection is performed within the dynamic scope of user-level progress (i.e., what the spec labels "restricted context").

The current spec contains language for upcxx::rpc:

[the rpc] is enlisted for execution during user-level progress of the master persona.

and upcxx::persona::lpc:

std::move’s func into an unordered collection of type-erased function objects to be executed during user-level progress of the targeted (this) persona.

The specified behavior is clear when the injection call is made outside the scope of any UPC++ call. However, it is currently ambiguous what may happen if the injection is performed in a callback that is already inside the dynamic scope of a call to upcxx::progress(). The most important case is "loopback" lpc/rpcs (those targeting the current persona/rank).
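The semantics this issue argues for can be illustrated with a minimal, single-threaded toy model. It is a sketch under stated assumptions, not the UPC++ runtime. The hypothetical class `ToyPersona` stands in for a persona. Its `lpc()` only appends to a FIFO queue, mirroring "progress level: none", and its `progress()` runs queued callbacks one at a time. The hypothetical helper `inject_with_then()` models an injected operation plus its `.then()` continuation as two separately deferred callbacks.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical toy persona (NOT the UPC++ implementation). Injection only
// enqueues; progress() runs queued callbacks in FIFO order.
class ToyPersona {
 public:
  // Like "UPC++ progress level: none": never runs a callback itself,
  // even when called from a callback that is already inside progress().
  void lpc(std::function<void()> fn) { queue_.push_back(std::move(fn)); }

  // Like user-level progress: runs callbacks until the queue is empty,
  // including callbacks injected by the callbacks it runs.
  void progress() {
    while (!queue_.empty()) {
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop_front();
      fn();  // anything fn injects is appended to the queue, not run here
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// Hypothetical helper: models "op.then(cont)" as the op's body followed by a
// separately deferred continuation, each adding 1 to x.
void inject_with_then(ToyPersona& p, int& x) {
  p.lpc([&p, &x] {
    ++x;                   // body of the injected operation
    p.lpc([&x] { ++x; });  // its continuation, also deferred
  });
}

// Mirrors the structure of the test program in this issue: a loopback
// callback that runs inside progress and injects three more operations.
int run_toy_test() {
  ToyPersona p;
  int x = 0;
  p.lpc([&p, &x] {
    int tmp1 = x;
    for (int i = 0; i < 3; ++i) inject_with_then(p, x);
    assert(x == tmp1);     // nothing injected above has run yet
    x = tmp1 + 1;
    p.lpc([&x] { ++x; });  // stands in for the outer .then() continuation
  });
  assert(x == 0);  // injection outside progress did not run the callback
  p.progress();
  return x;
}
```

Calling `run_toy_test()` returns 8. The outer callback sets `x` to 1, its continuation adds 1, and the three injected operations and their continuations add 6. All of them are executed by the single `progress()` call, but none of them runs inside an `lpc()` call.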

Consider the following program, which injects loopback rpcs and lpcs within the scope of a progress call:

#include <upcxx/upcxx.hpp>
#include <cassert>
#include <iostream>

int x = 0;

int main() {
  upcxx::init();

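  // Outer loopback rpc: its body runs later, inside user-level progress
  // (restricted context), and injects three more loopback operations.
  upcxx::rpc(upcxx::rank_me(),[]() {
    int tmp1 = x;
    upcxx::rpc(upcxx::rank_me(),[]() { x++; }).then([]() { x++; });
    upcxx::default_persona().lpc([]() { x++; }).then([]() { x++; });
    upcxx::master_persona().lpc([]() { x++; }).then([]() { x++; });
    assert(x == tmp1); // none of the callbacks injected above has run yet
    x = tmp1 + 1;
  }).then([]() { x++; });

  assert(x == 0); // the outer rpc did not run inside its injection call
  // 1 (outer body sets x = tmp1 + 1) + 1 (outer .then) + 6 (three ops and their .then) = 8
  while (x < 8) upcxx::progress();
  assert(x == 8);

  // Same test, with a loopback lpc as the outer callback.
  upcxx::default_persona().lpc([]() {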
    int tmp1 = x;
    upcxx::rpc(upcxx::rank_me(),[]() { x++; }).then([]() { x++; });
    upcxx::default_persona().lpc([]() { x++; }).then([]() { x++; });
    upcxx::master_persona().lpc([]() { x++; }).then([]() { x++; });
    assert(x == tmp1);
    x = tmp1 + 1;
  }).then([]() { x++; });

  assert(x == 8); // the outer lpc did not run inside its injection call
  while (x < 16) upcxx::progress(); // again 1 + 1 + 6 = 8 increments
  assert(x == 16);

  upcxx::barrier();
  if (!upcxx::rank_me()) std::cout << "Test result: SUCCESS" << std::endl;

  upcxx::finalize();
  return 0;
}

Is this program guaranteed to succeed? (It currently passes in the trials I've performed.)

If the runtime is permitted to run "loopback" lpc/rpcs synchronously inside injections that fall within the restricted context, this program could observe that behavior: an injected callback that ran early would increment x and trip one of the assert(x == tmp1) checks. More generally, a callback running in restricted context needs to know whether other callbacks can run inside injection calls, because if they can, the callback may need to manage its state and resources in a reentrant manner at those points.

The injection calls are specified as "UPC++ progress level: none", which suggests they should not run further callbacks. However, it is unclear what that progress level means when an outer stack frame is already inside user-level progress.
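The reentrancy hazard can be sketched with the same kind of toy model, extended with a hypothetical `eager` mode. This mode is an assumption for illustration, not a documented UPC++ behavior. In it, a loopback injection made while `progress()` is running executes the callback immediately. The injecting callback then observes a state change in the middle of its own body, which is exactly what the `assert(x == tmp1)` checks in the test program would catch.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical toy persona (NOT UPC++). When `eager` is true, a loopback
// injection made while progress() is running executes the callback
// synchronously inside the injection call, i.e. the behavior this issue
// asks the spec to rule out.
class ToyEagerPersona {
 public:
  explicit ToyEagerPersona(bool eager) : eager_(eager) {}

  void lpc(std::function<void()> fn) {
    if (eager_ && in_progress_) {
      fn();  // reentrant execution inside the injection call
    } else {
      queue_.push_back(std::move(fn));
    }
  }

  void progress() {
    in_progress_ = true;  // callbacks below run in "restricted context"
    while (!queue_.empty()) {
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop_front();
      fn();
    }
    in_progress_ = false;
  }

 private:
  bool eager_;
  bool in_progress_ = false;
  std::deque<std::function<void()>> queue_;
};

// Returns true if a callback running inside progress() sees x unchanged
// across its own loopback injection, i.e. no callback ran reentrantly.
bool injection_is_non_reentrant(bool eager) {
  ToyEagerPersona p(eager);
  int x = 0;
  bool unchanged = false;
  p.lpc([&p, &x, &unchanged] {
    int tmp1 = x;
    p.lpc([&x] { ++x; });  // loopback injection from restricted context
    unchanged = (x == tmp1);
  });
  p.progress();
  return unchanged;
}
```

`injection_is_non_reentrant(false)` returns true and `injection_is_non_reentrant(true)` returns false. Only the queued mode lets a callback treat its injection calls as points where no other callback can run.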

I'm advocating that we clarify/strengthen the spec to disallow this form of callback reentrance (i.e., guarantee that the program above executes correctly).

Comments (8)

  1. Dan Bonachea reporter

    > Would it be sufficient to modify the wording to say "during the next user-level progress"?

    I think that wording would imply it cannot execute during this same user-level progress call after the current callback returns (but before progress() returns). I don't think we want that restriction.

    I think we may want a more targeted clarification like "upcxx::{rpc,lpc}() has UPC++ progress level: none, which means it will never synchronously execute callbacks, even when invoked from a callback that is already running inside user-level progress."

  2. Dan Bonachea reporter

    > are we then back to defining a progress level: none?

    "UPC++ progress level: none" still exists as a concept; it's just the enum value upcxx::progress_level::none that does not exist.

  3. Amir Kamil

    Ah, I misunderstood the desired semantics. We just want to ensure that the callback executes strictly after the current one completes, even if it might still be in the same call to progress(). Is that correct?

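The two readings discussed in the comments above can be compared with another hypothetical toy queue. Neither policy is claimed to be how the UPC++ runtime behaves. `DrainPolicy::kSnapshot` models the "during the next user-level progress" wording: a callback injected during `progress()` waits for a later `progress()` call. Under `DrainPolicy::kDrainThrough`, it may run later in the same `progress()` call, but only after the injecting callback has returned. Injection never runs a callback synchronously under either policy.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical drain policies for a toy callback queue (NOT UPC++ internals).
enum class DrainPolicy {
  kSnapshot,     // run only callbacks queued before this progress() call began
  kDrainThrough  // also run callbacks injected by callbacks run in this call
};

class ToyDrainPersona {
 public:
  explicit ToyDrainPersona(DrainPolicy policy) : policy_(policy) {}

  // Never runs a callback synchronously, under either policy.
  void lpc(std::function<void()> fn) { queue_.push_back(std::move(fn)); }

  // Returns the number of callbacks executed by this call.
  std::size_t progress() {
    const std::size_t limit = (policy_ == DrainPolicy::kSnapshot)
                                  ? queue_.size()
                                  : std::numeric_limits<std::size_t>::max();
    std::size_t ran = 0;
    while (!queue_.empty() && ran < limit) {
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop_front();
      fn();  // runs to completion before any newly injected callback
      ++ran;
    }
    return ran;
  }

 private:
  DrainPolicy policy_;
  std::deque<std::function<void()>> queue_;
};

// Records which progress() call ran each of two callbacks: an outer one and
// an inner one that the outer callback injects.
std::vector<std::string> trace(DrainPolicy policy) {
  ToyDrainPersona p(policy);
  std::vector<std::string> log;
  int call = 1;
  p.lpc([&p, &log, &call] {
    log.push_back("outer@" + std::to_string(call));
    p.lpc([&log, &call] { log.push_back("inner@" + std::to_string(call)); });
  });
  p.progress();
  call = 2;
  p.progress();
  return log;
}
```

`trace(DrainPolicy::kSnapshot)` yields `{"outer@1", "inner@2"}` and `trace(DrainPolicy::kDrainThrough)` yields `{"outer@1", "inner@1"}`. Both satisfy the non-reentrancy guarantee. Only the snapshot policy adds the restriction on running within the same `progress()` call that Dan argues against.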