Prohibit communication using non-master personas in SEQ mode

Issue #423 resolved
Dan Bonachea created an issue

Test program:

#include <upcxx/upcxx.hpp>
#include <iostream>
#include <assert.h>

using namespace upcxx;

int main() {
  upcxx::init();

  global_ptr<int> gp1 = new_<int>(0);
  dist_object<global_ptr<int>> dobj(gp1);
  global_ptr<int> gp = dobj.fetch((rank_me()+1)%rank_n()).wait();

  persona p;
  {
    persona_scope ps(p);
    assert(&upcxx::current_persona() == &p);

    auto f1 = upcxx::rput(4, gp)
              .then([&]() {
                assert(p.active_with_caller());
                assert(upcxx::master_persona().active_with_caller());
                assert(&upcxx::current_persona() != &upcxx::master_persona());
                assert(&upcxx::current_persona() == &p);
              });
    f1.wait();
  }

  upcxx::barrier();
  if (!upcxx::rank_me()) std::cout<<"SUCCESS"<<std::endl;
  upcxx::finalize();
  return 0;
}

In -threadmode=par, this program runs successfully.

However in -threadmode=seq, with 2 or more (non-PSHM) nodes (forcing completion to be delayed), the program fails the following assertion:

a.out: comp-pers2.cpp:23: main()::<lambda()>: Assertion `&upcxx::current_persona() != &upcxx::master_persona()' failed.

This failure indicates the completion callback was run with a completion stack containing both personas, but the "top" of the completion stack is erroneously the master persona instead of the injecting persona. This violates spec 10.4-5:

All asynchronous UPC++ operations will have their notification events (signaling of futures or promises) sent to the current persona of the OS thread invoking the operation

As a result, further operations injected within the callback will use the wrong persona.

Based on limited poking, this defect appears to only affect rput and not rget or remote atomics.

Given the heavy threading restrictions under -threadmode=seq, it's unknown whether this can actually lead to incorrect behavior, aside from failing assertion queries such as shown in the test program.

Comments (12)

  1. Amir Kamil

    I’ve dug into this, and it’s a much bigger issue than we previously realized.

    I believe that every asynchronous communication operation suffers from this bug. Here’s a modification to the code above that reproduces it for rget:

    19,20c19,20
    <     auto f1 = upcxx::rput(4, gp)
    <               .then([&]() {
    ---
    >     auto f1 = upcxx::rget(gp)
    >               .then([&](int) {
    

    The core problem is that communication operations enqueue the operation-completion handler on backend::gasnet::get_handle_cb_queue() (either directly or through backend::gasnet::register_cb()), which has the following implementation:

      inline handle_cb_queue& get_handle_cb_queue() {
        UPCXX_ASSERT_MASTER_IFSEQ();
    
        #if UPCXX_BACKEND_GASNET_SEQ
          return gasnet::master_hcbs;
        #elif UPCXX_BACKEND_GASNET_PAR
          return upcxx::current_persona().backend_state_.hcbs;
        #endif
      }
    

    With -threadmode=seq, the handler always gets enqueued on gasnet::master_hcbs, which is serviced by the master persona in progress().

    I don’t know what the right fix is. With -threadmode=seq, individual personas do not have handle queues, so we’d either have to add them or figure out some other fix.

    I’ll also add that I don’t think the specification is clear that callbacks are run with the target persona at the top of the persona stack. I think that should be clarified. (It’s already what the implementation does, this issue notwithstanding.)

  2. Dan Bonachea reporter

    I agree this seems like a problem we should eventually address (although it's definitely too late in this release cycle for anything but documentation changes).

    The "spec" for SEQ-mode behavior is contained in docs/implementation-defined.md, which includes the following:

    Whereas "par-mode" libupcxx permits the full generality of the UPC++ specification with respect to multi-threading concerns, "seq" imposes these additional restrictions on the client application:

    • Only the thread which invokes upcxx::init() may ever hold the master persona. This thread is regarded as the "primordial" thread.

    • Any upcxx routine with internal or user-progress (typically inter-process communication, e.g. upcxx::rput/rget/rpc/...) must be called from the primordial thread while holding the master persona. There are some routines which are excepted from this restriction and are listed below.

    The idea here is only the unique primordial thread ever talks to GASNet, either for injection or completion of network comms, and that thread is always holding the master persona while doing it. This allows us to skip all locking inside the handle queue and GASNet library.

    One potential "solution" here is to document that in SEQ mode any asynchronous events generated by routines with internal progress (those governed by the text above) always become associated with the master persona (which MUST be held by the calling thread, as per above), regardless of what persona is currently at the top of the stack. That means future/promise completions on those events should eventually be delivered to the master persona, making the "anomalous" behavior correct (ie "solve" this by documenting it working-as-intended, although we might need to ensure all such cases have that behavior).

    The suggestion of converting SEQ to have per-persona handle queues seems unpalatable; because we need to maintain the competing thread-safety properties that only the primordial thread reaps the handles in those queues, and that only the thread holding the persona associated with the operation signals the associated promise. So for the contrived example in the original report, if persona p were passed to another thread after initiation, the primordial thread still needs to track that handle in order for it to ever be completed; either because it's in the one-and-only handle queue (the current approach), or (if we added per-persona handle queues) because the primordial thread progressed the handle queues of every existing persona (which we don't currently track). In order for comm completion to be correctly delivered to non-master personas, the primordial thread detecting handle completion would need to arrange to signal the associated promise potentially owned by a non-primordial thread in a way that doesn't break the thread-safety properties (super yuck). Even if we added the bookkeeping to permit this, I think in general this would require sending an LPC from the primordial thread who detects handle completion to the owning persona on a different thread just to signal handle completion, which might result in SEQ mode overheads for communication completion that are HIGHER than in PAR mode. I'm not seeing a high-performance solution for this approach which doesn't potentially sacrifice the intended benefits of SEQ mode (which is intended to be an optimization for apps where only the primordial thread ever communicates).

    CC: @john bachan

  3. Amir Kamil

    I have no objections to spec’ing this away. I think the implementation changes should be pretty simple. Most of it should be taken care of by making master_persona() the default for SEQ mode in backend::fulfill_during<level>() :

    diff --git a/src/backend_fwd.hpp b/src/backend_fwd.hpp
    index ee07ed8d..62c99d4b 100644
    --- a/src/backend_fwd.hpp
    +++ b/src/backend_fwd.hpp
    @@ -289,13 +289,21 @@ namespace backend {
       void fulfill_during(
           detail::future_header_promise<T...> *pro, // takes ref
           std::tuple<T...> vals,
    -      persona &active_per = current_persona()
    +      #if UPCXX_BACKEND_GASNET_PAR
    +        persona &active_per = current_persona()
    +      #else
    +        persona &active_per = master_persona()
    +      #endif
         );
       template<progress_level level, typename ...T>
       void fulfill_during(
           detail::future_header_promise<T...> *pro, // takes ref
           std::intptr_t anon,
    -      persona &active_per = current_persona()
    +      #if UPCXX_BACKEND_GASNET_PAR
    +        persona &active_per = current_persona()
    +      #else
    +        persona &active_per = master_persona()
    +      #endif
         );
    
       template<progress_level level, typename Fn>
    

    There are also a few places where we increment or decrement undischarged_n_ on current_persona(), which may need to change to master_persona() in SEQ mode.

    The only test case that I am aware of that needs updating is copy.cpp.

  4. Dan Bonachea reporter

    implementation changes should be pretty simple. Most of it should be taken care of by making master_persona() the default for SEQ mode in backend::fulfill_during<level>()... few places where we increment or decrement undischarged_n_ on current_persona(), which may need to change to master_persona() in SEQ mode.

    It's definitely more complicated than this. Other examples are where we make_lpc_dormant(upcxx::current_persona()) in completions.hpp for promise/future completions. We'd basically need to audit every use of current_persona() in the runtime and fix it.

    Another possible way to "spec this away" would be to strengthen the SEQ-mode communication restriction even further, eg:

    Any upcxx routine with internal or user-progress (typically inter-process communication, e.g. upcxx::rput/rget/rpc/...) must be called from the primordial thread with the master persona at the top of the active persona stack. There are some routines which are excepted from this restriction and are listed below.

    IOW change communication requirement from "the primordial thread holding the master persona" to "the primordial thread using the master persona", codifying the fact that only the master persona (with the primordial thread) is permitted to act in interprocess communication. This would disqualify the test in the original report and test/copy.cpp as invalid in SEQ mode (we'd fix the test).

    This route should be fully consistent with proposed spec clarifications about persona stack top matching the initiating persona during completions. It should also be less intrusive a runtime change, because it just adds constraint to the client and otherwise remains consistent with how PAR mode works. A trivial change to the implementation of UPCXX_ASSERT_MASTER_IFSEQ() would enforce the strengthened rule in SEQ+DEBUG-mode, which should be sufficient to diagnose most incorrect uses of non-master personas for comm initiation (although this is a quality-of-implementation matter that need not reach 100% coverage).

    Regarding user impact, I think it would be very low. In SEQ mode, personas really only exist for the purpose of LPC. Interprocess communication already requires the primordial thread with the master persona (which can never be passed in SEQ mode), so we might as well additionally require the master persona to be at the top of the stack. I think the only fundamental activity this prohibits is having the primordial thread initiate comms on a non-master persona p and then move p to a different thread for reaping the completions, but that's exactly the behavior we don't want to implement. There's also a simpler way to accomplish what the application probably wanted, which is to instead use as_lpc() completion directly targeting the persona of the non-primordial thread - we would continue to support this, and it's likely more efficient at the app level since it doesn't require threads to synchronize for the purpose of persona-passing.

  5. Dan Bonachea reporter

    it would require auditing every use of current_persona(). But there aren’t very many

    It's not as simple as that - there are also indirect uses of current_persona in the runtime that instead use detail::the_persona_tls->get_top_persona(), so those would need to be updated as well.

  6. Amir Kamil

    Fair enough. Seems like your proposal is easier to implement, and if we believe that the effect on users will be minimal, we should go with that.

  7. Log in to comment