Quiescence issues with parallel backend

Issue #175 resolved
Alexander Pöppl created an issue

Hi,

(sorry if that's the wrong place to post this)

There seems to be an issue with quiescence when using the parallel backend in my application. I use atomically updated counters to keep track of the RPCs and LPCs in flight. In some cases, multiple LPCs appear to remain in flight, and no matter how many calls to progress I perform, the counters don't decrease. In other cases the application terminates. After the main code of my application finishes, I tried calling progress until all the counters were zero, but that never happened. My first guess would be a race condition on the counter variables, but I make sure that they are updated atomically, which should prevent this kind of issue.

As an alternative, I also tried the following:

if (config::isGasnetSequentialBackend) {
    // […SNIP…]
} else {
    auto start = std::chrono::steady_clock::now();
    while (activeActors.load() > 0) {
        upcxx::progress();
    }
    auto end = std::chrono::steady_clock::now();
    runTime = std::chrono::duration<double, std::ratio<1>>(end - start).count();

    for (auto &actorPairs : actors) {
        if (actorPairs.second.where() == upcxx::rank_me()) {
            auto aRef = *(actorPairs.second.local());
            aRef->actorThread.join();
            aRef->actorThread = std::thread();
            std::cout << aRef->name << " thread terminated." << std::endl;
        }
    }
} 
std::cout << "messages in flight: RPCs: " << rpcsInFlight << " LPCs: " << lpcsInFlight << std::endl;
// Drain the queues, we want no more messages in flight.
while (rpcsInFlight.load() > 0 || lpcsInFlight.load() > 0) {
    upcxx::progress();
    for (auto &actorPairs : actors) {
        if (actorPairs.second.where() == upcxx::rank_me()) {
            Actor *a = *(actorPairs.second.local());
            // Always evaluates to true. Why? There are no other threads active; they are all joined above.
            if (a->actorPersona->active()) {
                std::cout << "persona of " << a->name << " still active!" << std::endl;
                continue;
            }
            upcxx::persona_scope ps(*a->actorPersona);
            upcxx::discharge(ps);
        }
    }
}

However, the if statement marked with the "Always evaluates to true" comment always evaluates to true. If I omit it, there is an assertion failure. Apparently that persona is still active somewhere, but I don't know where. I checked, and there are no other threads active at this point, and this thread did not assume that persona prior to that if statement. I didn't see anything in the spec that would help me with this issue.

Best, Alex

Comments (3)

  1. Dan Bonachea

    Hi Alex -

    I suspect we'll need to either see more of your code or a small complete example to figure this out. Specific questions I have already:

    1. How many threads are running the code you showed, and do any have the master persona?
    2. What does the code for injecting RPC/LPC and incrementing/decrementing the counters look like?
    3. What does the persona management look like for the actor threads?

    For what it's worth, you should also insert a call to upcxx::progress() inside the scope of ps beside the discharge -- because discharge only makes internal-level progress and will not run any of your callbacks. Note the call to progress in the outer loop won't progress personas that are not currently active on the stack of the calling thread.

  2. Alexander Pöppl reporter

    I think this was an issue with my code. I talked to John about it, and my actual problem was solved using rank-internal barriers. The issue in the listing here is due to me using the initial personas on threads that finished executing. Apparently those enter an undefined state after thread execution finishes, and therefore performing actions on them will fail (@jdbachan did I get this right?).

  3. john bachan

    Yes, default personas are undefined when their principal thread terminates. Glad you were able to solve this!
