upcxx::progress_required always returns 0 for an RPC with an as_lpc completion (cx)
If a thread issues an RPC whose completion is an LPC (operation_cx::as_lpc) directed at a persona held by another thread, and then calls upcxx::discharge / upcxx::progress_required, the runtime reports that progress is not required. The other thread, which holds its own persona plus the master persona, never executes the RPCs.
Comments (8)
-
reporter -
@Mathias Jacquelin: Thanks for providing the code. However, I believe the assertion on line 28 of your MWE is incorrect. It is never guaranteed that progress is required after an RPC injection, because the injection call may have blocked until the RPC left the node.
Incidentally, it's also not guaranteed that the first call to upcxx::progress on the lpc_persona will see the RPC acknowledgement (that needs to be a loop). I've altered your program to the one below, which I think should work correctly, but it deadlocks when run (on dirac/smp/debug/develop): the main thread gets stuck in the polling loop after printing the RPC arrival message (i.e., the LPC completion is never sent). Do you think the altered version represents your originally reported problem?
Note that compiling with -DSTALL enables a polling loop before the injecting thread exits, and that makes the observed deadlock disappear. This suggests the root cause: the initiating thread is being used to forward the LPC completion, but this is not correctly reflected in progress_required.

```cpp
#include <upcxx/upcxx.hpp>
#include <thread>
#include <cassert>
#include <iostream>
#include <sstream>

int main(int argc, char **argv) {
  upcxx::init();
  upcxx::persona lpc_persona;
  int done = 0;

  auto t1 = std::thread([&lpc_persona, &done]() {
    upcxx::intrank_t nghb = (upcxx::rank_me() + 1) % upcxx::rank_n();
    upcxx::intrank_t sender = upcxx::rank_me();
    upcxx::rpc(nghb,
      upcxx::operation_cx::as_lpc(lpc_persona, [nghb, sender, &done]() {
        /* Body of LPC */
        assert(sender == upcxx::rank_me());
        std::stringstream ss;
        ss << "This is the LPC executing on " << upcxx::rank_me()
           << " and tracking RPC executing on " << nghb << "\n";
        std::cout << ss.str() << std::flush;
        done = 1;
      }),
      [sender, nghb]() {
        /* Body of RPC */
        assert(nghb == upcxx::rank_me());
        std::stringstream ss;
        ss << "This is the RPC executing on " << upcxx::rank_me()
           << " and issued by " << sender << "\n";
        std::cout << ss.str() << std::flush;
      });
    upcxx::discharge();
    assert(!upcxx::progress_required());
#if STALL
    while (!done) upcxx::progress();
#endif
  });

  {
    upcxx::persona_scope ps(lpc_persona);
    while (!done) upcxx::progress();
  }

  upcxx::barrier();
  t1.join();
  if (!upcxx::rank_me()) std::cout << "SUCCESS" << std::endl;
  upcxx::finalize();
  return 0;
}
```
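To make the suspected root cause concrete, here is a toy model in standard C++ only. It does not use UPC++ and does not describe its internals; ToyThreadState, ToyPersona, forward_all and both progress_required_* functions are hypothetical names. The injecting thread keeps a queue of completions it still has to hand off to the LPC persona; if its progress_required check only looks at the thread's own callbacks, it answers false while a hand-off is still pending, which would match the deadlock described above.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy model (hypothetical, not UPC++ internals): state of the thread that
// injected the RPC. When the acknowledgement returns, this thread still has
// to forward the as_lpc completion to the persona that owns it.
struct ToyThreadState {
    // Completions waiting to be handed to another persona (internal work).
    std::deque<std::function<void()>> to_forward;
    // Callbacks this thread must run itself (user-level work).
    std::deque<std::function<void()>> own_callbacks;

    // Mirrors the reported behavior: pending forwarding is not counted.
    bool progress_required_buggy() const { return !own_callbacks.empty(); }

    // What the report argues for: pending forwarding also counts.
    bool progress_required_fixed() const {
        return !own_callbacks.empty() || !to_forward.empty();
    }
};

// Toy target persona: an LPC inbox drained only by its owning thread.
struct ToyPersona {
    std::deque<std::function<void()>> inbox;

    // Runs every queued LPC and returns how many were run.
    std::size_t drain() {
        std::size_t n = 0;
        while (!inbox.empty()) {
            auto f = std::move(inbox.front());
            inbox.pop_front();
            f();
            ++n;
        }
        return n;
    }
};

// Hypothetical "forwarding" step on the injecting thread: moves every pending
// completion into the target persona's inbox without running it.
inline void forward_all(ToyThreadState& t, ToyPersona& target) {
    while (!t.to_forward.empty()) {
        target.inbox.push_back(std::move(t.to_forward.front()));
        t.to_forward.pop_front();
    }
}
```

In this model the pending hand-off is visible only to progress_required_fixed(); until forward_all() runs, the owning thread's drain() finds nothing, so a loop waiting for done would spin forever, much like the main thread above without -DSTALL.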
-
reporter Thank you @Dan Bonachea. I believe you are correct; this does reflect the issue I was observing.
-
- changed component to Completions
-
assigned issue to
This issue was briefly discussed in the 2019-08-07 meeting.
John is probably the only one with the expertise necessary to address this issue.
-
reporter It turns out issue #169 illustrates another issue which I think is related to this one. Using -DSTALL allows the code to complete, but -DSTALL_INTERNAL does not, even though no RPCs are being run on t1, so as far as I understand, user-level progress shouldn't be needed. Here is a modified version of the code:

```cpp
#include <upcxx/upcxx.hpp>
#include <thread>
#include <cassert>
#include <iostream>
#include <sstream>

int main(int argc, char **argv) {
  upcxx::init();
  upcxx::persona lpc_persona;
  int done = 0;

  auto t1 = std::thread([&lpc_persona, &done]() {
    upcxx::intrank_t nghb = (upcxx::rank_me() + 1) % upcxx::rank_n();
    upcxx::intrank_t sender = upcxx::rank_me();
    upcxx::rpc(nghb,
      upcxx::operation_cx::as_lpc(lpc_persona, [nghb, sender, &done]() {
        /* Body of LPC */
        assert(sender == upcxx::rank_me());
        std::stringstream ss;
        ss << "This is the LPC executing on " << upcxx::rank_me()
           << " and tracking RPC executing on " << nghb << "\n";
        std::cout << ss.str() << std::flush;
        done = 1;
      }),
      [sender, nghb]() {
        /* Body of RPC */
        assert(nghb == upcxx::rank_me());
        std::stringstream ss;
        ss << "This is the RPC executing on " << upcxx::rank_me()
           << " and issued by " << sender << "\n";
        std::cout << ss.str() << std::flush;
      });
    upcxx::discharge();
    assert(!upcxx::progress_required());
#if STALL
    while (!done) upcxx::progress();
#endif
#if STALL_INTERNAL
    while (!done) upcxx::progress(upcxx::progress_level::internal);
#endif
  });

  {
    upcxx::persona_scope ps(lpc_persona);
    while (!done) upcxx::progress();
  }

  upcxx::barrier();
  t1.join();
  if (!upcxx::rank_me()) std::cout << "SUCCESS" << std::endl;
  upcxx::finalize();
  return 0;
}
```
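The expectation in this comment can be sketched as a toy model in standard C++ only. ToyLevel, ToyState and toy_progress are hypothetical names, and this is not how upcxx::progress_level is implemented. The assumption being modeled: handing a completion to another persona is internal work, so internal-level progress alone moves it into the target inbox, while user-level progress additionally runs the calling thread's own callbacks.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Toy model (hypothetical): the two progress levels discussed above.
enum class ToyLevel { internal, user };

struct ToyState {
    std::deque<std::function<void()>> to_forward;    // internal work: hand-offs
    std::deque<std::function<void()>> own_callbacks; // user-level work
};

// Internal progress only performs hand-offs into the target persona's inbox;
// user progress additionally runs this thread's own callbacks.
inline void toy_progress(ToyState& self,
                         std::deque<std::function<void()>>& target_inbox,
                         ToyLevel level) {
    while (!self.to_forward.empty()) {
        target_inbox.push_back(std::move(self.to_forward.front()));
        self.to_forward.pop_front();
    }
    if (level == ToyLevel::user) {
        while (!self.own_callbacks.empty()) {
            auto cb = std::move(self.own_callbacks.front());
            self.own_callbacks.pop_front();
            cb();
        }
    }
}
```

Under this assumption, t1 calling progress at the internal level would be enough for the main thread to eventually see done become 1, which is why the deadlock with -DSTALL_INTERNAL looked like a bug rather than a usage error.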
-
reporter Issue #169 was marked as a duplicate of this issue.
-
reporter It looks like pull request #119 solves both issues. With it, -DSTALL_INTERNAL allows the code to complete, but neither -DSTALL_INTERNAL nor -DSTALL is required anymore.
-
- changed status to resolved
Resolved in pull request #119 merged at 4e577c2
Here is an MWE that illustrates the problem: the assertion on line 28 will fail.