dist_object can hang execution
Issue #129
resolved
I noticed this when trying to make an rput_irregular example:
dist_object<uvector> dParticles(uvector(500));
future<global_ptr<particle_t> > hiVectorF = rpc(nebrHi,
    [](dist_object<uvector>& d) {
      return global_ptr<particle_t>(&((*d).front()));
    }, dParticles);
std::cout<<"rpc called "<<me<<std::endl;
hiVectorF.wait();
std::cout<<"pointer arrived "<<me<<std::endl;
This code can produce output like the following:
rpc called 5
rpc called 0
rpc called 2
rpc called 4
pointer arrived 4
rpc called 3
rpc called 7
rpc called 1
pointer arrived 0
pointer arrived 2
pointer arrived 5
rpc called 6
pointer arrived 1
pointer arrived 7
pointer arrived 6
In this case, rank 3 never returns from the wait() call. The code hangs and the processes stay pegged at 100% CPU.
Comments (4)
-
@bvstraalen - you left out a very important part of your code: whatever lines come next.
In particular, the code shown has an RPC race. Each rank will service RPCs while waiting for the acknowledgement to its own RPC, but it may exit that wait (and fall off the end of your snippet) before servicing all incoming RPCs. You need some later call that makes user-level progress (like a barrier) to ensure global quiescence, i.e. that all the RPCs have completed.
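To make the race concrete, here is a toy single-process model in standard C++. It is not UPC++ and does not use its API: `simulate`, `Rank`, `Msg`, and the schedule format are all hypothetical. Each simulated rank sends one request to its right neighbour, then polls its inbox (standing in for user-level progress) until its own reply arrives. The `use_barrier` flag models the suggested fix: after its wait completes, a rank keeps polling until every rank has arrived at the barrier.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy model of the RPC quiescence race (hypothetical, NOT UPC++).
enum class Kind { Request, Reply };
struct Msg { Kind kind; int from; };
enum class State { Start, Waiting, InBarrier, Done };

struct Rank {
  std::deque<Msg> inbox;
  State state = State::Start;
};

// Replays `schedule` (a list of rank ids to step), then steps all ranks
// round-robin for a bounded number of rounds. Returns how many ranks reached
// Done; fewer than n means some rank is stuck waiting for a reply that will
// never come (the toy analogue of the hang).
int simulate(int n, bool use_barrier, const std::vector<int>& schedule) {
  assert(n >= 2);
  std::vector<Rank> ranks(n);
  int arrived = 0;  // number of ranks that have entered the barrier

  auto step = [&](int r) {
    Rank& me = ranks[r];
    if (me.state == State::Done) return;  // stopped polling: inbox is ignored
    if (me.state == State::Start) {       // issue the "rpc" to nebrHi
      ranks[(r + 1) % n].inbox.push_back({Kind::Request, r});
      me.state = State::Waiting;
      return;
    }
    // Waiting or InBarrier: service everything currently in the inbox.
    bool got_reply = false;
    while (!me.inbox.empty()) {
      Msg m = me.inbox.front();
      me.inbox.pop_front();
      if (m.kind == Kind::Request)
        ranks[m.from].inbox.push_back({Kind::Reply, r});
      else
        got_reply = true;
    }
    if (me.state == State::Waiting && got_reply) {
      if (use_barrier) {
        me.state = State::InBarrier;  // keep polling until everyone arrives
        ++arrived;
      } else {
        me.state = State::Done;  // "falls off the end of the snippet"
      }
    }
    if (me.state == State::InBarrier && arrived == n) me.state = State::Done;
  };

  for (int r : schedule) step(r);
  for (int round = 0; round < 10 * n; ++round)
    for (int r = 0; r < n; ++r) step(r);

  int done = 0;
  for (const Rank& rk : ranks) done += (rk.state == State::Done) ? 1 : 0;
  return done;
}
```

With the interleaving {1, 2, 2, 1, 0}, rank 1 gets its reply and stops polling before rank 0's request reaches it, so `simulate(3, false, {1, 2, 2, 1, 0})` returns 2 (rank 0 is stuck, much like rank 3 in the report), while `simulate(3, true, {1, 2, 2, 1, 0})` returns 3. A fair start such as {0, 1, 2} finishes all three ranks even without the barrier, which is why a bug like this tends to show up only intermittently.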
-
reporter: This is a quiescence error on my part.
-
reporter - changed status to resolved
Issue #130 was marked as a duplicate of this issue.