- changed status to resolved
Incorrect specification for `dist_id<T>::when_here` future readying
The 2022.9.0 spec for the two (non-move) dist_object constructors includes the text:
"The future returned from dist_id<T>::when_here for the corresponding dist_id<T> will be readied during this constructor. This implies that continuations waiting for that future will execute before the constructor returns."
Everything about this statement is wrong, and it does not match how the current implementation actually works. In reality, any future returned from dist_id<T>::when_here() is readied during the next user-level progress of the caller, and continuations scheduled on that future will never execute before the constructor returns.
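The difference between the two behaviors can be illustrated with a toy model in plain C++ that does not use UPC++ at all. This is only a sketch of the ordering described above, under the assumption that the constructor defers readying to a queue drained by user-level progress; the names ToyRuntime, ToyWhenHere, construct, and run_demo are hypothetical and do not correspond to anything in dist_object.hpp.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a progress engine; NOT the UPC++ implementation.
struct ToyRuntime {
    std::deque<std::function<void()>> pending;  // work deferred to user-level progress
    std::vector<std::string> log;               // trace of events, in order

    // Drain all deferred work, loosely analogous to one user-level progress call.
    void progress() {
        while (!pending.empty()) {
            std::function<void()> task = std::move(pending.front());
            pending.pop_front();
            task();
        }
    }
};

// Hypothetical stand-in for the future returned by dist_id<T>::when_here().
struct ToyWhenHere {
    bool ready = false;
    std::vector<std::function<void()>> continuations;

    // Run the callback now if already ready, otherwise store it for later.
    void then(std::function<void()> cb) {
        if (ready) cb();
        else continuations.push_back(std::move(cb));
    }

    // Mark the future ready and run every stored continuation.
    void fulfill() {
        ready = true;
        for (auto &cb : continuations) cb();
        continuations.clear();
    }
};

// Hypothetical stand-in for the dist_object constructor: it only enqueues
// the readying step, so no continuation can run before it returns.
void construct(ToyRuntime &rt, ToyWhenHere &fut) {
    rt.log.push_back("entering constructor");
    rt.pending.push_back([&fut] { fut.fulfill(); });
    rt.log.push_back("left constructor");
}

// Register a continuation first, then construct, then make progress.
std::vector<std::string> run_demo() {
    ToyRuntime rt;
    ToyWhenHere fut;
    fut.then([&rt] { rt.log.push_back("when_here callback"); });
    construct(rt, fut);
    rt.log.push_back("starting progress");
    rt.progress();
    rt.log.push_back("ending progress");
    return rt.log;
}
```

Calling run_demo() in this model produces the trace "entering constructor", "left constructor", "starting progress", "when_here callback", "ending progress". Under the obsolete spec text, the callback entry would instead fall between the two constructor entries.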
Example program:
#include <upcxx/upcxx.hpp>
#include <iostream>
#include <unistd.h>
#include "util.hpp"

using namespace upcxx;

int main() {
  init();
  say() << "Hello";
  if (rank_me() == 0) {
    // Rank 0 delays its constructor and makes progress first,
    // giving incoming RPCs a chance to run before the object exists here.
    sleep(1);
    say() << "starting progress";
    for (int i=0; i < 100; i++) upcxx::progress();
    say() << "ending progress";
    sleep(1);
  }
  say() << "entering constructor";
  dist_object<int> d(rank_me());
  say() << "left constructor";
  if (rank_me() != 0) {
    // Every other rank asks rank 0 to wait for its representative of d.
    rpc(0, [](dist_id<int> id, int src) {
      say() << "began RPC callback from " << src;
      future<dist_object<int>&> f = id.when_here();
      // This test expects the RPC to arrive before rank 0 constructs d.
      UPCXX_ASSERT_ALWAYS(!f.ready());
      return f.then([=](dist_object<int>&) {
        say() << "when_here callback from " << src;
      });
    }, d.id(), rank_me()).wait();
  }
  say() << "starting barrier";
  upcxx::barrier();
  say() << "ending barrier";
  upcxx::finalize();
  return 0;
}
Example output using UPC++ 2022.9.0 with 4 ranks on dirac/smp:
[3] Hello
[2] Hello
[1] Hello
[2] entering constructor
[0] Hello
[3] entering constructor
[1] entering constructor
[3] left constructor
[2] left constructor
[1] left constructor
[0] starting progress
[0] began RPC callback from 3
[0] began RPC callback from 2
[0] began RPC callback from 1
[0] ending progress
[0] entering constructor
[0] left constructor
[0] starting barrier
[0] when_here callback from 1
[0] when_here callback from 2
[0] when_here callback from 3
[2] starting barrier
[1] starting barrier
[3] starting barrier
[3] ending barrier
[1] ending barrier
[2] ending barrier
[0] ending barrier
Other permutations of output lines across ranks differ from run to run, but the lines "when_here callback from X" never occur during rank 0's call to the dist_object constructor; they always occur during the next user-level progress on rank 0 (the barrier).
I've confirmed this behavior both by source inspection of dist_object.hpp and experimentally, going back to release 2018.9.0 (where the behavior changed from something approximating the obsolete spec text to the current behavior, which it has had for the past four years).
We need to update these sentences in the spec to match the current behavior (which is also the preferred behavior, IMO).
Comments (1)
- reporter: Resolved in pull request #97, merged at 8b6cd38