Incorrect specification for `dist_id<T>::when_here` future readying

Issue #201 resolved
Dan Bonachea created an issue

The 2022.9.0 spec for the two (non-move) dist_object constructors includes the text:

The future returned from dist_id<T>::when_here for the corresponding dist_id<T> will be readied during this constructor. This implies that continuations waiting for that future will execute before the constructor returns.

Everything about this statement is wrong: it does not match how the current implementation actually works. In reality, any future returned from dist_id<T>::when_here() is readied during the next user-level progress of the caller, and continuations scheduled on that future will never execute before the constructor returns.

Example program:

#include <upcxx/upcxx.hpp>
#include <iostream>
#include <unistd.h>
#include "util.hpp"

using namespace upcxx;

int main() {
  init();

  say() << "Hello";
  if (rank_me() == 0) {
    sleep(1);
    say() << "starting progress";
    for (int i=0; i < 100; i++) upcxx::progress();
    say() << "ending progress";
    sleep(1);
  }

  say() << "entering constructor";
  dist_object<int> d(rank_me());
  say() << "left constructor";

  if (rank_me() != 0) {
    rpc(0, [](dist_id<int> id, int src) {
      say() << "began RPC callback from " << src;
      future<dist_object<int>&> f = id.when_here();
      UPCXX_ASSERT_ALWAYS(!f.ready());
      return f.then([=](dist_object<int>&) {
        say() << "when_here callback from " << src;
      });
    }, d.id(), rank_me()).wait();
  }

  say() << "starting barrier";
  upcxx::barrier();
  say() << "ending barrier";


  upcxx::finalize();
  return 0;
}

Example output using UPC++ 2022.9.0 with 4 ranks on dirac/smp:

[3] Hello
[2] Hello
[1] Hello
[2] entering constructor
[0] Hello
[3] entering constructor
[1] entering constructor
[3] left constructor
[2] left constructor
[1] left constructor
[0] starting progress
[0] began RPC callback from 3
[0] began RPC callback from 2
[0] began RPC callback from 1
[0] ending progress
[0] entering constructor
[0] left constructor
[0] starting barrier
[0] when_here callback from 1
[0] when_here callback from 2
[0] when_here callback from 3
[2] starting barrier
[1] starting barrier
[3] starting barrier
[3] ending barrier
[1] ending barrier
[2] ending barrier
[0] ending barrier

The interleaving of output lines across ranks differs from run to run, but the lines "when_here callback from X" never occur during rank 0's call to the dist_object constructor; they always occur during the next user-level progress on rank 0 (the barrier).
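
For readers who want the ordering spelled out, here is a minimal toy model in plain C++ of the behavior described above. It is a sketch only: `ToyRuntime`, `ToyWhenHere`, `toy_construct` and `demo_trace` are hypothetical names invented for illustration. It does not use UPC++ and is not how dist_object.hpp is written. It only assumes what was observed above: the constructor schedules the readying of the when_here future, and the readying itself (and therefore any continuation) runs at the next user-level progress.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime's deferred-work queue (NOT UPC++ code).
struct ToyRuntime {
  std::queue<std::function<void()>> deferred;  // work to run at the next progress
  std::vector<std::string> log;                // observed order of events

  // Models user-level progress: run everything deferred so far.
  void progress() {
    while (!deferred.empty()) {
      auto task = std::move(deferred.front());
      deferred.pop();
      task();
    }
  }
};

// Hypothetical stand-in for the when_here future of one dist_id.
struct ToyWhenHere {
  bool ready = false;
  std::vector<std::function<void()>> continuations;

  // Attach a continuation; it runs immediately only if already ready.
  void then(std::function<void()> k) {
    if (ready) k();
    else continuations.push_back(std::move(k));
  }

  // Ready the future and run the continuations waiting on it.
  void fulfill() {
    ready = true;
    for (auto& k : continuations) k();
    continuations.clear();
  }
};

// Toy "constructor": it only *schedules* the readying; it does not ready.
void toy_construct(ToyRuntime& rt, ToyWhenHere& wh) {
  rt.log.push_back("entering constructor");
  rt.deferred.push([&wh] { wh.fulfill(); });
  rt.log.push_back("left constructor");
}

// Mirrors rank 0 in the example: the continuation is attached before the
// constructor runs, yet it fires only during the following progress call.
std::vector<std::string> demo_trace() {
  ToyRuntime rt;
  ToyWhenHere wh;
  wh.then([&rt] { rt.log.push_back("when_here callback"); });
  toy_construct(rt, wh);
  assert(!wh.ready);  // still not ready after the constructor returns
  rt.log.push_back("starting progress");
  rt.progress();
  rt.log.push_back("ending progress");
  return rt.log;
}
```

In this model, `demo_trace()` returns the events in the order "entering constructor", "left constructor", "starting progress", "when_here callback", "ending progress", which is the same relative ordering rank 0 shows in the output above.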

I've confirmed this behavior by source inspection of dist_object.hpp and experimentally going back to release 2018.9.0 (where the behavior changed from something approximating the obsolete spec text to the current behavior it's had for the past four years).

We need to update these sentences in the spec to match the current behavior (which is also the preferred behavior, IMO).
