dist_object, garbage collection, and quiescence management

Issue #124 new
john bachan created an issue

We know we want to give our users access to a fancy auto-quiescing algorithm. My apprehension has always stemmed from not knowing what flavor of quiescence is general enough to cover at least all the use cases I have personally desired. This proposal satisfies my needs.

I propose a variant of dist_object called dist_gcref<T> which functions in a very similar way. Like dist_object, it is constructed collectively, it is magically translated to local instances over rpc, and the rpc stalls for remote construction if it arrives early. Semantically it differs in that it's a refcounted pointer to T, just like std::shared_ptr<T>. And, magically, the T is destructed only after all ranks in the team have dropped all references and there are no rpc's in flight referencing the object.

I see this as extremely useful for creating temporarily shared data structures that are assigned to once, read by many remotely, and just magically destruct when all reads are done. Consider a ghost zone exchange:

// 1D stencil grid, periodic across ranks
double mesh[102]; // 100 interior cells (1..100), ghost cells at 0 and 101

void advance() {
  // publish my leftmost and rightmost interior cells for peers
  auto left = upcxx::make_dist_gcref<double>(mesh[1]);
  auto rght = upcxx::make_dist_gcref<double>(mesh[100]);

  // collect peer ghost values (periodic neighbor indexing)
  int n = rank_n();
  mesh[0] = rght.fetch((rank_me()+n-1) % n).wait();
  mesh[101] = left.fetch((rank_me()+1) % n).wait();

  // update in place, reading each neighbor's pre-update value
  double prev = mesh[0];
  for(int i=1; i <= 100; i++) {
    double cur = mesh[i];
    mesh[i] = prev + mesh[i+1];
    prev = cur;
  }

  // no barrier, no counter!!!
}

When a user wants to quiesce a region of code:

// I'm using dist_gcref<std::tuple<>> as the "no-data" object whose only
// meaningful quality is its lifetime. We could consider permitting dist_gcref<void>
// or dist_gcref<>. 
void foo1() {
  auto region = upcxx::make_dist_gcref<std::tuple<>>({});

  // dist_gcref<T>::when_dead() returns future indicating object is globally dead.
  future<> done = region.when_dead();

  // do some communication, all rpc's must carry `region` reference to keep it live. 
  upcxx::rpc_ff(rank_me()+1, [](dist_gcref<std::tuple<>> region, ...) { ... }, region, ...);

  // drop our reference as a vote to close region
  region = nullptr;

  // spin on progress until region quiesced
  done.wait();
}

// same thing, different style
void foo2() {
  future<> done; {
    auto region = upcxx::make_dist_gcref<std::tuple<>>({});
    done = region.when_dead();

    // do some communication, all rpc's must carry `region` reference to keep it live. 
    upcxx::rpc_ff(rank_me()+1, [](dist_gcref<std::tuple<>> region, ...) { ... }, region, ...);

    // `region` goes out of local scope here...
  }
  // spin on progress until region quiesced
  done.wait();
}

I think it's important that we adopt reference semantics as opposed to the RAII of dist_object. Conceivably, rpcs could "awaken" objects whose local RAII instance a rank had previously destroyed, for instance by inserting a task into a progress queue. It would be awkward to fabricate new RAII instances as clones of long-dead ones.

I really like this design because it permits:

  1. The ghost zone example which is aggressively synchronization-free beyond any other quiescing pattern we've investigated.
  2. Quiescence of named regions, which may overlap.
  3. A single rpc can participate in multiple regions by carrying multiple references.
  4. Quiescence over teams since we would support dist_gcref over teams, just like dist_object.

A drawback is that it does not permit software module A to quiesce module B when B wasn't written with an understanding of this idiom. I do not think this is dire. I equate this to good MPI programmers knowing to take a communicator input and not assume MPI_COMM_WORLD.

Comments (2)
