promises cannot safely be captured by value in RPC, despite being CopyAssignable

Issue #424 resolved
Dan Bonachea created an issue

As of 2020.3.0, we advertise that:

upcxx::promise now behaves as a CopyAssignable handle to a reference-counted hidden object, meaning users no longer have to worry about promise lifetime issues.

The spec says: (emphasis added)

Though UPC++ does not prescribe a specific management strategy, the semantics of futures and promises are analogous to those of standard C++11 smart pointers. As with std::shared_ptr, futures and promises may be freely copied, and both the original and the copy represent the same state and are associated with the same underlying set of dependencies and callbacks. Thus, if one copy of a future becomes ready, then so will the other copies, and if a dependency is fulfilled on one copy of a promise, it is fulfilled on the others as well.

A natural assumption would be that handle-like semantics imply that promises are safe to capture-by-value. However, the program below demonstrates this does not extend to value captures sent via RPC:

#include <upcxx/upcxx.hpp>
#include <iostream>

using namespace upcxx;

int main() {
  upcxx::init();

  int me = upcxx::rank_me();
  int peer = (me+1)%upcxx::rank_n();
  global_ptr<int> gp = new_<int>();
  dist_object<global_ptr<int>> dobj(gp);
  global_ptr<int> gp2 = dobj.fetch(peer).wait();

{ // this works
  // uses promise CopyConstructible and CopyAssignable,
  // with handle-like semantics
  promise<> p;
  {
    promise<> p_ca;
    { 
      promise<> p_cc = p;
      p_ca = p_cc;
      { promise<> tmp = p; }
    }
    p_ca.fulfill_anonymous(1);
  }

  p.get_future().wait();
}

  barrier();
  if (!me) std::cout << "step 1 worked" << std::endl;
  barrier();

{ // this works, capture promise by reference
  promise<> p;

  rput(0, gp2, remote_cx::as_rpc([=,&p]() {
    // at peer, do some processing..
    //
    // then send return message
    rpc_ff(me, [=,&p]() {
      p.fulfill_anonymous(1);
    });
  }));

  p.get_future().wait();
}

  barrier();
  if (!me) std::cout << "step 2 worked" << std::endl;
  barrier();

{ // this works, capture promise by value and use locally
  promise<> p;

  auto f = [=]() {
    return [=]() {
      promise<> p1 = p;
      p1.fulfill_anonymous(1);
    };
  };
  { auto g = f;
    { auto h = g();
      h();
    }
  }
  p.get_future().wait();
}

  barrier();
  if (!me) std::cout << "step 3 worked" << std::endl;
  barrier();

{ // this fails, capturing promise by value
  promise<> p;

  rput(0, gp2, remote_cx::as_rpc([=]() {
    // at peer, do some processing..
    //
    // then send return message
    rpc_ff(me, [=]() {
      p.fulfill_anonymous(1);
    });
  }));

  p.get_future().wait();
}

  barrier();
  if (!me) std::cout << "SUCCESS" << std::endl;

  upcxx::finalize();
  return 0;
}

When run on dirac/Linux/debug/smp using all of 2020.3.0, 2020.3.8 and 2020.10.0-rc, the program executes the first three "steps" in the test correctly, and then crashes with heap corruption during the final step:

step 1 worked
step 2 worked
step 3 worked
*** Error in `./a.out': free(): invalid pointer: 0x00000000028b7610 ***

Here is the valgrind output for the first of 8 errors reported during a single-rank smp/debug run:

step 1 worked
step 2 worked
step 3 worked
==31762== Invalid read of size 4
==31762==    at 0x40E225: upcxx::detail::future_impl_shref<upcxx::detail::future_header_ops_promise>::ready() const (impl_shref.hpp:117)
==31762==    by 0x40D9D7: std::conditional<((-1)<(0))&&(((0ul)>(1))), std::tuple<>, upcxx::detail::tuple_element_or_void<((-1)<(0))?(0) : (-1), std::tuple<>, ((0)<=(((-1)<(0))?(0) : (-1)))&&((((-1)<(0))?(0) : (-1))<std::integral_constant<unsigned long, 0ul>::value)>::type>::type upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_promise>>::wait<-1, upcxx::detail::future_wait_upcxx_progress_user>(upcxx::detail::future_wait_upcxx_progress_user&&) const (future1.hpp:259)
==31762==    by 0x4058F3: main (bad-promise2.cpp:88)
==31762==  Address 0x70db1e4 is 4 bytes inside a block of size 56 free'd
==31762==    at 0x4C2B4DD: operator delete(void*) (vg_replace_malloc.c:584)
==31762==    by 0x40C210: upcxx::detail::future_header_result<>::operator delete(void*) (core.hpp:447)
==31762==    by 0x40C367: upcxx::detail::future_header_result<>::delete_me() (core.hpp:496)
==31762==    by 0x40E046: void upcxx::detail::future_header_ops_promise::dropref<, true>(upcxx::detail::future_header*, std::integral_constant<bool, true>) (core.hpp:767)
==31762==    by 0x40D89C: upcxx::detail::future_impl_shref<upcxx::detail::future_header_ops_promise>::~future_impl_shref() (impl_shref.hpp:112)
==31762==    by 0x40D413: upcxx::detail::promise_shref<>::~promise_shref() (promise.hpp:129)
==31762==    by 0x40D42F: upcxx::promise<>::~promise() (promise.hpp:204)
==31762==    by 0x405295: main::{lambda()#3}::operator()() const::{lambda()#1}::~main() (bad-promise2.cpp:83)
==31762==    by 0x4065FD: upcxx::detail::bound_function_base<main::{lambda()#3}::operator()() const::{lambda()#1}, std::tuple<>, upcxx::detail::index_sequence<>, true>::~bound_function_base() (bind.hpp:128)
==31762==    by 0x406619: upcxx::bound_function<main::{lambda()#3}::operator()() const::{lambda()#1}>::~bound_function() (bind.hpp:207)
==31762==    by 0x408AFC: void upcxx::detail::command<upcxx::detail::lpc_base*>::the_executor<upcxx::bound_function<main::{lambda()#3}::operator()() const::{lambda()#1}>, &upcxx::backend::gasnet::rpc_as_lpc::reader_of, &(void upcxx::backend::cleanup<true, false>(upcxx::detail::lpc_base*))>(upcxx::detail::lpc_base*) (command.hpp:37)
==31762==    by 0x434B08: upcxx::detail::lpc_inbox<(upcxx::detail::intru_queue_safety)0>::burst(int)::{lambda(upcxx::detail::lpc_base*)#1}::operator()(upcxx::detail::lpc_base*) const (lpc.hpp:117)
==31762==  Block was alloc'd at
==31762==    at 0x4C2A553: operator new(unsigned long) (vg_replace_malloc.c:342)
==31762==    by 0x40F157: upcxx::detail::future_header_promise<>::operator new(unsigned long) (core.hpp:565)
==31762==    by 0x40DEF0: upcxx::detail::promise_shref<>::promise_shref(long) (promise.hpp:139)
==31762==    by 0x40D87E: upcxx::promise<>::promise(long) (promise.hpp:213)
==31762==    by 0x40585F: main (bad-promise2.cpp:77)

The problem appears to be that lambdas containing value-captured promises are being byte-copied to remote processes, and upon execution or destruction of these lambdas, the hidden reference counting logic in upcxx::promise tries to access the hidden state (that is missing on remote processes) and explodes. This notably breaks EVEN when there are no explicit references to a promise in code running on another rank. This is also more than just a "different address space" or "different persona" problem, because the test fails EVEN when there is only one process (with one thread) and the RPC is loopback - presumably because byte-copying the lambda does not increase the reference count for the embedded promise, but non-trivially destroying the lambda runs the promise destructor that decrements it.

In retrospect parts of this failure make sense, but it's not at all obvious why. A user could reasonably expect a "pointer-like" abstraction to be safely copyable to another rank as opaque data (in the same way as int*), provided it is sent back to the originating rank and never dereferenced remotely. However this program demonstrates this pointer-like idiom does not apply to the current implementation of promise. Unfortunately the pattern above is a natural one to want to write when orchestrating complicated communication patterns - our "idiom" is create a promise on the initiator, and fulfill it later in a return message, but this appears to be fragile in practice if the promise "handle" is not carefully passed using an (unnatural) added level of indirection.

I'm not sure what if anything can be done to improve the situation. We could document that promises should only be captured by reference for RPC, but this unfortunately runs contrary to our general guidance which is to never use reference captures for RPC (because they are not safe in general and only work in basically this exact case).

Clearly we don't want to support distributed reference counting of the hidden future_impl_shref objects. However perhaps the promise implementation could be made smart enough to "skip" refcounting operations when promises are copied or destroyed within a different address space. This means remote references would be ignored for lifetime purposes, but that's probably fine because if the last local reference dies then it has no way to reap a later signal anyhow. Perhaps in combination with ensuring trivially serialized callables are also trivially destroyed, it could make this pattern safe.

Comments (4)

  1. Amir Kamil

    I may be missing something, but I don’t see how this differs from anything else that is not TriviallySerializable. Promises aren’t even Serializable, so there shouldn’t be any reason to expect them to work when sent through anything that applies serialization.

  2. Dan Bonachea reporter

    @Amir Kamil wrote:

    I don’t see how this differs from anything else that is not TriviallySerializable.

    I agree the problem is that promises (and futures) are not TriviallySerializable, so capturing them by-value and sending them across RPC (even back to the same process) is dangerous. However the reason this breaks in practice is non-obvious, and I don't think our documentation currently makes it clear why this is prohibited. My concern is programmer intuition of these types as "handle-like" might easily mislead them into writing this kind of code. We should probably at least augment that "futures and promises may be freely copied" phrase in the spec to clarify they are not Serializable and thus can never be sent via RPC (without adding a level of indirection).

    I think it's probably also unclear to users that all value captures in a lambda closure passed to RPC must be TriviallySerializable. This is a (necessary) implementation detail, but might be non-obvious to users unfamiliar with C++ compiler internals. I think we've always taken this fact for granted, but unless I'm missing something, we don't appear to clearly explain this restriction anywhere in our documentation.

    I'm thinking we might also want to somehow document (perhaps in the guide?) a best-practice idiom for the problematic round-trip RPC pattern represented in the example that seeks to create a promise on an RPC initiator for later fulfillment in an explicit RPC reply (ie one that for whatever reason cannot use the RPC return acknowledgement, where operation_cx::as_promise applies). This idiom would need to include adding a mandatory level of indirection to the promise to create a Serializable reference to it (and in the absence of Serializable smart pointers, this probably means a raw pointer or reference capture).

  3. Amir Kamil

    I agree that we should talk about restrictions on value captures in lambdas, which as far as I can tell, isn’t covered anywhere in the spec. I don’t have an objection to further clarify that futures and promises are not Serializable.

  4. Log in to comment