Consider the following program:
#include <upcxx/upcxx.hpp>
#include <iostream>
#include <algorithm>
using namespace upcxx;
using namespace std;
future<> fetch_vals(global_ptr<int> gp, int *dst, int cnt) {
#if STATIC_PROMISE
static
#endif
promise<> p;
for (int i=0;i<cnt;i++) {
upcxx::rget(gp+i, dst+i, 1, operation_cx::as_promise(p));
}
return p.finalize();
}
double stack_writer(double v) {
if (v < 3) return 1;
else return stack_writer(v-2) * stack_writer(v-1) * v;
}
int main() {
upcxx::init();
int cnt = 100;
int *vals = new int[cnt]();
global_ptr<int> gp;
if (!rank_me()) {
gp = new_array<int>(cnt);
int *src = gp.local();
std::fill(src, src+cnt, 42);
}
gp = broadcast(gp, 0).wait();
future<> f = fetch_vals(gp, vals, cnt);
cout << stack_writer(10.1) << endl;
f.wait();
for (int i=0;i<cnt;i++)
assert(vals[i] == 42);
barrier();
if (!rank_me()) cout << "SUCCESS" << endl;
upcxx::finalize();
}
This proxy does not compute anything interesting, but the pattern in fetch_vals()
is an idiom we'd like to encourage for functions that launch many operations and return a future - specifically, registering all the operations on a single promise costs less overhead than creating and conjoining a bunch of futures, and it's also more convenient in a case like this where the operations are being generated in a localized block of code.
Sadly, this code does not currently work as written, due to a lifetime problem. Running this code on distributed memory generates a crash, eg a SEGV with the following stack:
[1] #22 0x0000000100788f52 in upcxx::detail::persona_tls::fulfill_during_user_of_active<>(upcxx::persona&, upcxx::promise<>&, long) (this=0x60003ac88, per=..., pro=..., anon=1) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/persona.hpp:659
[1] #23 0x000000010078bc3d in upcxx::backend::fulfill_during_<upcxx::promise<>&>(std::integral_constant<upcxx::progress_level, (upcxx::progress_level)1>, upcxx::promise<>&, long, upcxx::persona&) (pro=..., anon=1, active_per=...) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/backend.hpp:152
[1] #24 0x000000010078bafe in upcxx::backend::fulfill_during<(upcxx::progress_level)1>(upcxx::promise<>&, long, upcxx::persona&) (pro=..., anon=1, active_per=...) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/backend.hpp:172
[1] #25 0x000000010078b6db in upcxx::detail::cx_state<upcxx::promise_cx<upcxx::operation_cx_event>, std::tuple<> >::operator()() (this=0x6000d50e8) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/completion.hpp:383
[1] #26 0x000000010078acfb in upcxx::detail::completions_state_head<true, upcxx::detail::rget_byref_event_values, upcxx::promise_cx<upcxx::operation_cx_event> >::operator_case<>(std::integral_constant<bool, true>) (this=0x6000d50e8) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/completion.hpp:525
[1] #27 0x000000010078ad67 in upcxx::detail::completions_state_head<true, upcxx::detail::rget_byref_event_values, upcxx::promise_cx<upcxx::operation_cx_event> >::operator()<upcxx::operation_cx_event> (this=0x6000d50e8) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/completion.hpp:535
[1] #28 0x0000000100789aa8 in upcxx::detail::completions_state<upcxx::detail::event_is_here, upcxx::detail::rget_byref_event_values, upcxx::completions<upcxx::promise_cx<upcxx::operation_cx_event> > >::operator()<upcxx::operation_cx_event> (this=0x6000d50e8) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/completion.hpp:605
[1] #29 0x000000010078978d in upcxx::detail::rget_cb_byref<upcxx::detail::completions_state<upcxx::detail::event_is_here, upcxx::detail::rget_byref_event_values, upcxx::completions<upcxx::promise_cx<upcxx::operation_cx_event> > >, upcxx::detail::completions_state<upcxx::detail::event_is_remote, upcxx::detail::rget_byref_event_values, upcxx::completions<upcxx::promise_cx<upcxx::operation_cx_event> > > >::execute_and_delete (this=0x6000d50d0) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/rget.hpp:121
[1] #30 0x000000010078cbe8 in upcxx::backend::gasnet::handle_cb_queue::burst (this=0x100870110 <upcxx::backend::gasnet::master_hcbs>, burst_n=3) at /home/bonachea/UPC/rel/upcxx-2019.9.0/src/backend/gasnet/runtime.cpp:2151
[1] #31 0x000000010041bcfe in upcxx::<lambda(upcxx::persona&)>::operator()(upcxx::persona &) const (__closure=0xffffc930, p=...) at /home/bonachea/UPC/rel/upcxx-2019.9.0/src/backend/gasnet/runtime.cpp:1665
[1] #32 0x000000010041d635 in upcxx::detail::persona_tls::foreach_active_as_top<upcxx::progress(upcxx::progress_level)::<lambda(upcxx::persona&)> >(upcxx::<lambda(upcxx::persona&)> &&) (this=0x60003ac88, fn=...) at /home/bonachea/UPC/rel/upcxx-2019.9.0/src/persona.hpp:768
[1] #33 0x000000010041be82 in upcxx::progress (level=upcxx::progress_level::user) at /home/bonachea/UPC/rel/upcxx-2019.9.0/src/backend/gasnet/runtime.cpp:1660
[1] #34 0x000000010078feaa in upcxx::detail::future_wait_upcxx_progress_user::operator() (this=0xffffcbf7) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/future/future1.hpp:60
[1] #35 0x000000010078e85f in upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_general>>::wait<-1, upcxx::detail::future_wait_upcxx_progress_user>(upcxx::detail::future_wait_upcxx_progress_user&&) (this=0xffffcbc8, progress=...) at /usr/local/upcxx-2019.9.0/upcxx.debug.gasnet_seq.udp/include/upcxx/future/future1.hpp:218
[1] #36 0x00000001004013e7 in main () at promise-nuke2.cc:39
The problem is that when the promise<> p
used to register the operations is a local variable (often the most convenient and elegant choice) it goes out of scope before fulfillment leading to a later crash when the RMA completes and the runtime tries to fulfill.
The simplest workaround is to extend the promise lifetime by promoting it to a static variable (e.g. -DSTATIC_PROMISE
in the above code), however this is inelegant and doesn't handle multiple calls. The more localized solution that doesn't suffer the latter deficiency is to inject explicitly managed dynamic allocation to provide the required lifetime, something like this:
future<> fetch_vals(global_ptr<int> gp, int *dst, int cnt) {
promise<> *p = new promise<>;
for (int i=0;i<cnt;i++) {
upcxx::rget(gp+i, dst+i, 1, operation_cx::as_promise(*p));
}
return p->finalize().then([p](){ delete p; });
}
However this is non-intuitive (for novices), non-RAII and inserts a dynamic allocation overhead into the critical path. This requirement to explicitly manage the promise lifetime imposes additional complexity into highly asynchronous user code.
As an aside, I don't think this lifetime requirement for operation_cx::as_promise()
is adequately addressed by the current spec - it's only by assumption and implication that readers can infer a potential problem here (probably motivates a spec issue to fix that). It's also a departure from the behavior of future
, which always behave as a handle to a reference-counted object and have no associated lifetime concerns. Finally, it's worth noting this lifetime problem only applies to the interval until promise fulfillment - once fulfilled, the promise can safely be destroyed and readied futures linked to it will continue to work, EVEN for non-empty promises. This means there is already a level of indirection (at least for non-empty promises) to an internal object containing the result value - which makes it particularly unfortunate that the user is forced to insert a redundant and potentially costly level of indirection.
The point of this issue is that I don't see any fundamental reason why promise
needs to continue behaving as it currently does. If promise
instead always behaved as a smart-pointer to a reference-counted (hidden) object, even before fulfillment, then the lifetime could be automatically managed and collection would also happen naturally after fulfillment and the destruction of the last reference (which might be held by the runtime itself). I'd like us to discuss/consider the costs and benefits of deploying such a feature upgrade. I think it would be a nice productivity improvement for highly asynchronous codes, enabling a more natural idiom for use of promises. Also, as a semantic relaxation it should be backwards-compatible with existing code.
@Amir Kamil + @john bachan : I thought up a simpler and more compelling example to motivate
promise_ref
.This is just a minor tweak to the
fetch_vals
function in the original post that needs to perform some post-processing on fetched values before signalling completion to the caller:This code is erroneous because the reference capture of the promise becomes a dangling reference when
fetch_vals
returns, and the execution of the callback within laterprogress
explodes (even on the pull request #141 branch). This is essentially the same problem as the original code (except here fulfillment is invoked in user code), and the fixes are equally nasty - in general, requiring the user to add a dynamically allocated level of indirection to his promise (or ditch the promise entirely and pay the overheads to conjoin a bunch of additional futures).However if the user had access to
promise_ref
functionality, he could instead write something simple like this:Using the existing internal level of indirection to avoid extraneous allocation.
As a side note, specifying
promise_ref
also makes it somewhat easier to specify the new behavior ofcompletion_cx::as_promise(promise <T...> &pro)
- we can specify this method schedules a completion forpro.get_ref()
, or (better yet?) change the argument type topromise_ref
and specify an implicit conversion. The latter would allow the lambda above to schedule additional RMA completions on the same underlying promise, which also seems a useful capability.Thoughts?