Round-trip RPCs lacking an operation_cx should generate an error

Issue #523 resolved
Colin MacLean created an issue

Doing this doesn’t make logical sense and causes RPC cancellation to crash due to cancel_and_delete expecting to start with a valid future:

diff --git a/test/regression/spec-issue176.cpp b/test/regression/spec-issue176.cpp
index b7f990d..8ab5f39 100644
--- a/test/regression/spec-issue176.cpp
+++ b/test/regression/spec-issue176.cpp
@@ -144,6 +144,9 @@ int main() {
     CHECK("rpc(view(char[toobig]))", 
           auto f = rpc(peer, [](view<char> const &v) {}, big_view););

+    CHECK("rpc(source_cx::as_future(),view(char[toobig]))",
+          auto f = rpc(peer, source_cx::as_future(), [](view<char> const &v) { return 0; }, big_view););
+
     CHECK("rpc(view(unbounded))", 
           auto f = rpc(peer, [](view<byte_bag> const &v) {}, unbounded_view););

Checking rpc(source_cx::as_future(),view(char[toobig]))...
*** Caught a fatal signal (proc 0): SIGSEGV(11)
[0] Invoking GDB for backtrace...
[0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_NMJlUK '/home/colin/upcxx2/build/./a.out' 10807
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/lib64/libthread_db.so.1".
[0] 0x00007f1c68acfaf6 in wait4 () from /lib64/libc.so.6
[0] #0  0x00007f1c68acfaf6 in wait4 () from /lib64/libc.so.6
[0] #1  0x00007f1c68a444a3 in ?? () from /lib64/libc.so.6
[0] #2  0x000055ad17cdaae8 in gasneti_system_redirected (cmd=0x55ad180a5700 <cmd> "/usr/bin/gdb -nx -batch -x /tmp/gasnet_NMJlUK '/home/colin/upcxx2/build/./a.out' 10807", stdout_fd=3) at /home/colin/upcxx2/build/bld/GASNet-stable/gasnet_tools.c:1295
[0] #3  0x000055ad17cdb26a in gasneti_bt_gdb (fd=3) at /home/colin/upcxx2/build/bld/GASNet-stable/gasnet_tools.c:1560
[0] #4  0x000055ad17cdbc3e in gasneti_print_backtrace (fd=2) at /home/colin/upcxx2/build/bld/GASNet-stable/gasnet_tools.c:1845
[0] #5  0x000055ad17cdc339 in _gasneti_print_backtrace_ifenabled (fd=2) at /home/colin/upcxx2/build/bld/GASNet-stable/gasnet_tools.c:1978
[0] #6  0x000055ad17ed75b6 in gasneti_defaultSignalHandler (sig=11) at /home/colin/upcxx2/build/bld/GASNet-stable/gasnet_internal.c:1035
[0] #7  <signal handler called>
[0] #8  std::__atomic_base<upcxx::detail::lpc_base*>::load (__m=std::memory_order_relaxed, this=0x8) at /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include/g++-v11/bits/atomic_base.h:838
[0] #9  std::atomic<upcxx::detail::lpc_base*>::load (this=0x8, __m=std::memory_order_relaxed) at /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include/g++-v11/atomic:570
[0] #10 0x000055ad17be891a in upcxx::detail::lpc_dormant<int>::cancel_and_delete (list=0x0) at /home/colin/upcxx2/build/bld/upcxx.assert1.optlev0.dbgsym1.gasnet_seq.smp/include/upcxx/lpc_dormant.hpp:269
[0] #11 0x000055ad17bca05a in operator() (__closure=0x7ffe84bd9120) at /home/colin/upcxx2/build/bld/upcxx.assert1.optlev0.dbgsym1.gasnet_seq.smp/include/upcxx/rpc.hpp:455
[0] #12 0x000055ad17bcb8aa in upcxx::detail::raii_cleanup<upcxx::detail::rpc_internal<upcxx::detail::completions<upcxx::detail::future_cx<upcxx::source_cx_event, true, (upcxx::progress_level)1> >, main()::<lambda(const upcxx::view<char, char*>&)>&&, upcxx::view<char, char*>&>(upcxx::intrank_t, main()::<lambda(const upcxx::view<char, char*>&)>&&, upcxx::view<char, char*>&, upcxx::detail::completions<upcxx::detail::future_cx<upcxx::source_cx_event, true, (upcxx::progress_level)1> >&&, int)::<lambda()> >::~raii_cleanup(void) (this=0x7ffe84bd9120, __in_chrg=<optimized out>) at /home/colin/upcxx2/build/bld/upcxx.assert1.optlev0.dbgsym1.gasnet_seq.smp/include/upcxx/utility.hpp:651
[0] #13 0x000055ad17bca25a in upcxx::detail::rpc_internal<upcxx::detail::completions<upcxx::detail::future_cx<upcxx::source_cx_event, true, (upcxx::progress_level)1> >, main()::<lambda(const upcxx::view<char, char*>&)>&&, upcxx::view<char, char*>&>(upcxx::intrank_t, struct {...} &&, upcxx::detail::completions<upcxx::detail::future_cx<upcxx::source_cx_event, true, (upcxx::progress_level)1> > &&, int) (recipient=0, fn=..., cxs=...) at /home/colin/upcxx2/build/bld/upcxx.assert1.optlev0.dbgsym1.gasnet_seq.smp/include/upcxx/rpc.hpp:487
[0] #14 0x000055ad17bc8b27 in upcxx::rpc<upcxx::detail::completions<upcxx::detail::future_cx<upcxx::source_cx_event, true, (upcxx::progress_level)1> >, main()::<lambda(const upcxx::view<char, char*>&)>, upcxx::view<char, char*>&>(upcxx::intrank_t, upcxx::detail::completions<upcxx::detail::future_cx<upcxx::source_cx_event, true, (upcxx::progress_level)1> > &&, struct {...} &&) (recipient=0, cxs=..., fn=...) at /home/colin/upcxx2/build/bld/upcxx.assert1.optlev0.dbgsym1.gasnet_seq.smp/include/upcxx/rpc.hpp:549
[0] #15 0x000055ad17bc511e in main () at ../test/regression/spec-issue176.cpp:147
[0] [Inferior 1 (process 10807) detached]
[1]    10807 segmentation fault  GASNET_BACKTRACE=1 ./a.out

Attempting to make this RPC call should be forbidden.

Comments (2)

  1. Dan Bonachea

    The key issue here is issuing a round-trip rpc that explicitly excludes a request for operation completion, eg:

    auto sf = upcxx::rpc(target, source_cx::as_buffered(), ...);
    

    where there is NO completion triggered for the acknowledgment.

    I agree this input should be forbidden, for the following practical reasons:

    1. It means the application has requested an RPC acknowledgement (which has a performance cost), but discarded the only way it could ever learn of its arrival or contents.
    2. As a result, the initiating thread can never be safely quiesced; the program has no way to stall for delivery of the (pointless) acknowledgement message,which also means the program can never safely finalize()

    Basically any program that thinks this call is a good idea should be using rpc_ff instead.

    It turns out this is already specified in 7.2.1-2:

    If a UPC++ call provides both operation and remote completion, then at least one must be requested by the provided completion object. If a call provides operation but not remote completion, then operation completion must be requested. The behavior of the program is undefined if neither operation nor remote completion is requested from a call that supports one or both of operation or remote completion.

    Our implementation already forbids (via runtime error) any rget calls lacking operation completion and rput calls lacking one of operation or remote completion, complying with the spec above and with similar motivation. The real defect in this issue is we currently lack enforcement of that rule for round-trip rpc (and possibly other operations). This should probably follow the pattern of runtime enforcement, to avoid compile errors on unreachable calls that might get expanded during generic programming.

    The fact that issuing this operation specified as UB happens to also crash the runtime for the rare care of a resource exhaustion failure is a minor detail which is irrelevant because we shouldn't allow such a call in the first place.

  2. Log in to comment