Lack of backpressure in RPC injection leads to shared memory-exhaustion crashes

Issue #242 resolved
Dan Bonachea created an issue

Example program:

#include <upcxx/upcxx.hpp>
#include <iostream>
#include <vector>
#include <cassert>
#include <cstdio>
#include <cstdlib> // atol

using namespace std;

long incoming;

int main(int argc, char **argv) {
  long iters = 0;
  long sz = 0;
  if (argc > 1) iters = atol(argv[1]);
  if (argc > 2) sz = atol(argv[2]);
  if (iters <= 0) iters = 10000;
  if (sz <= 0) sz = 10*1024*1024;
  incoming = iters;

  upcxx::init();
  int me = upcxx::rank_me();
  int peer = (upcxx::rank_me() + 1) % upcxx::rank_n();
  if (!me) cout << upcxx::rank_n() << " ranks running " << iters << " iterations of " << sz << " bytes" << endl;

  upcxx::barrier();

  std::vector<char> myvec;
  myvec.resize(sz);
  printf("Hello from rank %i\n",me);
  cout << flush;

  upcxx::barrier();

  while (iters--) {
    upcxx::rpc_ff(peer,
      [=](upcxx::view<char> view) {
        assert(view.size() == sz);
        incoming--;
      },
      upcxx::make_view(myvec));
  }

  do { upcxx::progress(); } while(incoming);


  upcxx::barrier();

  if (!me) cout << "SUCCESS" << endl;

  upcxx::finalize();
  return 0;
}

This program sends a stream of one-way RPCs with view-based serialization to and from each process. Note that the program does not explicitly allocate any objects on the shared heap, so a user might reasonably expect it to have no shared-heap requirements.

In practice, the UPC++ runtime uses the shared heap to allocate bounce buffers for rendezvous transfers. However, the current implementation has several problems:

  1. The runtime exerts no backpressure on its shared-heap utilization, leading to unbounded resource consumption and an eventual crash given the right communication pattern (demonstrated below).
  2. When memory exhaustion occurs, the result is an opaque assertion failure (or in some cases simply a SEGV), giving the user no insight into the problem or how to solve it.
  3. We don't document the runtime's implicit use of the shared heap, so the user has no reason to expect that some of it should be reserved for the runtime.
  4. The user has no way to predict or control how much of the shared heap is used for bounce buffering, and thus cannot plan their own allocations and job parameters to accommodate the fixed memory resources of the node.

This problem is causing crashes in our extend-add proxy application and is believed responsible for crashes observed across multiple platforms in CI on the issue138 test program (ex1 ex2 ex3).

Building the test program above on dirac using the 2019.3.2 release:

$ module load upcxx/2019.3.2/gcc-9.1.0 
$ upcxx -g rpc_flood.cpp
$ env UPCXX_GASNET_CONDUIT=ibv upcxx -g rpc_flood.cpp
$ upcxx-run -np 2 -shared-heap=1GB a.out 10
2 ranks running 10 iterations of 10485760 bytes
Hello from rank 0
Hello from rank 1
SUCCESS
$ env UPCXX_GASNET_CONDUIT=ibv upcxx -g rpc_flood.cpp     
$ upcxx-run -np 2 -shared-heap=128MB a.out 10
2 ranks running 10 iterations of 10485760 bytes
Hello from rank 0
Hello from rank 1
*** Caught a fatal signal (proc 1): SIGSEGV(11)
[1] Invoking GDB for backtrace...
*** FATAL ERROR (proc 0): 
//////////////////////////////////////////////////
UPC++ assertion failure:
 rank=0
 file=/tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/src/backend/gasnet/runtime.cpp:1355

Failed condition: buf != nullptr

To have UPC++ freeze during these errors so you can attach a debugger, rerun the program with GASNET_FREEZE_ON_ERROR=1 in the environment.
//////////////////////////////////////////////////

[0] Invoking GDB for backtrace...
[1] /usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_gZzFsJ '/home/pcp1/bonachea/UPC/code/a.out' 1547
[1] [Thread debugging using libthread_db enabled]
[1] Using host libthread_db library "/lib64/libthread_db.so.1".
[1] 0x00007f2171b61a3c in waitpid () from /lib64/libc.so.6
[1] To enable execution of this file add
[1]     add-auto-load-safe-path /usr/local/pkg/gcc/9.1.0/lib64/libstdc++.so.6.0.26-gdb.py
[1] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[1] To completely disable this security protection add
[1]     set auto-load safe-path /
[1] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[1] For more information about this security protection see the
[1] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
[1]     info "(gdb)Auto-loading safe path"
[1] #0  0x00007f2171b61a3c in waitpid () from /lib64/libc.so.6
[1] #1  0x00007f2171adfde2 in do_system () from /lib64/libc.so.6
[1] #2  0x00000000004febd3 in gasneti_system_redirected (cmd=0xaeb3a0 <cmd> "/usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_gZzFsJ '/home/pcp1/bonachea/UPC/code/a.out' 1547", stdout_fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1271
[1] #3  0x00000000004ff5d4 in gasneti_bt_gdb (fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1518
[1] #4  0x00000000004ffe14 in gasneti_print_backtrace (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1793
[1] #5  0x00000000005003f6 in _gasneti_print_backtrace_ifenabled (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1925
[1] #6  0x00000000006ebb49 in gasneti_defaultSignalHandler (sig=11) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_internal.c:703
[1] #7  <signal handler called>
[1] #8  0x00000000004083dc in upcxx::parcel_writer::put_trivial_aligned<upcxx::global_fnptr<void (upcxx::detail::lpc_base*)> >(upcxx::global_fnptr<void (upcxx::detail::lpc_base*)> const&) (this=0x7fffda749330, x=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/stable-2019.3.2/upcxx.debug.gasnet_seq.ibv/include/upcxx/parcel.hpp:365
[1] #9  0x00000000004069b2 in upcxx::command<upcxx::detail::lpc_base*>::pack<upcxx::backend::gasnet::rpc_as_lpc::reader_of, upcxx::backend::gasnet::rpc_as_lpc::cleanup<false>, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >&>(upcxx::parcel_writer &, std::size_t, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char, char*>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > &) (w=..., size_ub=10485784, fn=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/stable-2019.3.2/upcxx.debug.gasnet_seq.ibv/include/upcxx/command.hpp:69
[1] #10 0x00000000004067d9 in upcxx::backend::send_am_master<(upcxx::progress_level)1, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > >(upcxx::team &, upcxx::intrank_t, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char, char*>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > &&) (tm=..., recipient=0, fn=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/stable-2019.3.2/upcxx.debug.gasnet_seq.ibv/include/upcxx/backend/gasnet/runtime.hpp:273
[1] #11 0x000000000040663e in upcxx::rpc_ff<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >(upcxx::team &, upcxx::intrank_t, <lambda(upcxx::view<char, char*>)> &&, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > &&) (tm=..., recipient=0, fn=..., args#0=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/stable-2019.3.2/upcxx.debug.gasnet_seq.ibv/include/upcxx/rpc.hpp:75
[1] #12 0x00000000004065c9 in upcxx::rpc_ff<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >(upcxx::intrank_t, <lambda(upcxx::view<char, char*>)> &&, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > &&) (recipient=0, fn=..., args#0=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/stable-2019.3.2/upcxx.debug.gasnet_seq.ibv/include/upcxx/rpc.hpp:86
[1] #13 0x00000000004064f9 in main (argc=2, argv=0x7fffda749798) at rpc_flood.cpp:35
[0] /usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_310zYr '/home/pcp1/bonachea/UPC/code/a.out' 15611
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/lib64/libthread_db.so.1".
[0] 0x00007fb724619a3c in waitpid () from /lib64/libc.so.6
[0] To enable execution of this file add
[0]     add-auto-load-safe-path /usr/local/pkg/gcc/9.1.0/lib64/libstdc++.so.6.0.26-gdb.py
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] To completely disable this security protection add
[0]     set auto-load safe-path /
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] For more information about this security protection see the
[0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
[0]     info "(gdb)Auto-loading safe path"
[0] #0  0x00007fb724619a3c in waitpid () from /lib64/libc.so.6
[0] #1  0x00007fb724597de2 in do_system () from /lib64/libc.so.6
[0] #2  0x00000000004febd3 in gasneti_system_redirected (cmd=0xaeb3a0 <cmd> "/usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_310zYr '/home/pcp1/bonachea/UPC/code/a.out' 15611", stdout_fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1271
[0] #3  0x00000000004ff5d4 in gasneti_bt_gdb (fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1518
[0] #4  0x00000000004ffe14 in gasneti_print_backtrace (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1793
[0] #5  0x00000000005003f6 in _gasneti_print_backtrace_ifenabled (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1925
[0] #6  0x00000000004fdca2 in gasneti_error_abort () at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:739
[0] #7  0x00000000004fde29 in gasneti_fatalerror (*** Caught a fatal signal (proc 0): SIGABRT(6)

Note it succeeds for 10 iterations when the shared heap is an overly generous 1GB, but fails with the default shared-heap size of 128MB. Because the usage is unbounded, raising the iteration count on this test will eventually exhaust a shared heap of any size.
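For intuition, a back-of-the-envelope estimate of the demand. This accounting is a guess at the runtime's behavior, not confirmed by inspection: it assumes both the sender-side serialized copy and the receiver-side rendezvous landing buffer come from the shared segment, and that nothing drains before the injection loop finishes.

```cpp
#include <cstdio>

// Rough per-process bounce-buffer demand, in MiB, for the ring test above,
// ASSUMING (unconfirmed) that every injection pins a sender-side
// serialization buffer and every arrival pins a receiver-side rendezvous
// buffer in the shared segment; each rank both sends and receives 'iters'
// payloads before its progress loop drains anything.
long bounce_demand_mib(long payload_mib, long iters) {
  return iters * payload_mib    // outgoing serialized copies
       + iters * payload_mib;   // incoming rendezvous landing buffers
}

void report() {
  // 10 iterations of 10 MiB -> ~200 MiB of demand
  std::printf("demand ~= %ld MiB\n", bounce_demand_mib(10, 10));
}
```

Under that assumption, 10 in-flight iterations need roughly 200 MiB of bounce buffering per process: comfortably inside a 1GB heap, but well over the 128MB default.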

The failure behavior is essentially the same with the current head of develop:

$ module switch upcxx upcxx/nightly/gcc-9.1.0        
$ env UPCXX_GASNET_CONDUIT=ibv upcxx -g rpc_flood.cpp                                
$ upcxx-run -np 2 -shared-heap=1GB a.out 10
2 ranks running 10 iterations of 10485760 bytes
Hello from rank 1
Hello from rank 0
SUCCESS
$ upcxx-run -np 2 -shared-heap=128MB a.out 10   
2 ranks running 10 iterations of 10485760 bytes
Hello from rank 0
Hello from rank 1
*** Caught a fatal signal (proc 1): SIGSEGV(11)
[1] Invoking GDB for backtrace...
*** FATAL ERROR (proc 0): 
//////////////////////////////////////////////////
UPC++ assertion failure:
 rank=0
 file=/tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/src/backend/gasnet/runtime.cpp:1359

Failed condition: buf != nullptr

To have UPC++ freeze during these errors so you can attach a debugger, rerun the program with GASNET_FREEZE_ON_ERROR=1 in the environment.
//////////////////////////////////////////////////

[0] Invoking GDB for backtrace...
[1] /usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_OUHBmk '/home/pcp1/bonachea/UPC/code/a.out' 1879
[1] [Thread debugging using libthread_db enabled]
[1] Using host libthread_db library "/lib64/libthread_db.so.1".
[1] 0x00007f3582fe4a3c in waitpid () from /lib64/libc.so.6
[1] To enable execution of this file add
[1]     add-auto-load-safe-path /usr/local/pkg/gcc/9.1.0/lib64/libstdc++.so.6.0.26-gdb.py
[1] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[1] To completely disable this security protection add
[1]     set auto-load safe-path /
[1] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[1] For more information about this security protection see the
[1] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
[1]     info "(gdb)Auto-loading safe path"
[1] #0  0x00007f3582fe4a3c in waitpid () from /lib64/libc.so.6
[1] #1  0x00007f3582f62de2 in do_system () from /lib64/libc.so.6
[1] #2  0x00000000004fb4a3 in gasneti_system_redirected (cmd=0xaea3e0 <cmd> "/usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_OUHBmk '/home/pcp1/bonachea/UPC/code/a.out' 1879", stdout_fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1271
[1] #3  0x00000000004fbea4 in gasneti_bt_gdb (fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1518
[1] #4  0x00000000004fc6e4 in gasneti_print_backtrace (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1793
[1] #5  0x00000000004fccc6 in _gasneti_print_backtrace_ifenabled (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1925
[1] #6  0x00000000006e8685 in gasneti_defaultSignalHandler (sig=11) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_internal.c:704
[1] #7  <signal handler called>
[1] #8  0x00007f3583075cf4 in __memcpy_ssse3_back () from /lib64/libc.so.6
[1] #9  0x0000000000408ab1 in upcxx::detail::memcpy_aligned<8ul> (dst=0x0, src=0x7ffe819ef808, sz=8) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/utility.hpp:68
[1] #10 0x000000000040878f in upcxx::detail::serialization_writer<true>::push_trivial<upcxx::global_fnptr<void (upcxx::detail::lpc_base*)> >(upcxx::global_fnptr<void (upcxx::detail::lpc_base*)> const&) (this=0x7ffe819ef870, x=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/serialization.hpp:314
[1] #11 0x0000000000406a2f in upcxx::detail::command<upcxx::detail::lpc_base*>::serialize<upcxx::backend::gasnet::rpc_as_lpc::reader_of, upcxx::backend::gasnet::rpc_as_lpc::cleanup<false>, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >&, upcxx::detail::serialization_writer<true> >(upcxx::detail::serialization_writer<true> &, std::size_t, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char, char*>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > &) (w=..., size_ub=10485784, fn=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/command.hpp:72
[1] #12 0x000000000040683e in upcxx::backend::send_am_master<(upcxx::progress_level)1, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > >(upcxx::team &, upcxx::intrank_t, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char, char*>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > &&) (tm=..., recipient=0, fn=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/backend/gasnet/runtime.hpp:311
[1] #13 0x00000000004066fe in upcxx::rpc_ff<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >(upcxx::team &, upcxx::intrank_t, <lambda(upcxx::view<char, char*>)> &&, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > &&) (tm=..., recipient=0, fn=..., args#0=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/rpc.hpp:75
[1] #14 0x0000000000406689 in upcxx::rpc_ff<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >(upcxx::intrank_t, <lambda(upcxx::view<char, char*>)> &&, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > &&) (recipient=0, fn=..., args#0=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/rpc.hpp:86
[1] #15 0x00000000004065b9 in main (argc=2, argv=0x7ffe819efcf8) at rpc_flood.cpp:35
[0] /usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_XLQiU2 '/home/pcp1/bonachea/UPC/code/a.out' 15784
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/lib64/libthread_db.so.1".
[0] 0x00007fc543d2ca3c in waitpid () from /lib64/libc.so.6
[0] To enable execution of this file add
[0]     add-auto-load-safe-path /usr/local/pkg/gcc/9.1.0/lib64/libstdc++.so.6.0.26-gdb.py
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] To completely disable this security protection add
[0]     set auto-load safe-path /
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] For more information about this security protection see the
[0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
[0]     info "(gdb)Auto-loading safe path"
[0] #0  0x00007fc543d2ca3c in waitpid () from /lib64/libc.so.6
[0] #1  0x00007fc543caade2 in do_system () from /lib64/libc.so.6
[0] #2  0x00000000004fb4a3 in gasneti_system_redirected (cmd=0xaea3e0 <cmd> "/usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_XLQiU2 '/home/pcp1/bonachea/UPC/code/a.out' 15784", stdout_fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1271
[0] #3  0x00000000004fbea4 in gasneti_bt_gdb (fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1518
[0] #4  0x00000000004fc6e4 in gasneti_print_backtrace (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1793
[0] #5  0x00000000004fccc6 in _gasneti_print_backtrace_ifenabled (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1925
[0] #6  0x00000000004fa572 in gasneti_error_abort () at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:739
[0] #7  0x00000000004fa6f9 in gasneti_fatalerror (msg=0x7d1255 "\n%s") at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:768
[0] #8  0x0000000000442ce4 in upcxx::assert_failed (*** Caught a fatal signal (proc 0): SIGABRT(6)

There's at least one other distinct failure mode that does not generate an assertion, just a SEGV. It appears to be triggered by loopback RPC, and can be demonstrated with the same test by running a single process:

$ upcxx-run -np 1 -shared-heap=1GB a.out 100    
1 ranks running 100 iterations of 10485760 bytes
Hello from rank 0
SUCCESS
$ upcxx-run -np 1 -shared-heap=128MB a.out 100
1 ranks running 100 iterations of 10485760 bytes
Hello from rank 0
*** Caught a fatal signal (proc 0): SIGSEGV(11)
[0] Invoking GDB for backtrace...
[0] /usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_tcP3Z4 '/home/pcp1/bonachea/UPC/code/a.out' 15899
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/lib64/libthread_db.so.1".
[0] 0x00007f0ca1680a3c in waitpid () from /lib64/libc.so.6
[0] To enable execution of this file add
[0]     add-auto-load-safe-path /usr/local/pkg/gcc/9.1.0/lib64/libstdc++.so.6.0.26-gdb.py
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] To completely disable this security protection add
[0]     set auto-load safe-path /
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] For more information about this security protection see the
[0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
[0]     info "(gdb)Auto-loading safe path"
[0] #0  0x00007f0ca1680a3c in waitpid () from /lib64/libc.so.6
[0] #1  0x00007f0ca15fede2 in do_system () from /lib64/libc.so.6
[0] #2  0x00000000004fb4a3 in gasneti_system_redirected (cmd=0xaea3e0 <cmd> "/usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_tcP3Z4 '/home/pcp1/bonachea/UPC/code/a.out' 15899", stdout_fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1271
[0] #3  0x00000000004fbea4 in gasneti_bt_gdb (fd=6) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1518
[0] #4  0x00000000004fc6e4 in gasneti_print_backtrace (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1793
[0] #5  0x00000000004fccc6 in _gasneti_print_backtrace_ifenabled (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_tools.c:1925
[0] #6  0x00000000006e8685 in gasneti_defaultSignalHandler (sig=11) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/berkeleylab-upcxx-develop/.nobs/art/ea6a834a22a2e2c80e2e074571f59e5f866621b5/GASNet-stable/gasnet_internal.c:704
[0] #7  <signal handler called>
[0] #8  0x00007f0ca1711cf4 in __memcpy_ssse3_back () from /lib64/libc.so.6
[0] #9  0x0000000000408ab1 in upcxx::detail::memcpy_aligned<8ul> (dst=0x0, src=0x7ffc354023e8, sz=8) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/utility.hpp:68
[0] #10 0x000000000040878f in upcxx::detail::serialization_writer<true>::push_trivial<upcxx::global_fnptr<void (upcxx::detail::lpc_base*)> >(upcxx::global_fnptr<void (upcxx::detail::lpc_base*)> const&) (this=0x7ffc35402450, x=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/serialization.hpp:314
[0] #11 0x0000000000406a2f in upcxx::detail::command<upcxx::detail::lpc_base*>::serialize<upcxx::backend::gasnet::rpc_as_lpc::reader_of, upcxx::backend::gasnet::rpc_as_lpc::cleanup<false>, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >&, upcxx::detail::serialization_writer<true> >(upcxx::detail::serialization_writer<true> &, std::size_t, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char, char*>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > &) (w=..., size_ub=10485784, fn=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/command.hpp:72
[0] #12 0x000000000040683e in upcxx::backend::send_am_master<(upcxx::progress_level)1, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > >(upcxx::team &, upcxx::intrank_t, upcxx::bound_function<main(int, char**)::<lambda(upcxx::view<char, char*>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > > &&) (tm=..., recipient=0, fn=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/backend/gasnet/runtime.hpp:311
[0] #13 0x00000000004066fe in upcxx::rpc_ff<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >(upcxx::team &, upcxx::intrank_t, <lambda(upcxx::view<char, char*>)> &&, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > &&) (tm=..., recipient=0, fn=..., args#0=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/rpc.hpp:75
[0] #14 0x0000000000406689 in upcxx::rpc_ff<main(int, char**)::<lambda(upcxx::view<char>)>, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > >(upcxx::intrank_t, <lambda(upcxx::view<char, char*>)> &&, upcxx::view<char, __gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > > > &&) (recipient=0, fn=..., args#0=...) at /usr/local/pkg/upcxx-dirac/gcc-9.1.0/nightly-2019.07.25/upcxx.debug.gasnet_seq.ibv/include/upcxx/rpc.hpp:86
[0] #15 0x00000000004065b9 in main (argc=2, argv=0x7ffc354028d8) at rpc_flood.cpp:35

Comments (11)

  1. john bachan

    It is a consequence of our RPC API that no implementation could possibly guarantee both bounded memory usage and deadlock freedom. But I do think we could make some improvements that better avoid exhausting memory via some "soft" backpressure.

    Candidate improvements:

    1. Have a softly bounded amount of outgoing serialized RPCs (in terms of total bytes serialized); RPCs injected in excess of this bound will be locally queued in non-serialized form. The examples above would benefit tremendously from this for large views. This is a soft bound, since a single RPC could exceed it.

    2. Have a softly bounded number of incoming rendezvous GETs posted. Again, the above example should benefit.

    Point (2) risks deadlock in the case where the user uses future-returning RPCs to extend the lifetime of the rendezvous buffer. If the user were to make that future dependent on the completion of other RPC-related communication events, we might execute RPCs in the wrong order. Example: by the user's logic, RPC 1 could hang on to a buffer until the subsequent RPC 2 executes, but 2 is stalled waiting for 1 to release its buffer... deadlock. We can differentiate RPCs by their return types, so perhaps we can devise some clever approach to deal specifically with future-returning RPCs. The options:

    1. Let them run wild as they currently do.

    2. Use a soft bound as above, and document that the user cannot return futures which depend on network traffic (we should permit LPCs, though).

    3. Permit all kinds of future dependencies, and use a soft bound that heuristically grows (like spin-lock backoff) while it is being exceeded. Print a diagnostic when this happens so the user knows to increase the initial value of the bound and avoid waiting for the backoff to kick in.
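    Candidate improvement (1) above can be sketched in ordinary C++. All names here are invented for illustration; this is not UPC++ API, just the shape of the queueing logic:

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch: cap the total bytes of serialized-but-unsent RPCs;
// anything injected beyond the cap waits in a private-memory queue in
// un-serialized form until the network drains.
class soft_bounded_injector {
  std::size_t cap_;
  std::size_t serialized_bytes_ = 0;
  // Deferred injections: (serialize-and-send closure, serialized size bound).
  std::deque<std::pair<std::function<void()>, std::size_t>> deferred_;
public:
  explicit soft_bounded_injector(std::size_t cap) : cap_(cap) {}

  // Serialize and send immediately if under the cap, else defer.
  // "Soft": a single RPC larger than cap_ is still admitted when nothing
  // else is outstanding, so forward progress is always possible.
  void inject(std::function<void()> serialize_and_send, std::size_t size_ub) {
    if (serialized_bytes_ + size_ub <= cap_ || serialized_bytes_ == 0) {
      serialized_bytes_ += size_ub;
      serialize_and_send();
    } else {
      deferred_.emplace_back(std::move(serialize_and_send), size_ub);
    }
  }

  // Called when the network has consumed 'size' bytes; drains the queue.
  void on_send_complete(std::size_t size) {
    serialized_bytes_ -= size;
    while (!deferred_.empty() &&
           (serialized_bytes_ + deferred_.front().second <= cap_ ||
            serialized_bytes_ == 0)) {
      auto [fn, sz] = std::move(deferred_.front());
      deferred_.pop_front();
      serialized_bytes_ += sz;
      fn();
    }
  }

  std::size_t pending() const { return deferred_.size(); }
};
```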

  2. Steven Hofmeyr

    I have experienced this problem in a real application, namely one of the stages in HipMer. It aggregates updates in vectors and then dispatches them using rpc_ff, tracking the RPCs issued and received so that it can ensure all updates are processed before the stage completes. Consequently, the message size can be large. I've noticed that it will crash with out-of-memory errors on larger KNL runs (e.g. 256 nodes), even though the actual memory requirements calculated by the program are very small compared to the available memory; it consistently succeeds on smaller runs. I'm testing a hack solution that periodically quiesces all the RPCs, but this is not in general something you'd want the user to have to do. I feel this urgently needs a solution: preferably something in the runtime that the user doesn't have to know about, or, failing that, some easy programmatic way for the user to deal with it (e.g. some sort of quiesce function), plus clear documentation in the user guide.
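    The quiescence workaround described here boils down to two counters and a global sum. A plain-C++ sketch with the reduction stubbed out; in a real UPC++ program the two totals would come from upcxx::reduce_all and the wait loop would spin on upcxx::progress():

```cpp
#include <numeric>
#include <vector>

// Per-rank counters: RPCs injected and RPC handlers completed.
struct rpc_counters { long sent = 0; long processed = 0; };

// Stand-in for an allreduce-sum over all ranks
// (upcxx::reduce_all in a real program).
long global_sum(const std::vector<long>& per_rank) {
  return std::accumulate(per_rank.begin(), per_rank.end(), 0L);
}

// Quiescence test: every injected RPC has executed somewhere.
// The application repeats {progress; reduce} until this holds.
bool quiesced(const std::vector<rpc_counters>& ranks) {
  std::vector<long> sent, processed;
  for (const auto& c : ranks) {
    sent.push_back(c.sent);
    processed.push_back(c.processed);
  }
  return global_sum(sent) == global_sum(processed);
}
```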

  3. Dan Bonachea reporter

    I agree with John that there is no obvious "best backpressure algorithm" for this system. However, I also agree with Steve that deploying some improvements to address this defect should be a high priority.

    I'd like to emphasize that even if we don't have time to make any algorithmic changes for this release, simply deploying memory-exhaustion diagnostics could be a massive improvement in the user experience, and should not require significant developer time investment. The diagnostics can reference documentation that explains the runtime use of shared memory and gives recommendations on how to manage it (including documenting any controlling knobs we eventually add).

    I'm not sure why John mentioned "wrong order" in his reply; for the benefit of other readers I'd like to emphasize that UPC++ provides no ordered delivery guarantees for RPCs that are concurrently traversing the network. The only ordering constraints are those imposed by control dependencies on the injector. However I agree with John's general point: RPC's future-returning buffer lifetime extension semantic means that in apps exploiting this feature, buffering resource consumption for a given RPC is not bounded by the end of RPC execution, and is only bounded in time by application behavior. If an application algorithm chooses to extend buffer lifetime contingent on the arrival of additional RPCs, the application is forcing buffer resource growth - I think in this situation we'd be justified in printing a warning when buffer utilization is approaching some threshold, and possibly even aborting with an explanatory message.

    Regarding John's proposed algorithmic improvements:

    I think proposed improvement 1 could be the basis for a very helpful improvement. However, I'd strongly argue for user control over the shared-memory usage of the algorithm, i.e. I'd like a knob that enforces a hard bound on the amount of shared memory the runtime will acquire. The motivation is that it's reasonable for application writers to budget their physical memory and select a decomposition and runtime bound that fits within their segment; it is not reasonable to make the runtime's segment usage "elastic" and thereby force application writers to dynamically deal with shared-memory allocation failure (effectively the current situation).

    Such a limit can be enforced with simple bookkeeping of the current runtime-owned utilization and backpressure when the next serialization (or chunk of serialization) would exceed that bound. I'm fine with backpressure meaning that pre-serialized injections are queued in private memory (consuming zero shared memory), or even that a partial serialization stalls awaiting the network to drain. If a single RPC serializes to larger than the hard bound, I'm fine with a fatal error (with explanatory message), or possibly an optimistic attempt to acquire the necessary space with a LOUD warning pointing to the knob's docs.
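    The bookkeeping described here is small. A hypothetical sketch, with the class name, message text, and the knob it mentions all invented (not part of UPC++):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical hard bound on runtime-owned shared-heap usage.
// try_acquire() returning false is the backpressure signal: the caller
// queues the un-serialized RPC in private memory (or stalls a partial
// serialization) instead of allocating past the bound.
class runtime_heap_budget {
  std::size_t limit_;
  std::size_t in_use_ = 0;
public:
  explicit runtime_heap_budget(std::size_t limit) : limit_(limit) {}

  bool try_acquire(std::size_t bytes) {
    if (bytes > limit_) {
      // A single RPC exceeds the hard bound: fatal error with an
      // explanatory message, per the option discussed above.
      std::fprintf(stderr,
        "RPC needs %zu bytes of bounce buffer but the runtime budget is %zu;\n"
        "see the (hypothetical) runtime-heap knob documentation.\n",
        bytes, limit_);
      std::abort();
    }
    if (in_use_ + bytes > limit_) return false;  // backpressure
    in_use_ += bytes;
    return true;
  }

  void release(std::size_t bytes) {
    assert(bytes <= in_use_);
    in_use_ -= bytes;
  }

  std::size_t in_use() const { return in_use_; }
};
```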

    Re 2: I don't understand why bounding incoming rendezvous gets affects shared memory utilization - is the runtime allocating shared heap destination space for RMA gets? If so that should probably be changed or at least be a tunable knob, as this is unnecessary and (on several networks) not even beneficial.

    As an aside, I think it would be valuable to separate discussion of "rendezvous" RPCs that consume shared heap resources due to significant payload size from "eager" RPCs that do not. Even in a flow-control scenario, we should consider the ability to continue injecting "eager" RPCs.

  4. Dan Bonachea reporter

    This issue was discussed in the 2019-08-07 meeting.

    We resolved that it's a high priority to at least improve the error diagnostics for this failure mode prior to the next release, even if we run out of time to implement backpressure on shared memory utilization. This would ideally include adding code to track how much of the shared heap the runtime is using for buffering at any given moment, so that information can be provided when appropriate.

    One of the possibilities floated was to print diagnostics any time dl_malloc fails to allocate space. I'd like to mention that I think it's important to maintain the spec-guaranteed semantics for user allocation failure; specifically, I don't think we should fatal error or print unsuppressible warnings for user allocation calls where the spec says we should return a nullptr or throw std::bad_alloc on memory exhaustion. However, perhaps printing such diagnostics in debug mode (with an envvar to silence them) would be appropriate. Subclassing std::bad_alloc and overriding std::bad_alloc::what() to return a string describing the current state of shared memory and the runtime's usage also seems like a really helpful addition.
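    The std::bad_alloc subclassing idea could look something like the sketch below. The class name `shared_heap_exhausted` and its fields are hypothetical placeholders, not the actual type the runtime adopted:

    ```cpp
    #include <cstddef>
    #include <new>
    #include <string>

    // Hypothetical sketch: a std::bad_alloc subclass whose what() reports the
    // shared-heap state at the moment of failure, so callers catching the
    // standard std::bad_alloc still get a useful diagnostic string.
    class shared_heap_exhausted : public std::bad_alloc {
      std::string msg_; // built eagerly so what() can stay noexcept
    public:
      shared_heap_exhausted(std::size_t requested,
                            std::size_t heap_size,
                            std::size_t runtime_buffered)
        : msg_("shared heap exhausted: requested " + std::to_string(requested)
             + " bytes; heap size " + std::to_string(heap_size)
             + " bytes; runtime internal buffering "
             + std::to_string(runtime_buffered) + " bytes") {}

      const char *what() const noexcept override { return msg_.c_str(); }
    };
    ```

    A caller that catches `const std::bad_alloc &` then sees the descriptive message via the virtual `what()` dispatch, with no change to its catch clause.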

    On the other hand, I think cases where the runtime fails to allocate memory for buffering (which are currently silently fatal) should be improved to output more information - ie if we are going to crash anyhow, we might as well explain to the user what happened.

    Finally, it's worth noting that in UPC/UPC++ hybrid mode using the default UPCXX_USE_UPC_ALLOC=yes mode (which Steve is probably using in HipMer), we are not using dl_malloc at all, and allocation failures from UPCR's shared heap allocator are instantly fatal (with a message that describes the overall heap state and the size of the failed attempt - this is a documented divergence from the UPC++ spec). There's probably nothing we can do to help that case, short of tighter integration allowing UPCR to query the proposed UPC++ runtime shared memory usage counter during an OOM crash. The non-default UPCXX_USE_UPC_ALLOC=no mode uses dl_malloc, so it could be a workaround for diagnosing such crashes, but it comes with some downsides that might prevent its use in HipMer.

  5. Dan Bonachea reporter

    With pull request #115 merged at f95dafc, we now provide improved user-friendly diagnostics when the runtime exhausts the shared heap due to hidden buffering, and fewer buffers are used overall.

    The underlying fundamental problem remains: certain communication patterns can still result in unbounded shared memory consumption by runtime buffers. There is still no way to enforce, or reliably reason about, a hard bound on the runtime's shared memory use, so end users cannot plan their shared memory utilization to avoid OOM failures (especially at large scale). Adding such a control is deferred to the next release.

  6. Dan Bonachea reporter

    This issue has been resolved in PR 376, which exposes backpressure by throwing a upcxx::bad_shared_alloc exception from rpc/rpc_ff calls that would previously have led to a resource-exhaustion fatal error.

    Of course this still results in a std::terminate for applications that ignore the exception, but applications can now choose to catch the exception and recover (e.g., delaying their RPC injection attempt until later).
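    The catch-and-recover pattern could be sketched like this. To keep the sketch self-contained it uses stand-ins rather than UPC++ calls: `try_inject` models an rpc/rpc_ff call that may throw a std::bad_alloc-derived exception (upcxx::bad_shared_alloc in the real runtime), and `drain` models a call such as upcxx::progress() that lets in-flight buffers retire; the helper `inject_with_retry` is hypothetical:

    ```cpp
    #include <new>

    // Hypothetical sketch: retry an injection that may fail with a
    // std::bad_alloc-derived exception under shared-heap exhaustion,
    // draining in-flight work between attempts.
    template <typename Inject, typename Drain>
    void inject_with_retry(Inject try_inject, Drain drain,
                           int max_attempts = 100) {
      for (int attempt = 0; attempt < max_attempts; ++attempt) {
        try {
          try_inject();   // may throw on shared-heap exhaustion
          return;         // success
        } catch (const std::bad_alloc &) {
          drain();        // let in-flight buffers retire, then retry
        }
      }
      throw std::bad_alloc(); // give up after repeated exhaustion
    }
    ```

    An application that never wants to terminate on transient exhaustion wraps each injection this way, trading injection latency for bounded shared-heap pressure.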
