dist_id::when_here() is not always ready when it trivially ought to be

Issue #456 resolved
Rob Egan created an issue

I’m not sure if this is a bug, but it is definitely an unexpected assertion failure where a dist_id from an existing dist_object does not have a trivially ready future returned from when_here()

#include <cassert>
#include <cstdio>
#include <upcxx/upcxx.hpp>


using upcxx::barrier;
using upcxx::dist_object;
using upcxx::rank_me;
using upcxx::rank_n;

int main(int argc, char **argv) {
 upcxx::init();
 for(int i =0; i < 1000; i++) {
  upcxx::dist_object<size_t> dist_count(0);
  auto id = dist_count.id();
  upcxx::barrier();
  assert(id.when_here().ready());
  upcxx::barrier();
 }
 upcxx::finalize();
 return 0;
}

$ GASNET_BACKTRACE=1 upcxx-run -n 48 ./test_dist_obj_here_is_ready
test_dist_obj_here_is_ready: ../test/test_dist_obj_here_is_ready.cpp:17: int main(int, char**): Assertion `id.when_here().ready()' failed.
*** Caught a fatal signal (proc 0): SIGABRT(6)
[0] Invoking GDB for backtrace...
[0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_utIgPx '/home/regan/workspace/mhm2/upcxx-utils/build-debug/./test_dist_obj_here_is_ready' 31123
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[0] 0x00007fd339708457 in __GI___waitpid (pid=31172, stat_loc=stat_loc@entry=0x7fffc4e65868, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
[0] #0  0x00007fd339708457 in __GI___waitpid (pid=31172, stat_loc=stat_loc@entry=0x7fffc4e65868, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
[0] #1  0x00007fd339673177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
[0] #2  0x000055f3d971dd5d in gasneti_system_redirected (cmd=0x55f3d9c860c0 <cmd> "/usr/bin/gdb -nx -batch -x /tmp/gasnet_utIgPx '/home/regan/workspace/mhm2/upcxx-utils/build-debug/./test_dist_obj_here_is_ready' 31123", stdout_fd=3) at /dev/shm/upcxx-regan-hipmer-builds/upcxx-2020.10.0/bld/GASNet-2020.10.0/gasnet_tools.c:1276
[0] #3  0x000055f3d971e615 in gasneti_bt_gdb (fd=3) at /dev/shm/upcxx-regan-hipmer-builds/upcxx-2020.10.0/bld/GASNet-2020.10.0/gasnet_tools.c:1532
[0] #4  0x000055f3d971efb1 in gasneti_print_backtrace (fd=2) at /dev/shm/upcxx-regan-hipmer-builds/upcxx-2020.10.0/bld/GASNet-2020.10.0/gasnet_tools.c:1810
[0] #5  0x000055f3d971f65e in _gasneti_print_backtrace_ifenabled (fd=2) at /dev/shm/upcxx-regan-hipmer-builds/upcxx-2020.10.0/bld/GASNet-2020.10.0/gasnet_tools.c:1943
[0] #6  0x000055f3d98da9a6 in gasneti_defaultSignalHandler (sig=6) at /dev/shm/upcxx-regan-hipmer-builds/upcxx-2020.10.0/bld/GASNet-2020.10.0/gasnet_internal.c:723
[0] #7  <signal handler called>
[0] #8  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
[0] #9  0x00007fd339664921 in __GI_abort () at abort.c:79
[0] #10 0x00007fd33965448a in __assert_fail_base (fmt=0x7fd3397db750 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55f3d99aafe8 "id.when_here().ready()", file=file@entry=0x55f3d99aafc0 "../test/test_dist_obj_here_is_ready.cpp", line=line@entry=17, function=function@entry=0x55f3d99ab300 <main::__PRETTY_FUNCTION__> "int main(int, char**)") at assert.c:92
[0] #11 0x00007fd339654502 in __GI___assert_fail (assertion=0x55f3d99aafe8 "id.when_here().ready()", file=0x55f3d99aafc0 "../test/test_dist_obj_here_is_ready.cpp", line=17, function=0x55f3d99ab300 <main::__PRETTY_FUNCTION__> "int main(int, char**)") at assert.c:101
[0] #12 0x000055f3d965a036 in main (argc=1, argv=0x7fffc4e68d18) at ../test/test_dist_obj_here_is_ready.cpp:17
[0] [Inferior 1 (process 31123) detached]
Aborted (core dumped)

Comments (2)

  1. Dan Bonachea

    Hi @Rob Egan This is not a bug, rather is working as intended.

    The spec permits but does not guarantee a trivially ready future to be returned for this case.

    The current behavior is an artifact of how dist_object is implemented. Inserting a call to upcxx::progress() between the dist_object<T> construction and dist_id<T>::when_here() should result in the behavior you expect, but this is not a guaranteed property.

    Code that is certain the dist_object<T> exists should use dist_id<T>::here() instead; dist_id<T>::when_here() is intended for the case where lifetime is not statically known.

    FWIW the non-ready future returned here should not incur any additional cost in and of itself - the associated internal promise is created as part of dist_object construction, fulfillment is just deferred.

    These semantics could potentially be strengthened in a future release, but not before we fully disallow (currently deprecated) dist_object construction inside progress callbacks (the rationale behind the current implementation).

  2. Rob Egan reporter

    Got it and that makes perfect sense, especially with the nuance that dist_object creation is presently only deprecated in the restricted progress context, and not disallowed.

  3. Log in to comment