SEGV in handle_cb_queue::enqueue() for PAR backend with many tests

Using the 2017.9.0 release on dirac/gcc-7/smp/debug:

{pcp-d-5 ~/upcxx-2017.9.0/example/prog-guide} env UPCXX_GASNET_CONDUIT=smp UPCXX_CODEMODE=debug UPCXX_THREADMODE=seq UPCXX_INSTALL=/home/pcp1/bonachea/upcxx-2017.9.0/foo gmake clean compute-pi-multi-examples      
rm -rf hello-world compute-pi compute-pi-multi-examples persona-example
g++ compute-pi-multi-examples.cpp -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/home/pcp1/bonachea/upcxx-2017.9.0/foo/gasnet.debug/include -I/home/pcp1/bonachea/upcxx-2017.9.0/foo/gasnet.debug/include/smp-conduit -I/home/pcp1/bonachea/upcxx-2017.9.0/foo/upcxx.debug.gasnet1_seq.smp/include -std=c++11 -Wno-inline -g3 -Wno-unused -Wno-unused-parameter -Wno-address -std=c++11 -Wno-inline -L/home/pcp1/bonachea/upcxx-2017.9.0/foo/upcxx.debug.gasnet1_seq.smp/lib -lupcxx -lpthread -L/home/pcp1/bonachea/upcxx-2017.9.0/foo/gasnet.debug/lib -lgasnet-smp-seq -lrt -L/usr/local/pkg/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -lgcc -lm -o compute-pi-multi-examples 
{pcp-d-5 ~/upcxx-2017.9.0/example/prog-guide} env GASNET_PSHM_NODES=1 ./compute-pi-multi-examples
Testing compute-pi-multi-examples.cpp with 1 ranks
Calculating pi with 100000 trials, distributed across 1 ranks.
rpc: pi estimate: 3.14152, rank 0 alone: 3.14152
rpc_no_barrier: pi estimate: 3.14152, rank 0 alone: 3.14152
global_ptrs: pi estimate: 3.14152, rank 0 alone: 3.14152
distobj: pi estimate: 3.14152, rank 0 alone: 3.14152
async_distobj: pi estimate: 3.14152, rank 0 alone: 3.14152
atomics: pi estimate: 3.14152, rank 0 alone: 3.14152
quiescence: pi estimate: 3.14152, rank 0 alone: 3.14152
Computed pi to be 3.14152
SUCCESS

{pcp-d-5 ~/upcxx-2017.9.0/example/prog-guide} env UPCXX_GASNET_CONDUIT=smp UPCXX_CODEMODE=debug UPCXX_THREADMODE=par UPCXX_INSTALL=/home/pcp1/bonachea/upcxx-2017.9.0/foo gmake clean compute-pi-multi-examples   
rm -rf hello-world compute-pi compute-pi-multi-examples persona-example
g++ compute-pi-multi-examples.cpp -DUPCXX_BACKEND=gasnetex_par -D_GNU_SOURCE=1 -DGASNET_PAR -D_REENTRANT -I/home/pcp1/bonachea/upcxx-2017.9.0/foo/gasnet.debug/include -I/home/pcp1/bonachea/upcxx-2017.9.0/foo/gasnet.debug/include/smp-conduit -I/home/pcp1/bonachea/upcxx-2017.9.0/foo/upcxx.debug.gasnetex_par.smp/include -std=c++11 -Wno-inline -g3 -Wno-unused -Wno-unused-parameter -Wno-address -std=c++11 -Wno-inline -L/home/pcp1/bonachea/upcxx-2017.9.0/foo/upcxx.debug.gasnetex_par.smp/lib -lupcxx -lpthread -L/home/pcp1/bonachea/upcxx-2017.9.0/foo/gasnet.debug/lib -lgasnet-smp-par -lpthread -lrt -L/usr/local/pkg/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -lgcc -lm -o compute-pi-multi-examples 
{pcp-d-5 ~/upcxx-2017.9.0/example/prog-guide} env GASNET_PSHM_NODES=1 ./compute-pi-multi-examples                                                                                                              
Testing compute-pi-multi-examples.cpp with 1 ranks
Calculating pi with 100000 trials, distributed across 1 ranks.
rpc: pi estimate: 3.14152, rank 0 alone: 3.14152
rpc_no_barrier: pi estimate: 3.14152, rank 0 alone: 3.14152
*** Caught a fatal signal: SIGSEGV(11) on node 0/1

Note that the test works fine on the SEQ backend but crashes on the PAR backend. In both cases the run is a single rank containing a single thread (this test does not spawn threads).

Here is the crash stack:

Program received signal SIGSEGV, Segmentation fault.
0x000000000043b72c in upcxx::backend::gasnet::handle_cb_queue::enqueue (this=0x8b0188 <upcxx::backend::master+136>, cb=0x8f0580)
    at /home/pcp1/bonachea/upcxx-2017.9.0/.nobs/art/9745a86cc2134db69f60402e204d700b7b484511/upcxx/backend/gasnet/handle_cb.hpp:33
33          *this->tailp_ = cb;
(gdb) where
#0  0x000000000043b72c in upcxx::backend::gasnet::handle_cb_queue::enqueue (this=0x8b0188 <upcxx::backend::master+136>, cb=0x8f0580)
    at /home/pcp1/bonachea/upcxx-2017.9.0/.nobs/art/9745a86cc2134db69f60402e204d700b7b484511/upcxx/backend/gasnet/handle_cb.hpp:33
#1  0x000000000043a758 in upcxx::backend::rma_put (rank_d=0, buf_d=0x7fff763563c8, buf_s=0x8f05b0, buf_size=4, cb=0x8f0580) at /home/pcp1/bonachea/upcxx-2017.9.0/src/backend/gasnet/runtime.cpp:261
#2  0x00000000004110dd in upcxx::rput<int, upcxx::nil_cx, upcxx::nil_cx, upcxx::future_cx<0> > (value_s=78538, gp_d=..., cxs=...)
    at /home/pcp1/bonachea/upcxx-2017.9.0/foo/upcxx.debug.gasnetex_par.smp/include/upcxx/rput.hpp:187
#3  0x0000000000404bfe in global_ptrs::reduce_to_rank0 (my_hits=78538) at global-ptrs-reduce_to_rank0.hpp:15
#4  0x000000000040559a in main (argc=1, argv=0x7fffffffd5f8) at compute-pi-multi-examples.cpp:83
(gdb) print *this
$4 = {head_ = 0x0, tailp_ = 0x0}

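This looks like a bug when pushing onto an empty queue: tailp_ is null (see the gdb print above), so the store through it at handle_cb.hpp:33 faults. It appears something is not properly resetting the queue to the expected empty-list state.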
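
For reference, a tail-pointer queue of this shape normally keeps its tail pointer aimed at the head slot whenever the list is empty, so that an enqueue never writes through null. The sketch below is not the UPC++ source. The names `callback`, `cb_queue` and `drain_then_reuse` are hypothetical and only illustrate the invariant that the gdb output suggests is being violated.

```cpp
#include <cassert>

// Hypothetical stand-in for a queued callback node (not the UPC++ type).
struct callback {
  callback *next = nullptr;
};

// Hypothetical intrusive singly linked queue with a pointer-to-tail-slot.
// Invariant: tailp always points at the slot that the next enqueue fills,
// i.e. &head when empty, otherwise &last->next. It is never null.
// Note: copying this struct would leave tailp aimed at the source object.
struct cb_queue {
  callback *head = nullptr;
  callback **tailp = &head;

  bool empty() const { return head == nullptr; }

  void enqueue(callback *cb) {
    assert(tailp != nullptr);  // the crashing queue had tailp_ == 0x0 here
    cb->next = nullptr;
    *tailp = cb;               // same store as the faulting line 33
    tailp = &cb->next;
  }

  callback *dequeue() {
    callback *cb = head;
    if (cb == nullptr) return nullptr;
    head = cb->next;
    if (head == nullptr) tailp = &head;  // restore the empty state
    return cb;
  }
};

// Drain the queue completely, then push again. Returns true when the
// queue behaves as FIFO and is still consistent after being reused.
bool drain_then_reuse() {
  cb_queue q;
  callback a, b, c;
  q.enqueue(&a);
  q.enqueue(&b);
  bool fifo = q.dequeue() == &a && q.dequeue() == &b && q.empty();
  q.enqueue(&c);
  return fifo && q.head == &c && q.tailp == &c.next;
}
```

If the real queue follows the same scheme, a state of {head_ = 0x0, tailp_ = 0x0} can only arise when the object was zero-filled without its constructor running, or when a reset path cleared tailp_ instead of pointing it back at head_. Which of these applies here is an assumption to be checked against the actual code.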

In addition to compute-pi-multi-examples, I see similar crashes when running these nobs tests against the PAR backend: rpc_barrier, rput, atomics.
