bench/put_flood crashes on clang/opt/Linux

Issue #184 resolved
Dan Bonachea created an issue

Recent CI results show that bench/put_flood on develop (555cf93) is crashing at the end of runs on at least three different clang-7/Linux systems in opt mode (only). The same symptom occurs in both SEQ and PAR threadmodes, and appears to be very reproducible. The problem has been seen with all of smp-conduit, udp-conduit and ibv-conduit, in both local-only single-node runs and local-and-remote distributed test runs.

The problem has not so far been witnessed on any of the macOS systems, all of which currently run an older fork of LLVM (Apple clang). The problem has also not been seen thus far with any version of gcc in the normal testing rotation.

Below is a manual demonstration on kotten. The stack trace should be taken with a grain of salt since optimizations are enabled, but it seems to indicate a problem in the custom std::hash operator defined in bench/common/row.hpp.

{kotten ~/UPC/upcxx/bench} module load upcxx/nightly/clang-7.0.0 

{kotten ~/UPC/upcxx/bench} upcxx --version
UPC++ version 20180903 upcxx-2018.9.3-3-g555cf93
Copyright (c) 2018, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
http://upcxx.lbl.gov

clang version 7.0.0 (tags/RELEASE_700/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/pkg/clang/7.0.0/bin

{kotten ~/UPC/upcxx/bench} env UPCXX_GASNET_CONDUIT=smp upcxx -O -o put_flood put_flood.cpp

{kotten ~/UPC/upcxx/bench} upcxx-run -np 2 put_flood
Running with peers: local=1 and remote=-1
Putting to peer=1
Measuring size=8 kind=lat how=upcxx
Measuring size=8 kind=lat how=gasnet
Measuring size=8 kind=bw how=upcxx-pro
Measuring size=8 kind=bw how=upcxx-fut
Measuring size=8 kind=bw how=gasnet
Measuring size=8 kind=bw how=gasnet-nbi
Measuring size=16 kind=lat how=upcxx
Measuring size=16 kind=lat how=gasnet
Measuring size=16 kind=bw how=upcxx-pro
Measuring size=16 kind=bw how=upcxx-fut
Measuring size=16 kind=bw how=gasnet
Measuring size=16 kind=bw how=gasnet-nbi
Measuring size=32 kind=lat how=upcxx
Measuring size=32 kind=lat how=gasnet
Measuring size=32 kind=bw how=upcxx-pro
Measuring size=32 kind=bw how=upcxx-fut
Measuring size=32 kind=bw how=gasnet
Measuring size=32 kind=bw how=gasnet-nbi
Measuring size=64 kind=lat how=upcxx
Measuring size=64 kind=lat how=gasnet
Measuring size=64 kind=bw how=upcxx-pro
Measuring size=64 kind=bw how=upcxx-fut
Measuring size=64 kind=bw how=gasnet
Measuring size=64 kind=bw how=gasnet-nbi
Measuring size=128 kind=lat how=upcxx
Measuring size=128 kind=lat how=gasnet
Measuring size=128 kind=bw how=upcxx-pro
Measuring size=128 kind=bw how=upcxx-fut
Measuring size=128 kind=bw how=gasnet
Measuring size=128 kind=bw how=gasnet-nbi
Measuring size=256 kind=lat how=upcxx
Measuring size=256 kind=lat how=gasnet
Measuring size=256 kind=bw how=upcxx-pro
Measuring size=256 kind=bw how=upcxx-fut
Measuring size=256 kind=bw how=gasnet
Measuring size=256 kind=bw how=gasnet-nbi
Measuring size=512 kind=lat how=upcxx
Measuring size=512 kind=lat how=gasnet
Measuring size=512 kind=bw how=upcxx-pro
Measuring size=512 kind=bw how=upcxx-fut
Measuring size=512 kind=bw how=gasnet
Measuring size=512 kind=bw how=gasnet-nbi
Measuring size=1024 kind=lat how=upcxx
Measuring size=1024 kind=lat how=gasnet
Measuring size=1024 kind=bw how=upcxx-pro
Measuring size=1024 kind=bw how=upcxx-fut
Measuring size=1024 kind=bw how=gasnet
Measuring size=1024 kind=bw how=gasnet-nbi
Measuring size=2048 kind=lat how=upcxx
Measuring size=2048 kind=lat how=gasnet
Measuring size=2048 kind=bw how=upcxx-pro
Measuring size=2048 kind=bw how=upcxx-fut
Measuring size=2048 kind=bw how=gasnet
Measuring size=2048 kind=bw how=gasnet-nbi
Measuring size=4096 kind=lat how=upcxx
Measuring size=4096 kind=lat how=gasnet
Measuring size=4096 kind=bw how=upcxx-pro
Measuring size=4096 kind=bw how=upcxx-fut
Measuring size=4096 kind=bw how=gasnet
Measuring size=4096 kind=bw how=gasnet-nbi
Measuring size=8192 kind=lat how=upcxx
Measuring size=8192 kind=lat how=gasnet
Measuring size=8192 kind=bw how=upcxx-pro
Measuring size=8192 kind=bw how=upcxx-fut
Measuring size=8192 kind=bw how=gasnet
Measuring size=8192 kind=bw how=gasnet-nbi
Measuring size=16384 kind=lat how=upcxx
Measuring size=16384 kind=lat how=gasnet
Measuring size=16384 kind=bw how=upcxx-pro
Measuring size=16384 kind=bw how=upcxx-fut
Measuring size=16384 kind=bw how=gasnet
Measuring size=16384 kind=bw how=gasnet-nbi
Measuring size=32768 kind=lat how=upcxx
Measuring size=32768 kind=lat how=gasnet
Measuring size=32768 kind=bw how=upcxx-pro
Measuring size=32768 kind=bw how=upcxx-fut
Measuring size=32768 kind=bw how=gasnet
Measuring size=32768 kind=bw how=gasnet-nbi
Measuring size=65536 kind=lat how=upcxx
Measuring size=65536 kind=lat how=gasnet
Measuring size=65536 kind=bw how=upcxx-pro
Measuring size=65536 kind=bw how=upcxx-fut
Measuring size=65536 kind=bw how=gasnet
Measuring size=65536 kind=bw how=gasnet-nbi
Measuring size=131072 kind=lat how=upcxx
Measuring size=131072 kind=lat how=gasnet
Measuring size=131072 kind=bw how=upcxx-pro
Measuring size=131072 kind=bw how=upcxx-fut
Measuring size=131072 kind=bw how=gasnet
Measuring size=131072 kind=bw how=gasnet-nbi
Measuring size=262144 kind=lat how=upcxx
Measuring size=262144 kind=lat how=gasnet
Measuring size=262144 kind=bw how=upcxx-pro
Measuring size=262144 kind=bw how=upcxx-fut
Measuring size=262144 kind=bw how=gasnet
Measuring size=262144 kind=bw how=gasnet-nbi
Measuring size=524288 kind=lat how=upcxx
Measuring size=524288 kind=lat how=gasnet
Measuring size=524288 kind=bw how=upcxx-pro
Measuring size=524288 kind=bw how=upcxx-fut
Measuring size=524288 kind=bw how=gasnet
Measuring size=524288 kind=bw how=gasnet-nbi
Measuring size=1048576 kind=lat how=upcxx
Measuring size=1048576 kind=lat how=gasnet
Measuring size=1048576 kind=bw how=upcxx-pro
Measuring size=1048576 kind=bw how=upcxx-fut
Measuring size=1048576 kind=bw how=gasnet
Measuring size=1048576 kind=bw how=gasnet-nbi
Measuring size=2097152 kind=lat how=upcxx
Measuring size=2097152 kind=lat how=gasnet
Measuring size=2097152 kind=bw how=upcxx-pro
Measuring size=2097152 kind=bw how=upcxx-fut
Measuring size=2097152 kind=bw how=gasnet
Measuring size=2097152 kind=bw how=gasnet-nbi
Measuring size=4194304 kind=lat how=upcxx
Measuring size=4194304 kind=lat how=gasnet
Measuring size=4194304 kind=bw how=upcxx-pro
Measuring size=4194304 kind=bw how=upcxx-fut
Measuring size=4194304 kind=bw how=gasnet
Measuring size=4194304 kind=bw how=gasnet-nbi
*** Caught a fatal signal (proc 0): SIGSEGV(11)
NOTICE: We recommend linking the debug version of GASNet to assist you in resolving this application issue.
[0] Invoking GDB for backtrace...
[0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_6ffh92 '/home/pcp1/bonachea/UPC/upcxx/bench/put_flood' 17453
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/usr/lib64/libthread_db.so.1".
[0] 0x00007fd33805a10c in waitpid () from /usr/lib64/libc.so.6
[0] To enable execution of this file add
[0]     add-auto-load-safe-path /usr/local/pkg/gcc/8.2.0/lib64/libstdc++.so.6.0.25-gdb.py
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] To completely disable this security protection add
[0]     set auto-load safe-path /
[0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
[0] For more information about this security protection see the
[0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
[0]     info "(gdb)Auto-loading safe path"
[0] #0  0x00007fd33805a10c in waitpid () from /usr/lib64/libc.so.6
[0] #1  0x00007fd337fd7de2 in do_system () from /usr/lib64/libc.so.6
[0] #2  0x0000000000424f45 in gasneti_system_redirected ()
[0] #3  0x00000000004249de in gasneti_bt_gdb ()
[0] #4  0x000000000041f06d in gasneti_print_backtrace ()
[0] #5  0x000000000047b90e in gasneti_defaultSignalHandler ()
[0] #6  <signal handler called>
[0] #7  0x000000000040a30d in std::hash<bench::row<char const*, char const*> >::operator()(bench::row<char const*, char const*> const&) const ()
[0] #8  0x000000000040a224 in std::hash<bench::row<char const*, unsigned long, char const*, char const*> >::operator()(bench::row<char const*, unsigned long, char const*, char const*> const&) const ()
[0] #9  0x0000000000406a31 in main ()
Segmentation fault (core dumped)

Comments (5)

  1. Paul Hargrove

    The problem is not exclusive to clang-7.
    For a while now, we have been testing Clang-8 (on Dirac, Kotten, ppc64 and ppc64el) and this problem persists on all four systems.

    Additionally, testing of pre-release snapshots of gcc-9 (expected out by early May) seems to show the same SEGV. I will update with something more definitive once the final gcc-9 release enters our CI testing.

  2. Dan Bonachea reporter

    Here is the gcc-9.1.0/opt crash stack, with debugging symbols compiled into the app code:

    *** Caught a fatal signal (proc 0): SIGSEGV(11)
    NOTICE: We recommend linking the debug version of GASNet to assist you in resolving this application issue.
    [0] Invoking GDB for backtrace...
    [0] /usr/local/pkg/gdb/8.0.1/bin/gdb -nx -batch -x /tmp/gasnet_wyRn3q '/home/pcp1/bonachea/UPC/upcxx/bench/put_flood-opt' 7626
    [0] [Thread debugging using libthread_db enabled]
    [0] Using host libthread_db library "/usr/lib64/libthread_db.so.1".
    [0] 0x00007f1bb918510c in waitpid () from /usr/lib64/libc.so.6
    [0] To enable execution of this file add
    [0]     add-auto-load-safe-path /usr/local/pkg/gcc/9.1.0/lib64/libstdc++.so.6.0.26-gdb.py
    [0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
    [0] To completely disable this security protection add
    [0]     set auto-load safe-path /
    [0] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
    [0] For more information about this security protection see the
    [0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [0]     info "(gdb)Auto-loading safe path"
    [0] #0  0x00007f1bb918510c in waitpid () from /usr/lib64/libc.so.6
    [0] #1  0x00007f1bb9102de2 in do_system () from /usr/lib64/libc.so.6
    [0] #2  0x0000000000421db9 in gasneti_system_redirected ()
    [0] #3  0x00000000004224b3 in gasneti_bt_gdb ()
    [0] #4  0x0000000000426d46 in gasneti_print_backtrace ()
    [0] #5  0x00000000004092f2 in gasneti_defaultSignalHandler ()
    [0] #6  <signal handler called>
    [0] #7  0x000000000040d714 in std::hash<bench::row<char const*> >::operator() (x=..., this=<optimized out>) at /usr/local/pkg/gcc/9.1.0/include/c++/9.1.0/bits/functional_hash.h:168
    [0] #8  std::hash<bench::row<char const*, char const*> >::operator() (x=..., this=<optimized out>) at common/row.hpp:189
    [0] #9  std::hash<bench::row<unsigned long, char const*, char const*> >::operator() (x=..., this=<optimized out>) at common/row.hpp:170
    [0] #10 std::hash<bench::row<char const*, unsigned long, char const*, char const*> >::operator() (x=..., this=0x7ffee89324f0) at common/row.hpp:189
    [0] #11 std::__detail::_Hash_code_base<bench::row<char const*, unsigned long, char const*, char const*>, std::pair<bench::row<char const*, unsigned long, char const*, char const*> const, double>, std::__detail::_Select1st, std::hash<bench::row<char const*, unsigned long, char const*, char const*> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, true>::_M_hash_code (__k=..., this=0x7ffee89324f0) at /usr/local/pkg/gcc/9.1.0/include/c++/9.1.0/bits/hashtable_policy.h:1377
    [0] #12 std::_Hashtable<bench::row<char const*, unsigned long, char const*, char const*>, std::pair<bench::row<char const*, unsigned long, char const*, char const*> const, double>, std::allocator<std::pair<bench::row<char const*, unsigned long, char const*, char const*> const, double> >, std::__detail::_Select1st, std::equal_to<bench::row<char const*, unsigned long, char const*, char const*> >, std::hash<bench::row<char const*, unsigned long, char const*, char const*> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::count (this=this@entry=0x7ffee89324f0, __k=...) at /usr/local/pkg/gcc/9.1.0/include/c++/9.1.0/bits/hashtable.h:1455
    [0] #13 0x000000000040c2fc in std::unordered_map<bench::row<char const*, unsigned long, char const*, char const*>, double, std::hash<bench::row<char const*, unsigned long, char const*, char const*> >, std::equal_to<bench::row<char const*, unsigned long, char const*, char const*> >, std::allocator<std::pair<bench::row<char const*, unsigned long, char const*, char const*> const, double> > >::count (__x=..., this=0x7ffee89324f0) at put_flood.cpp:394
    [0] #14 main () at put_flood.cpp:394
    Segmentation fault (core dumped)
    

    Full diagnosis and proposed fix in pull request #97

  3. Dan Bonachea reporter

    fix issue #184: SEGV on put_flood

    As previously written, the put_flood output loop was relying on undefined behavior (UB): it used a std::initializer_list beyond its lifetime. I've confirmed via printf that in -O3 mode, gcc-9.1.0/x86_64/Linux was overwriting the contents of the std::initializer_list before the end of the loop. As a result, a garbage char * pointer was passed as the how argument of make_row, precipitating a later SEGV in hash.

    → <<cset 3d1992e82f0d>>
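
The thread does not show the offending loop, so the following is only a minimal sketch of the general hazard described in the commit message, not the actual put_flood code or its fix. It assumes the common form of the mistake, where a std::initializer_list variable is assigned from a braced list and then read after that statement ends. The names how_labels and total_label_length and the choice of std::vector are hypothetical, introduced purely for illustration; the label strings are taken from the benchmark output above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <initializer_list>
#include <vector>

// UNSAFE pattern, shown only as a comment (assumed shape of the bug):
//
//   std::initializer_list<const char*> hows;
//   hows = {"upcxx", "gasnet"};  // the backing array is a temporary that
//                                // is destroyed at the end of this statement
//   for (const char* how : hows) { /* reads a dead array: UB */ }
//
// An optimizer is free to reuse the dead array's stack slot, so `how` can
// later hold a garbage pointer, and anything that dereferences it (such as
// a hash over the string contents) can crash.

// Safer alternative (hypothetical helper): copy the labels into a container
// that owns its storage for as long as the container itself lives.
inline std::vector<const char*> how_labels() {
  return {"upcxx", "gasnet", "upcxx-pro", "upcxx-fut", "gasnet-nbi"};
}

// Walks the labels the way an output loop might; the vector keeps the
// pointer array alive for the whole loop. (Hypothetical helper.)
inline std::size_t total_label_length() {
  const std::vector<const char*> hows = how_labels();
  std::size_t total = 0;
  for (const char* how : hows) {
    assert(how != nullptr);
    total += std::strlen(how);
  }
  return total;
}
```

Recent gcc releases can flag some forms of this mistake through the -Winit-list-lifetime warning, which may be worth enabling when auditing similar loops.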
