gdb segfault when printing a global_ptr during interactive debugging

Issue #443 resolved
Rob Egan created an issue

I cannot figure out whether this is a known issue or just a problem with my version of gdb or gcc, but whenever I debug code with gdb, it works fine until it tries to expand a variable containing a upcxx::global_ptr, and then gdb core dumps.

Here I am debugging an active process:

(gdb) up
#3  0x00005555557ca282 in ska::detailv8::sherwood_v8_table<std::pair<Kmer<32>, KmerCounts>, Kmer<32>, KmerHash<32>, ska::detailv3::KeyOrValueHasher<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerHash<32> >, KmerEqual<32>, ska::detailv3::KeyOrValueEquality<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerEqual<32> >, std::allocator<std::pair<Kmer<32>, KmerCounts> >, std::allocator<unsigned char>, (unsigned char)8>::emplace<std::pair<Kmer<32>, KmerCounts>>(std::pair<Kmer<32>, KmerCounts>&&) (this=this@entry=0x7fffffffa8c0, key=...)
    at /var/lib/gitlab-runner/builds/x6sRe-zd/0/robegan21/mhmxx/include/bytell_hash_map.hpp:476
476                     return emplace_direct_hit({ index, block }, std::forward<Key>(key), std::forward<Args>(args)...);
(gdb) p index
$9 = 1328859
(gdb) p block
$10 = (ska::detailv8::sherwood_v8_table<std::pair<Kmer<32>, KmerCounts>, Kmer<32>, KmerHash<32>, ska::detailv3::KeyOrValueHasher<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerHash<32> >, KmerEqual<32>, ska::detailv3::KeyOrValueEquality<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerEqual<32> >, std::allocator<std::pair<Kmer<32>, KmerCounts> >, std::allocator<unsigned char>, 8>::BlockPointer) 0x7ffc9b616f68
(gdb) p key
$11 = (std::pair<Kmer<32>, KmerCounts> &&) @0x7ffcb8255888: {first = {static k = 21, static N_LONGS = 1, longs = {_M_elems = {11390612276768669696}}}, second = {left_exts = {count_A = 0, count_C = 0,
      count_G = 0, count_T = 4}, right_exts = {count_A = 0, count_C = 0, count_G = 4, count_T = 0}, uutig_frag = {<upcxx::global_ptr<FragElem const, (upcxx::memory_kind)1>> = {
Segmentation fault

Is there something I can do with gdb to prevent the crash, or do I need to install a newer version?

gdb --version
GNU gdb (Ubuntu 8.2-0ubuntu1~16.04.1) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

regan@hulk:/work/gitlab-ci/scratch/mhm2-6cc70abe-RefactorReadNames-$ /work/gitlab-ci/ci-install-upcxx-2020.3.8/bin/upcxx --version
UPC++ version 2020.3.8 / gex-2020.3.8
Copyright (c) 2020, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
https://upcxx.lbl.gov

g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Comments (5)

  1. Dan Bonachea

    Hi @Rob Egan -

    I've never seen this particular issue before, and I wouldn't have the foggiest notion how to debug a gdb crash. This definitely seems like an "external" bug (in gdb or possibly the system configuration).

    However, I can report success with at least the following setup on our dirac cluster, using the smp/debug backend and test/global_ptr.cpp:

    (gdb) print ptr
    $2 = {<upcxx::global_ptr<int const, (upcxx::memory_kind)1>> = {rank_ = 0, 
        raw_ptr_ = 0x7fffed5f23c8}, <No data fields>}
    (gdb) print cptr
    $3 = {rank_ = 0, raw_ptr_ = 0x0}
    (gdb) quit
    {pcp-d-5} upcxx --version
    UPC++ version 2020.11.5 upcxx-2020.11.3-memory_kinds-34-gee1c768 / gex-stable-2021_01_08
    Copyright (c) 2021, The Regents of the University of California,
    through Lawrence Berkeley National Laboratory.
    https://upcxx.lbl.gov
    
    g++ (GCC) 10.2.0
    Copyright (C) 2020 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    {pcp-d-5} gdb --version
    GNU gdb (GDB) 8.0.1
    Copyright (C) 2017 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word".
    

    It works fine for me on the same platform with gdb 8.0.1 and all of:

    • upcxx mk-develop and gcc 10.2.0
    • upcxx 2020.3.8 and gcc 10.2.0
    • upcxx 2020.3.8 and gcc 6.4.0

    Examining your failing output, this line looks "fishy" to me:

    uutig_frag = {<upcxx::global_ptr<FragElem const, (upcxx::memory_kind)1>> = {
    Segmentation fault
    

    What is the actual declared type of the field in uutig_frag?

    The output syntax here seems to indicate uutig_frag is an aggregate where the first field is a template type instantiated on upcxx::global_ptr<FragElem const>, but the template type itself is "missing" from the output. I'd guess that's more likely the actual cause of the segfault. Perhaps this is a non-trivial libstdc++ container and the gdb version doesn't correctly grok your glibc binary version?

  2. Rob Egan reporter

    Coming back to this, and sorry for the long delay.

    uutig_frag is just a simple struct FragElem that includes a global_ptr to another FragElem, so maybe that recursion is the cause.

    struct FragElem {
      global_ptr<FragElem> left_gptr, right_gptr;
      bool left_is_rc, right_is_rc;
      global_ptr<char> frag_seq;
      unsigned frag_len;
      int64_t sum_depths;
      bool visited;
    
      FragElem()
          : left_gptr(nullptr)
          , right_gptr(nullptr)
          , left_is_rc(false)
          , right_is_rc(false)
          , frag_seq(nullptr)
          , frag_len(0)
          , sum_depths(0)
          , visited(false) {}
    };
    

    … I’ll see if adding a forward declaration fixes the gdb crash, but I think there may be more going on.

    I am able to confirm that my Ubuntu 18.04.05 version of gdb 8.2 crashes when reading cores or attaching to a running process:

    gdb --version
    GNU gdb (Ubuntu 8.2-0ubuntu1~18.04) 8.2
    
    
    Reading symbols from /home/regan/workspace/mhm2/build-debug/install/bin/mhm2.../build/gdb-nKO2sj/gdb-8.2/gdb/cp-support.c:1581: demangler-warning: unable to demangle '_ZN5upcxx6detail28apply_futured_as_future_helpIONS0_19future_composite_fnINS2_INS0_25bound_function_applicatorINS_12global_fnptrIFvRNS_11dist_objectISt8functionIFvN7KmerDHTILi160EE10KmerAndExtERNS5_IN3ska15bytell_hash_mapI4KmerILi160EE10KmerCounts8KmerHashILi160EE9KmerEqualILi160EESaISt4pairISD_SE_EEEEEERNS5_I11BloomFilterEEEEEENS_4viewIS9_NS_22deserializing_iteratorIS9_EEEERNS5_IN11upcxx_utils12FASRPCCountsEEEimSO_SR_EEEJSV_SZ_S13_imSO_SR_EEENS0_19rpc_recipient_afterIPNS0_11lpc_dormantIJEEEEEEENS0_7commandIJPNS0_8lpc_baseEEE13after_executeINS0_27deserialized_bound_functionIRKZNS0_3rpcINS_11completionsIJNS_9future_cxINS_18operation_cx_eventELNS_14progress_levelE1EEEEEERS14_JSV_RSZ_S13_OiRmSO_SR_EEENS0_10rpc_returnIFT0_DpT1_ENSt5decayIT_E4typeENS0_18rpc_remote_resultsIS1Y_vE4typeEE4typeERKNS_4teamEiOS1V_DpOS1W_OS20_iEUlONS1I_IS15_JSV_RKSZ_S13_RKiRKmSO_SR_EEEE0_JRKNS_14bound_functionIS15_JSV_S2G_S13_S2I_S2K_SO_SR_EEEEEELb1EXadL_ZNS_7backend6gasnet10rpc_as_lpc7cleanupILb0ELb0EEEvS1F_EEEEEEONS_7future1INS0_20future_kind_when_allIJNS32_INS0_18future_kind_resultEJOS15_EEENS32_INS0_17future_kind_shrefINS0_25future_header_ops_generalELb0EEEJSV_EEENS32_IS34_JOSZ_EEENS32_IS39_JS13_EEENS32_IS34_JS1S_EEENS32_IS34_JOmEEENS32_IS39_JSO_EEENS32_IS39_JSR_EEEEEEJS35_SV_S3B_S13_S1S_S3F_SO_SR_EEES3K_EclES31_ONS0_17future_dependencyIS3K_EE' (demangler failed with signal 11)
    
    ..
    
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    

    … but gdb 10 does not, so this definitely looks like a gdb bug.

    So, is there a way that I can configure my install so that the gasnet backtrace support uses my custom install of gdb instead of /usr/bin/gdb? And additionally, is there a way to get a backtrace of all threads instead of just the first thread (i.e. gdb command ‘thread apply all bt’)?

    Incidentally, even gdb 10.1 cannot read a global_ptr field. Here I attached to running processes during a segfault crash (so clearly something is wrong with my code here), but in the rget function that is being waited on, the global_ptr (ctg_loc.seq_gptr) cannot be resolved by gdb with an error reading the variable: missing ELF symbol

    #12 0x0000559f02f18fad in KmerCtgDHT<32>::compute_alns_for_read (this=0x7ffe9f334a60, aligned_ctgs_map=0x559f07d9ee10, rname=..., rseq_fw=..., read_group_id=0, aln_kernel_timer=...)
        at /home/regan/workspace/mhm2/src/klign.cpp:816
    816               rget(ctg_loc.seq_gptr + get_start, ctg_str.data() + get_start, get_len).wait();
    (gdb) p ctg_loc
    $1 = {cid = 1532400, seq_gptr = {<upcxx::global_ptr<char const, (upcxx::memory_kind)1>> = {
          static kind = <error reading variable: Missing ELF symbol "_ZN5upcxx10global_ptrIKcLNS_11memory_kindE1EE4kindE".>, device_ = -1, rank_ = 17,
          raw_ptr_ = 0x7f51934ec4e0 <incomplete sequence \336>}, <No data fields>}, clen = 23, depth = 2, pos_in_ctg = 0, is_rc = true}
    

  3. Dan Bonachea

    @Rob Egan :

    So, is there a way that I can configure my install so that the gasnet backtrace support uses my custom install of gdb instead of /usr/bin/gdb?

    If you set the environment variable GDB_PATH=/path/to/your/gdb before running configure, that should encode the gdb you want.

    And additionally, is there a way to get a backtrace of all threads instead of just the first thread (i.e. gdb command ‘thread apply all bt’)?

    This should already be the default if you build with upcxx -threadmode=par, which links in the threaded version of GASNet. You can confirm this with the following command: upcxx-run -i a.out | grep Mode

    the global_ptr (ctg_loc.seq_gptr) cannot be resolved by gdb with an error reading the variable: missing ELF symbol

    <upcxx::global_ptr<char const, (upcxx::memory_kind)1>> = {
          static kind = <error reading variable: Missing ELF symbol "_ZN5upcxx10global_ptrIKcLNS_11memory_kindE1EE4kindE".>, 
    device_ = -1, rank_ = 17, raw_ptr_ = 0x7f51934ec4e0 <incomplete sequence \336>}
    

    Disclaimer: the representation of global_ptr is unspecified and subject to implementation details that may change without notice or even differ between builds, so peering at the internal private fields with the debugger is not officially supported.

    That being said, this is a global_ptr to host memory (memory_kind 1). The relevant fields of the global_ptr are the (private) rank_ and raw_ptr_ fields that appear in the output; the others are irrelevant to a host pointer.

    The member variable gdb is complaining about is global_ptr<T, Kind>::kind, which is a static constexpr field that is probably not referenced in your program. Even if it were, the C++ compiler/linker is justified in eliding the symbol for constexpr variables. So I'd also consider it a gdb bug that it complains about a missing symbol for a variable that has been optimized away.

  4. Dan Bonachea

    Clearing milestone for external issue.

    @Rob Egan : I suspect there's nothing more to be done on our side regarding this issue. If you agree, please mark it resolved.
