gdb segfault when printing a global_ptr during interactive debugging

Issue #443 resolved

Rob Egan created an issue 2021-01-23

I cannot figure out if this is a known issue or just an issue with my version of gdb or gcc, but whenever I attempt to debug code with gdb, it works fine up until it tries to expand a variable with a upcxx::global_ptr, and then gdb core dumps.

Here I am debugging an active process:

(gdb) up
#3  0x00005555557ca282 in ska::detailv8::sherwood_v8_table<std::pair<Kmer<32>, KmerCounts>, Kmer<32>, KmerHash<32>, ska::detailv3::KeyOrValueHasher<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerHash<32> >, KmerEqual<32>, ska::detailv3::KeyOrValueEquality<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerEqual<32> >, std::allocator<std::pair<Kmer<32>, KmerCounts> >, std::allocator<unsigned char>, (unsigned char)8>::emplace<std::pair<Kmer<32>, KmerCounts>>(std::pair<Kmer<32>, KmerCounts>&&) (this=this@entry=0x7fffffffa8c0, key=...)
    at /var/lib/gitlab-runner/builds/x6sRe-zd/0/robegan21/mhmxx/include/bytell_hash_map.hpp:476
476                     return emplace_direct_hit({ index, block }, std::forward<Key>(key), std::forward<Args>(args)...);
(gdb) p index
$9 = 1328859
(gdb) p block
$10 = (ska::detailv8::sherwood_v8_table<std::pair<Kmer<32>, KmerCounts>, Kmer<32>, KmerHash<32>, ska::detailv3::KeyOrValueHasher<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerHash<32> >, KmerEqual<32>, ska::detailv3::KeyOrValueEquality<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerEqual<32> >, std::allocator<std::pair<Kmer<32>, KmerCounts> >, std::allocator<unsigned char>, 8>::BlockPointer) 0x7ffc9b616f68
(gdb) p key
$11 = (std::pair<Kmer<32>, KmerCounts> &&) @0x7ffcb8255888: {first = {static k = 21, static N_LONGS = 1, longs = {_M_elems = {11390612276768669696}}}, second = {left_exts = {count_A = 0, count_C = 0,
      count_G = 0, count_T = 4}, right_exts = {count_A = 0, count_C = 0, count_G = 4, count_T = 0}, uutig_frag = {<upcxx::global_ptr<FragElem const, (upcxx::memory_kind)1>> = {
Segmentation fault

Is there something I can do with gdb to prevent the crash? Or do I need to install a newer version?

gdb --version
GNU gdb (Ubuntu 8.2-0ubuntu1~16.04.1) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

regan@hulk:/work/gitlab-ci/scratch/mhm2-6cc70abe-RefactorReadNames-$ /work/gitlab-ci/ci-install-upcxx-2020.3.8/bin/upcxx --version
UPC++ version 2020.3.8 / gex-2020.3.8
Copyright (c) 2020, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
https://upcxx.lbl.gov

Comments (5)

Dan Bonachea

changed title to gdb segfault when printing a global_ptr during interactive debugging
changed component to External

Hi @Rob Egan -

I've never seen this particular issue before, and I wouldn't have the foggiest notion how to debug a gdb crash. This definitely seems like an "external" bug (in gdb or possibly the system configuration).

However, I can report success with at least the following setup on our dirac cluster, using the smp/debug backend and test/global_ptr.cpp:

(gdb) print ptr
$2 = {<upcxx::global_ptr<int const, (upcxx::memory_kind)1>> = {rank_ = 0, 
    raw_ptr_ = 0x7fffed5f23c8}, <No data fields>}
(gdb) print cptr
$3 = {rank_ = 0, raw_ptr_ = 0x0}
(gdb) quit
{pcp-d-5} upcxx --version
UPC++ version 2020.11.5 upcxx-2020.11.3-memory_kinds-34-gee1c768 / gex-stable-2021_01_08
Copyright (c) 2021, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
https://upcxx.lbl.gov

g++ (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

{pcp-d-5} gdb --version
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".

It works fine for me on the same platform with gdb 8.0.1 and all of:

upcxx mk-develop and gcc 10.2.0
upcxx 2020.3.8 and gcc 10.2.0
upcxx 2020.3.8 and gcc 6.4.0

Examining your failing output, this line looks "fishy" to me:

uutig_frag = {<upcxx::global_ptr<FragElem const, (upcxx::memory_kind)1>> = {
Segmentation fault

What is the actual declared type of the uutig_frag field?

The output syntax here seems to indicate that uutig_frag is an aggregate whose first field is a template type instantiated on upcxx::global_ptr<FragElem const>, but the template type itself is "missing" in the output. I'd guess that's more likely the actual cause of the segfault. Perhaps this is a non-trivial libstdc++ container and the gdb version doesn't correctly grok your glibc binary version?

2021-02-03T05:01:44+00:00

Rob Egan reporter

Coming back to this, and sorry for the long delay.

uutig_frag is just a simple struct FragElem that includes a global_ptr to another FragElem, so maybe that recursion is the cause.

struct FragElem {
  global_ptr<FragElem> left_gptr, right_gptr;
  bool left_is_rc, right_is_rc;
  global_ptr<char> frag_seq;
  unsigned frag_len;
  int64_t sum_depths;
  bool visited;

  FragElem()
      : left_gptr(nullptr)
      , right_gptr(nullptr)
      , left_is_rc(false)
      , right_is_rc(false)
      , frag_seq(nullptr)
      , frag_len(0)
      , sum_depths(0)
      , visited(false) {}
};

… I’ll see if adding a forward declaration fixes the gdb crash, but I think there may be more going on.
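
For reference, a self-referential struct of this shape is ordinary C++ whenever the pointer wrapper stores only a raw pointer to its target type. The following is a minimal sketch under that assumption: `toy_gptr`, `Frag`, and `chain_length` are hypothetical stand-ins made up for illustration, not UPC++ types, and the real global_ptr layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointer-like wrapper (NOT upcxx::global_ptr).
// It stores only a T*, so T may still be incomplete when the
// wrapper is instantiated.
template <typename T>
struct toy_gptr {
  std::int32_t rank_ = 0;
  T* raw_ptr_ = nullptr;
};

// Hypothetical stand-in for a struct that links to its own type.
// Frag is incomplete while its members are declared, which is fine
// because toy_gptr<Frag> only needs a pointer to it.
struct Frag {
  toy_gptr<Frag> left, right;
  unsigned len = 0;
};

// Counts the elements reachable by following `right` links,
// including the starting element.
inline unsigned chain_length(const Frag& start) {
  unsigned n = 1;
  for (const Frag* f = start.right.raw_ptr_; f != nullptr;
       f = f->right.raw_ptr_) {
    ++n;
  }
  return n;
}
```

If this compiles and links without a separate forward declaration, the recursion itself is unlikely to be a language-level problem, which would point back at the debugger.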

I am able to confirm that my Ubuntu 18.04.05 version of gdb 8.2 crashes when reading cores or attaching to a running process:

gdb --version
GNU gdb (Ubuntu 8.2-0ubuntu1~18.04) 8.2


Reading symbols from /home/regan/workspace/mhm2/build-debug/install/bin/mhm2.../build/gdb-nKO2sj/gdb-8.2/gdb/cp-support.c:1581: demangler-warning: unable to demangle '_ZN5upcxx6detail28apply_futured_as_future_helpIONS0_19future_composite_fnINS2_INS0_25bound_function_applicatorINS_12global_fnptrIFvRNS_11dist_objectISt8functionIFvN7KmerDHTILi160EE10KmerAndExtERNS5_IN3ska15bytell_hash_mapI4KmerILi160EE10KmerCounts8KmerHashILi160EE9KmerEqualILi160EESaISt4pairISD_SE_EEEEEERNS5_I11BloomFilterEEEEEENS_4viewIS9_NS_22deserializing_iteratorIS9_EEEERNS5_IN11upcxx_utils12FASRPCCountsEEEimSO_SR_EEEJSV_SZ_S13_imSO_SR_EEENS0_19rpc_recipient_afterIPNS0_11lpc_dormantIJEEEEEEENS0_7commandIJPNS0_8lpc_baseEEE13after_executeINS0_27deserialized_bound_functionIRKZNS0_3rpcINS_11completionsIJNS_9future_cxINS_18operation_cx_eventELNS_14progress_levelE1EEEEEERS14_JSV_RSZ_S13_OiRmSO_SR_EEENS0_10rpc_returnIFT0_DpT1_ENSt5decayIT_E4typeENS0_18rpc_remote_resultsIS1Y_vE4typeEE4typeERKNS_4teamEiOS1V_DpOS1W_OS20_iEUlONS1I_IS15_JSV_RKSZ_S13_RKiRKmSO_SR_EEEE0_JRKNS_14bound_functionIS15_JSV_S2G_S13_S2I_S2K_SO_SR_EEEEEELb1EXadL_ZNS_7backend6gasnet10rpc_as_lpc7cleanupILb0ELb0EEEvS1F_EEEEEEONS_7future1INS0_20future_kind_when_allIJNS32_INS0_18future_kind_resultEJOS15_EEENS32_INS0_17future_kind_shrefINS0_25future_header_ops_generalELb0EEEJSV_EEENS32_IS34_JOSZ_EEENS32_IS39_JS13_EEENS32_IS34_JS1S_EEENS32_IS34_JOmEEENS32_IS39_JSO_EEENS32_IS39_JSR_EEEEEEJS35_SV_S3B_S13_S1S_S3F_SO_SR_EEES3K_EclES31_ONS0_17future_dependencyIS3K_EE' (demangler failed with signal 11)

..

A problem internal to GDB has been detected,
further debugging may prove unreliable.

… but gdb 10 does not, so this definitely looks like a gdb bug.

So, is there a way to configure my install so that the GASNet backtrace support uses my custom install of gdb instead of /usr/bin/gdb? Additionally, is there a way to get a backtrace of all threads instead of just the first thread (i.e. the gdb command ‘thread apply all bt’)?

Incidentally, even gdb 10.1 cannot read a global_ptr field. Here I attached to running processes during a segfault (so clearly something is wrong with my code), and in the frame for the rget call that is being waited on, the global_ptr (ctg_loc.seq_gptr) cannot be resolved by gdb, which reports an error reading the variable: missing ELF symbol.

#12 0x0000559f02f18fad in KmerCtgDHT<32>::compute_alns_for_read (this=0x7ffe9f334a60, aligned_ctgs_map=0x559f07d9ee10, rname=..., rseq_fw=..., read_group_id=0, aln_kernel_timer=...)
    at /home/regan/workspace/mhm2/src/klign.cpp:816
816               rget(ctg_loc.seq_gptr + get_start, ctg_str.data() + get_start, get_len).wait();
(gdb) p ctg_loc
$1 = {cid = 1532400, seq_gptr = {<upcxx::global_ptr<char const, (upcxx::memory_kind)1>> = {
      static kind = <error reading variable: Missing ELF symbol "_ZN5upcxx10global_ptrIKcLNS_11memory_kindE1EE4kindE".>, device_ = -1, rank_ = 17,
      raw_ptr_ = 0x7f51934ec4e0 <incomplete sequence \336>}, <No data fields>}, clen = 23, depth = 2, pos_in_ctg = 0, is_rc = true}

2021-03-17T17:22:54+00:00

Dan Bonachea

@Rob Egan :

> So, is there a way to configure my install so that the GASNet backtrace support uses my custom install of gdb instead of /usr/bin/gdb?

If you set the environment variable GDB_PATH=/path/to/your/gdb before running configure, that should encode the gdb you want.

> Additionally, is there a way to get a backtrace of all threads instead of just the first thread (i.e. the gdb command ‘thread apply all bt’)?

This should already be the default if you build with upcxx -threadmode=par, which links in the threaded version of GASNet. You can confirm this with the following command: upcxx-run -i a.out | grep Mode

> the global_ptr (ctg_loc.seq_gptr) cannot be resolved by gdb with an error reading the variable: missing ELF symbol
```
<upcxx::global_ptr<char const, (upcxx::memory_kind)1>> = {
      static kind = <error reading variable: Missing ELF symbol "_ZN5upcxx10global_ptrIKcLNS_11memory_kindE1EE4kindE".>, 
device_ = -1, rank_ = 17, raw_ptr_ = 0x7f51934ec4e0 <incomplete sequence \336>}
```
Disclaimer: the representation of global_ptr is unspecified and subject to implementation details that may change without notice or even differ between builds, so peering at the internal private fields with the debugger is not officially supported.

That being said, this is a global_ptr to host memory (memory_kind 1). The relevant fields of the global_ptr are the (private) rank and raw_ptr fields that appear in the output; the others are irrelevant to a host pointer.

The member variable gdb is complaining about is global_ptr<T, Kind>::kind, which is a static constexpr field that is probably not referenced in your program. Even if it were, the C++ compiler/linker is justified in eliding the symbol for constexpr variables. So I'd also consider this a gdb bug: it's complaining about a missing symbol for a variable that has been optimized away.
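
The point about constexpr members can be sketched in a few lines. This is a minimal illustration only: `memory_kind`, `fake_gptr`, `is_host`, and `demo` below are hypothetical stand-ins written for this example, not the UPC++ definitions, and the field names merely echo the ones visible in the gdb output above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; NOT the UPC++ definitions.
enum class memory_kind : int { host = 1, device = 2 };

template <typename T, memory_kind Kind = memory_kind::host>
struct fake_gptr {
  // Static constexpr member initialized in-class. Code that only reads
  // its value in constant expressions never needs its address, so the
  // toolchain is free to emit no object symbol for it.
  static constexpr memory_kind kind = Kind;
  std::int32_t rank_ = 0;
  T* raw_ptr_ = nullptr;
};

// Value-only use, evaluated entirely at compile time.
static_assert(fake_gptr<const char>::kind == memory_kind::host,
              "the default kind is host");

// Reports whether a pointer type targets host memory, using only the
// template parameter (again, no address of `kind` is taken).
template <typename T, memory_kind Kind>
constexpr bool is_host(const fake_gptr<T, Kind>&) {
  return Kind == memory_kind::host;
}

// Small usage example: the per-object state is just the two data fields.
inline void demo() {
  fake_gptr<const char> p;
  p.rank_ = 17;
  assert(is_host(p));
  assert(p.raw_ptr_ == nullptr);
}
```

In a program like this, nothing requires a linker-visible symbol for `kind`, so a debugger that insists on locating one may report it as missing even though the program itself is well-formed.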

2021-03-17T19:36:43+00:00

Dan Bonachea

removed milestone
assigned issue to Rob Egan

Clearing milestone for external issue.

@Rob Egan : I suspect there's nothing more to be done on our side regarding this issue. If you agree, please mark it resolved.

2021-04-03T02:57:50+00:00

Rob Egan reporter

changed status to resolved

building and using a more recent version of gdb fixed this issue for me

2021-04-08T23:08:37+00:00

Assignee: Rob Egan

Type: bug

Priority: minor

Status: resolved

Component: External

Milestone: –

Version: 2020.3.8 snapshot

Votes: 0

Watchers: 1