- changed component to External
gdb segfault when printing a global_ptr during interactive debugging
I cannot tell whether this is a known issue or a problem with my version of gdb or gcc. Debugging with gdb works fine until it tries to print a variable that contains a upcxx::global_ptr, at which point gdb itself segfaults and dumps core.
Here I am debugging an active process:
(gdb) up
#3 0x00005555557ca282 in ska::detailv8::sherwood_v8_table<std::pair<Kmer<32>, KmerCounts>, Kmer<32>, KmerHash<32>, ska::detailv3::KeyOrValueHasher<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerHash<32> >, KmerEqual<32>, ska::detailv3::KeyOrValueEquality<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerEqual<32> >, std::allocator<std::pair<Kmer<32>, KmerCounts> >, std::allocator<unsigned char>, (unsigned char)8>::emplace<std::pair<Kmer<32>, KmerCounts>>(std::pair<Kmer<32>, KmerCounts>&&) (this=this@entry=0x7fffffffa8c0, key=...)
at /var/lib/gitlab-runner/builds/x6sRe-zd/0/robegan21/mhmxx/include/bytell_hash_map.hpp:476
476 return emplace_direct_hit({ index, block }, std::forward<Key>(key), std::forward<Args>(args)...);
(gdb) p index
$9 = 1328859
(gdb) p block
$10 = (ska::detailv8::sherwood_v8_table<std::pair<Kmer<32>, KmerCounts>, Kmer<32>, KmerHash<32>, ska::detailv3::KeyOrValueHasher<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerHash<32> >, KmerEqual<32>, ska::detailv3::KeyOrValueEquality<Kmer<32>, std::pair<Kmer<32>, KmerCounts>, KmerEqual<32> >, std::allocator<std::pair<Kmer<32>, KmerCounts> >, std::allocator<unsigned char>, 8>::BlockPointer) 0x7ffc9b616f68
(gdb) p key
$11 = (std::pair<Kmer<32>, KmerCounts> &&) @0x7ffcb8255888: {first = {static k = 21, static N_LONGS = 1, longs = {_M_elems = {11390612276768669696}}}, second = {left_exts = {count_A = 0, count_C = 0,
count_G = 0, count_T = 4}, right_exts = {count_A = 0, count_C = 0, count_G = 4, count_T = 0}, uutig_frag = {<upcxx::global_ptr<FragElem const, (upcxx::memory_kind)1>> = {
Segmentation fault
Is there something I can do in gdb to prevent the crash, or do I need to install a newer version?
gdb --version
GNU gdb (Ubuntu 8.2-0ubuntu1~16.04.1) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
regan@hulk:/work/gitlab-ci/scratch/mhm2-6cc70abe-RefactorReadNames-$ /work/gitlab-ci/ci-install-upcxx-2020.3.8/bin/upcxx --version
UPC++ version 2020.3.8 / gex-2020.3.8
Copyright (c) 2020, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
https://upcxx.lbl.gov
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Comments (5)
-
-
reporter Coming back to this, and sorry for the long delay.
uutig_frag is just a simple struct FragElem that includes a global_ptr to another FragElem, so maybe that recursion is the cause.
struct FragElem {
  global_ptr<FragElem> left_gptr, right_gptr;
  bool left_is_rc, right_is_rc;
  global_ptr<char> frag_seq;
  unsigned frag_len;
  int64_t sum_depths;
  bool visited;
  FragElem()
    : left_gptr(nullptr)
    , right_gptr(nullptr)
    , left_is_rc(false)
    , right_is_rc(false)
    , frag_seq(nullptr)
    , frag_len(0)
    , sum_depths(0)
    , visited(false) {}
};
… I’ll see if adding a forward declaration fixes the gdb crash, but I think there may be more going on.
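As a minimal sketch of why the self-reference by itself is legal C++ without a forward declaration: a class name is already visible (as an incomplete type) inside its own definition, and a pointer-like template that only stores a `T*` can be instantiated with an incomplete `T`. `toy_global_ptr` below is a hypothetical stand-in written for this illustration, not the real upcxx::global_ptr, and `links_ok` is likewise an invented helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for upcxx::global_ptr (NOT the real class).
// It stores only a rank and a raw pointer, so T may be incomplete here.
template <typename T>
struct toy_global_ptr {
  int rank_ = -1;
  T* raw_ptr_ = nullptr;
  toy_global_ptr() = default;
  toy_global_ptr(std::nullptr_t) {}
  bool is_null() const { return raw_ptr_ == nullptr; }
};

// Same field layout as the FragElem in this thread, using the toy pointer.
// FragElem is still incomplete at the point left_gptr is declared.
struct FragElem {
  toy_global_ptr<FragElem> left_gptr, right_gptr;
  bool left_is_rc, right_is_rc;
  toy_global_ptr<char> frag_seq;
  unsigned frag_len;
  std::int64_t sum_depths;
  bool visited;
  FragElem()
    : left_gptr(nullptr), right_gptr(nullptr),
      left_is_rc(false), right_is_rc(false),
      frag_seq(nullptr), frag_len(0), sum_depths(0), visited(false) {}
};

// Hypothetical check: link fragment a to fragment b and follow the link.
inline bool links_ok() {
  FragElem a, b;
  a.right_gptr.rank_ = 0;
  a.right_gptr.raw_ptr_ = &b;
  return a.left_gptr.is_null() && !a.right_gptr.is_null() &&
         a.right_gptr.raw_ptr_->frag_len == 0;
}
```

If this compiles and `links_ok()` returns true, the recursive member is not a language-level problem, which is consistent with the suspicion that more is going on inside gdb.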
I can confirm that gdb 8.2 on my Ubuntu 18.04.05 machine crashes when reading cores or attaching to a running process:
gdb --version
GNU gdb (Ubuntu 8.2-0ubuntu1~18.04) 8.2
Reading symbols from /home/regan/workspace/mhm2/build-debug/install/bin/mhm2...
/build/gdb-nKO2sj/gdb-8.2/gdb/cp-support.c:1581: demangler-warning: unable to demangle '_ZN5upcxx6detail28apply_futured_as_future_helpIONS0_19future_composite_fnINS2_INS0_25bound_function_applicatorINS_12global_fnptrIFvRNS_11dist_objectISt8functionIFvN7KmerDHTILi160EE10KmerAndExtERNS5_IN3ska15bytell_hash_mapI4KmerILi160EE10KmerCounts8KmerHashILi160EE9KmerEqualILi160EESaISt4pairISD_SE_EEEEEERNS5_I11BloomFilterEEEEEENS_4viewIS9_NS_22deserializing_iteratorIS9_EEEERNS5_IN11upcxx_utils12FASRPCCountsEEEimSO_SR_EEEJSV_SZ_S13_imSO_SR_EEENS0_19rpc_recipient_afterIPNS0_11lpc_dormantIJEEEEEEENS0_7commandIJPNS0_8lpc_baseEEE13after_executeINS0_27deserialized_bound_functionIRKZNS0_3rpcINS_11completionsIJNS_9future_cxINS_18operation_cx_eventELNS_14progress_levelE1EEEEEERS14_JSV_RSZ_S13_OiRmSO_SR_EEENS0_10rpc_returnIFT0_DpT1_ENSt5decayIT_E4typeENS0_18rpc_remote_resultsIS1Y_vE4typeEE4typeERKNS_4teamEiOS1V_DpOS1W_OS20_iEUlONS1I_IS15_JSV_RKSZ_S13_RKiRKmSO_SR_EEEE0_JRKNS_14bound_functionIS15_JSV_S2G_S13_S2I_S2K_SO_SR_EEEEEELb1EXadL_ZNS_7backend6gasnet10rpc_as_lpc7cleanupILb0ELb0EEEvS1F_EEEEEEONS_7future1INS0_20future_kind_when_allIJNS32_INS0_18future_kind_resultEJOS15_EEENS32_INS0_17future_kind_shrefINS0_25future_header_ops_generalELb0EEEJSV_EEENS32_IS34_JOSZ_EEENS32_IS39_JS13_EEENS32_IS34_JS1S_EEENS32_IS34_JOmEEENS32_IS39_JSO_EEENS32_IS39_JSR_EEEEEEJS35_SV_S3B_S13_S1S_S3F_SO_SR_EEES3K_EclES31_ONS0_17future_dependencyIS3K_EE' (demangler failed with signal 11)
..
A problem internal to GDB has been detected,
further debugging may prove unreliable.
… but gdb 10 does not, so this definitely looks like a gdb bug.
So, is there a way to configure my install so that the GASNet backtrace support uses my custom install of gdb instead of /usr/bin/gdb? Additionally, is there a way to get a backtrace of all threads instead of just the first thread (i.e. the gdb command ‘thread apply all bt’)?
Incidentally, even gdb 10.1 cannot read a global_ptr field. Here I attached to running processes during a segfault crash (so clearly something is wrong with my code here), but in the rget function that is being waited on, the global_ptr (ctg_loc.seq_gptr) cannot be resolved by gdb with an error reading the variable: missing ELF symbol
#12 0x0000559f02f18fad in KmerCtgDHT<32>::compute_alns_for_read (this=0x7ffe9f334a60, aligned_ctgs_map=0x559f07d9ee10, rname=..., rseq_fw=..., read_group_id=0, aln_kernel_timer=...)
at /home/regan/workspace/mhm2/src/klign.cpp:816
816 rget(ctg_loc.seq_gptr + get_start, ctg_str.data() + get_start, get_len).wait();
(gdb) p ctg_loc
$1 = {cid = 1532400, seq_gptr = {<upcxx::global_ptr<char const, (upcxx::memory_kind)1>> = {
static kind = <error reading variable: Missing ELF symbol "_ZN5upcxx10global_ptrIKcLNS_11memory_kindE1EE4kindE".>, device_ = -1, rank_ = 17,
raw_ptr_ = 0x7f51934ec4e0 <incomplete sequence \336>}, <No data fields>}, clen = 23, depth = 2, pos_in_ctg = 0, is_rc = true}
-
@Rob Egan :
So, is there a way to configure my install so that the GASNet backtrace support uses my custom install of gdb instead of /usr/bin/gdb?
If you set the environment variable GDB_PATH=/path/to/your/gdb before running configure, that should encode the gdb you want.
And additionally, is there a way to get a backtrace of all threads instead of just the first thread (i.e. gdb command ‘thread apply all bt’)
This should already be the default if you build with upcxx -threadmode=par, which links in the threaded version of GASNet. You can confirm this with the following command:
upcxx-run -i a.out | grep Mode
the global_ptr (ctg_loc.seq_gptr) cannot be resolved by gdb with an error reading the variable: missing ELF symbol
<upcxx::global_ptr<char const, (upcxx::memory_kind)1>> = { static kind = <error reading variable: Missing ELF symbol "_ZN5upcxx10global_ptrIKcLNS_11memory_kindE1EE4kindE".>, device_ = -1, rank_ = 17, raw_ptr_ = 0x7f51934ec4e0 <incomplete sequence \336>}
Disclaimer: the representation of global_ptr is unspecified and subject to implementation details that may change without notice or even differ between builds, so peering at the internal private fields with the debugger is not officially supported.
That being said, this is a global_ptr to host memory (memory_kind 1). The relevant fields of the global_ptr are the (private) rank and raw_ptr fields that appear in the output; the others are irrelevant to a host pointer.
The member variable gdb is complaining about is global_ptr<T, Kind>::kind, which is a static constexpr field that is probably not referenced in your program. Even if it were, the C++ compiler/linker is justified in eliding the symbol for constexpr variables. So I'd also consider it a gdb bug that it complains about a missing symbol for a variable that has been optimized away.
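To illustrate the point about the elided symbol, here is a small sketch using a hypothetical `toy_global_ptr` whose field names mimic the gdb output quoted above; it is not the real UPC++ class, and only the value `host = 1` is taken from this thread. Reading a static constexpr member by value is a constant expression rather than an odr-use, so the compiler can fold it to a literal and is not required to emit an object for it.

```cpp
#include <cassert>

// Hypothetical miniature of the layout seen in the gdb output (NOT upcxx code).
enum class memory_kind { host = 1 };

template <typename T, memory_kind Kind = memory_kind::host>
struct toy_global_ptr {
  static constexpr memory_kind kind = Kind;  // compile-time constant, no storage needed
  int device_;
  int rank_;
  T* raw_ptr_;
};

// Only the value of `kind` is read here, which is not an odr-use,
// so no definition (and hence no ELF symbol) is required for it.
constexpr int kind_as_int() {
  return static_cast<int>(toy_global_ptr<const char>::kind);
}

// The value is available entirely at compile time.
static_assert(kind_as_int() == 1, "kind is a compile-time constant");
```

Under these assumptions, a debugger that insists on locating a symbol for `kind` in the binary may find none, even though the program is well-formed and the value is fully determined by the type.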
- removed milestone
-
assigned issue to
Clearing milestone for external issue.
@Rob Egan : I suspect there's nothing more to be done on our side regarding this issue. If you agree, please mark it resolved.
-
reporter - changed status to resolved
building and using a more recent version of gdb fixed this issue for me
Hi @Rob Egan -
I've never seen this particular issue before, and I wouldn't have the foggiest notion how to debug a gdb crash. This definitely seems like an "external" bug (in gdb or possibly the system configuration).
However, I can report success with at least the following setup on our dirac cluster, using the smp/debug backend and test/global_ptr.cpp:
It works fine for me on the same platform with gdb 8.0.1 and all of:
Examining your failing output, this line looks "fishy" to me:
What is the actual declared type of the field uutig_frag? The output syntax here seems to indicate uutig_frag is an aggregate where the first field is a template type that is instantiated on upcxx::global_ptr<FragElem const>, but the template type itself is "missing" in the output. I'd guess that's more likely the actual cause of the segfault - perhaps this is a non-trivial libstdc++ container and the gdb version doesn't correctly grok your glibc binary version?