Correct backtrace generation in Carpet
The file backtrace.cc in CarpetLib does not #include <cctk.h>; hence all HAVE_BACKTRACE* macros are undefined, and only basic backtraces are generated.
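A minimal sketch of why the missing include matters: a feature test that the preprocessor evaluates before the configuration header has been seen silently takes the fallback branch. Here a plain #define stands in for what cctk_Config.h (reached via cctk.h) provides; this is an illustration of the ordering problem, not CarpetLib code.

```cpp
// Stage 1: the feature test runs before any configuration header has been
// seen, so HAVE_BACKTRACE is undefined and the fallback branch is taken.
#ifdef HAVE_BACKTRACE
constexpr bool saw_backtrace_before_include = true;
#else
constexpr bool saw_backtrace_before_include = false;
#endif

// Stand-in for cctk_Config.h (normally reached via #include <cctk.h>).
#define HAVE_BACKTRACE 1

// Stage 2: the same test placed after the "header" takes the feature branch.
#ifdef HAVE_BACKTRACE
constexpr bool saw_backtrace_after_include = true;
#else
constexpr bool saw_backtrace_after_include = false;
#endif
```

The compiler gives no warning for the first case, which is why the missing include went unnoticed: the code still builds and simply produces the basic backtrace.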
Correcting this is non-trivial, since the backtrace code is arcane, is written in C, probably expects glibc, contains (I'm fairly certain) memory allocation errors, and doesn't build on, e.g., Mac OS X. The code also spends an inordinate amount of time allocating and freeing string buffers; this should be replaced by simply using C++ streams.
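A rough sketch of the stream-based approach, assuming glibc's backtrace(), backtrace_symbols() and the Itanium C++ ABI's abi::__cxa_demangle. The function names demangle, pretty_frame and format_backtrace are hypothetical, and the line parser assumes glibc's "object(symbol+0xoffset) [0xaddress]" layout, which other platforms do not necessarily use. Like the existing code it allocates memory, which is not async-signal-safe inside a signal handler.

```cpp
#include <cxxabi.h>    // abi::__cxa_demangle
#include <execinfo.h>  // backtrace, backtrace_symbols
#include <cstdlib>     // std::free
#include <cstring>     // std::strncmp
#include <sstream>
#include <string>

// Demangle an Itanium-ABI C++ symbol ("_Z..."); return anything else as-is.
std::string demangle(const char *name) {
  if (std::strncmp(name, "_Z", 2) != 0) return name;
  int status = 0;
  char *res = abi::__cxa_demangle(name, nullptr, nullptr, &status);
  std::string out = (status == 0 && res != nullptr) ? std::string(res)
                                                    : std::string(name);
  std::free(res);  // __cxa_demangle allocates with malloc; free(nullptr) is ok
  return out;
}

// Rewrite one backtrace_symbols() line, assuming glibc's layout
// "object(symbol+0xoffset) [0xaddress]". Lines without a symbol name,
// such as "libc.so.6(+0x3a100) [...]", are returned unchanged.
std::string pretty_frame(const std::string &line) {
  const auto open = line.find('(');
  const auto plus = line.find('+', open);
  if (open == std::string::npos || plus == std::string::npos ||
      plus == open + 1)
    return line;
  const std::string mangled = line.substr(open + 1, plus - open - 1);
  return demangle(mangled.c_str()) + " [" + line + "]";
}

// Collect the current stack into one string; the only buffer freed by hand
// is the array returned by backtrace_symbols().
std::string format_backtrace() {
  void *frames[64];
  const int n = backtrace(frames, 64);
  char **syms = backtrace_symbols(frames, n);
  std::ostringstream os;
  for (int i = 0; i < n; ++i)
    os << (i + 1) << ". "
       << (syms ? pretty_frame(syms[i]) : std::string("?")) << "\n";
  std::free(syms);
  return os.str();
}
```

For example, pretty_frame turns "cactus_sim(_ZN9CarpetLib14signal_handlerEi+0xe7) [0x1]" into "CarpetLib::signal_handler(int) [cactus_sim(_ZN9CarpetLib14signal_handlerEi+0xe7) [0x1]]", which is the same shape as the Linux output quoted later in this thread.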
The backtrace code also probably requires a few more autoconf tests, so that it can be disabled where it does not work.
-
Replying to [comment:1 hinder]:
Given that backtraces are very useful, and used to work on "standard" linux systems, maybe the code could be enabled despite the problems that you mention. The memory allocation errors and buffers shouldn't be a problem since this only happens when the process is about to terminate due to the error anyway, right? Can we detect that we have glibc and Linux, and enable the backtrace code in that case? I know it's not as elegant as correctly autoconfing everything, but it's probably a lot easier. And having a backtrace with meaningful symbols is extremely useful.
I second that :-)
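A sketch of the glibc/Linux shortcut proposed above, assuming a compile-time check is acceptable in place of proper autoconf tests. CARPET_FULL_BACKTRACE, full_backtrace_enabled and capture_frames are hypothetical names, not existing CarpetLib identifiers.

```cpp
#include <cstdio>  // on glibc this pulls in <features.h>, which defines __GLIBC__

#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>
#define CARPET_FULL_BACKTRACE 1  // hypothetical macro name
#else
#define CARPET_FULL_BACKTRACE 0
#endif

// True when the symbolic backtrace path would be compiled in.
constexpr bool full_backtrace_enabled() { return CARPET_FULL_BACKTRACE != 0; }

// Capture up to max return addresses; on other platforms report none, so
// the caller can fall back to the basic backtrace.
int capture_frames(void **buf, int max) {
#if CARPET_FULL_BACKTRACE
  return backtrace(buf, max);
#else
  (void)buf;
  (void)max;
  return 0;
#endif
}
```

The trade-off is the one named in the comment: this avoids new autoconf tests, but silently disables the symbolic path on any non-glibc system, even where backtrace() is available (as on OS X).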
-
ping.
-
The current code in Carpet produces backtraces (but no line numbers) on Linux, but not on OS X:
Backtrace from rank 0 pid 33066:
1. CarpetLib::signal_handler(int) [/data/rhaas/postdoc/gr/cactus/ET_trunk/exe/cactus_sim(_ZN9CarpetLib14signal_handlerEi+0xe7) + [0x563a32128657]]
2. /lib/x86_64-linux-gnu/libc.so.6(+0x3a100) [0x7fea0e0df100]
3. /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141) [0x7fea0e0df081]
4. /lib/x86_64-linux-gnu/libc.so.6(abort+0x121) [0x7fea0e0ca535]
5. /lib/x86_64-linux-gnu/libc.so.6(+0x2540f) [0x7fea0e0ca40f]
6. /lib/x86_64-linux-gnu/libc.so.6(+0x32b92) [0x7fea0e0d7b92]
7. /data/rhaas/postdoc/gr/cactus/ET_trunk/exe/cactus_sim(+0xca22a3) [0x563a320582a3]
8. /data/rhaas/postdoc/gr/cactus/ET_trunk/exe/cactus_sim(CCTKi_ScheduleGHInit+0x4a) [0x563a340527ba]
9. Carpet::Initialise(tFleshConfig*) [/data/rhaas/postdoc/gr/cactus/ET_trunk/exe/cactus_sim(_ZN6Carpet10InitialiseEP12tFleshConfig+0x2cc) + [0x563a3201f6dc]]
10. /data/rhaas/postdoc/gr/cactus/ET_trunk/exe/cactus_sim(main+0x35) [0x563a31c585f5]
11. /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fea0e0cbbbb]
12. /data/rhaas/postdoc/gr/cactus/ET_trunk/exe/cactus_sim(_start+0x2a) [0x563a31c5e53a]
The hexadecimal addresses in this backtrace can also be interpreted with a debugger (e.g. gdb), or with the 'addr2line' (or 'gaddr2line') command line tool: 'addr2line -e cactus_sim <address>'.
and
Backtrace from rank 0 pid 76396:
The hexadecimal addresses in this backtrace can also be interpreted with a debugger (e.g. gdb), or with the 'addr2line' (or 'gaddr2line') command line tool: 'addr2line -e cactus_sim <address>'.
On OS X, all of HAVE_BACKTRACE, HAVE_DLADDR, HAVE___CXA_DEMANGLE, and HAVE_BACKTRACE_SYMBOLS were set in cctk_Config.h, so the empty stack trace is essentially a failure of backtrace() to return anything useful. It is not failing outright: letting it output all stack frames shows that it does produce a backtrace containing the backtrace function call itself and one more level up.

This may be an optimization issue. Compiling the simple backtrace example in https://stackoverflow.com/questions/77005/how-to-automatically-generate-a-stacktrace-when-my-program-crashes with -O0 shows the full backtrace, but with -O3 it is only 2 levels deep (same as in Cactus). So it may simply be that, at high enough optimization levels, gcc produces a call stack that backtrace() cannot follow (as icc does). Building just backtrace.cc with -O0 lets me get a backtrace.

This pull request implements this and also removes dead code from backtrace.cc (code that was inside #ifdef HAVE_BACKTRACE but before #include <cctk.h>): https://bitbucket.org/eschnett/carpet/pull-requests/31/rhaas-deadbacktrace/diff
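A minimal sketch of the depth experiment described above (not the StackOverflow example itself): recurse a fixed number of levels, call backtrace() at the bottom, and compare frame counts. frames_seen and frames_at_depth are hypothetical helpers; __attribute__((noinline)) is GCC/Clang syntax. As far as I know, glibc's backtrace() on x86-64 walks unwind tables, so there the difference should track the recursion depth; the observation in this comment is that on OS X with gcc -O3 only two levels come back.

```cpp
#include <execinfo.h>  // backtrace

// Capture the current stack and return how many frames backtrace() saw.
// buf lives in this frame, so the call cannot become a tail call.
__attribute__((noinline)) int frames_seen() {
  void *buf[128];
  return backtrace(buf, 128);
}

// Recurse n levels before capturing. The volatile store after the recursive
// call must happen in this frame, which keeps each level a real stack frame
// instead of letting the optimizer turn the recursion into a loop.
__attribute__((noinline)) int frames_at_depth(int n) {
  if (n == 0) return frames_seen();
  volatile int seen = frames_at_depth(n - 1);
  return seen;
}
```

Calling frames_at_depth(5) and frames_at_depth(0) from a small driver, built once with -O0 and once with -O3, is a quick way to check whether a given compiler and platform combination loses frames before touching backtrace.cc.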
- changed status to open
@Erik Schnetter @Ian Hinder Please review.
-
Unless there are objections, I will push this change after 2019-12-16.
-
- changed status to resolved