mlogger broken by commit 199d391

Issue #229 resolved
Thomas Lindner created an issue

Running mlogger from the HEAD version of MIDAS (CentOS 7) with a fresh ODB, I get an immediate segfault.

The specific error message is

bash-4.2$ mlogger -v

hs_get_history: see channel hkey 145680, name 'MIDAS', active 1, type [MIDAS], debug 0
[Logger,INFO] Connected history channel 'MIDAS' type MIDAS history
[Logger,INFO] Writing history to channel 'MIDAS' type 'MIDAS'
hs_get_history: see channel hkey 145984, name 'ODBC', active 0, type [ODBC], debug 0
hs_get_history: see channel hkey 146288, name 'SQLITE', active 0, type [SQLITE], debug 0
hs_get_history: see channel hkey 146592, name 'MYSQL', active 0, type [MYSQL], debug 0
hs_get_history: see channel hkey 146896, name 'FILE', active 0, type [FILE], debug 0
[Logger,INFO] Cannot find /Equipment entry in database, history system is inactive
Log     directory is /home/mpmtdaq/online/
Data    directory is same as Log unless specified in /Logger/channels/
History directory is same as Log unless specified in /Logger/history/
ELog    directory is same as Log
SQL     database is localhost/mpmtdaq/Runlog
MIDAS logger started. Stop with "!"
Segmentation fault

GDB says:

(gdb) back
#0  0x00007ffff54d6ecc in free () from /lib64/libc.so.6
#1  0x000000000048b35d in image_thread (name=...) at /home/mpmtdaq/packages/midas/src/history_image.cxx:69
#2  0x000000000048dca8 in _M_invoke<0ul> (this=<optimized out>) at /usr/include/c++/4.8.2/functional:1732
#3  operator() (this=<optimized out>) at /usr/include/c++/4.8.2/functional:1720
#4  std::thread::_Impl<std::_Bind_simple<void (*(std::string))(std::string)> >::_M_run() (this=<optimized out>)
    at /usr/include/c++/4.8.2/thread:115
#5  0x00007ffff5dec070 in ?? () from /lib64/libstdc++.so.6
#6  0x00007ffff7746e65 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff554f88d in clone () from /lib64/libc.so.6

I stepped back through MIDAS history and found that the error was probably introduced in commit 199d391

Indeed I find I can ‘fix’ this error by making the following change to history_image.cxx

do {
std::this_thread::sleep_for(std::chrono::seconds(1));
continue; /// <<<< this line
// check for old files

I assume the error is triggered by some ODB variable I am missing, but I didn’t debug further…

Comments (7)

  1. Stefan Ritt

    I can’t reproduce the crash, so I need your help in debugging.

    In history_image.cxx:195, I create the ODB entry /History/Images/Demo if it does not exist. In line 214 I create all sub-elements “Name”, “Enabled” etc. Then the image_thread is started, which accesses “/History/Images/Demo/Storage hours” at history_image.cxx:78. I believe that is where your crash happens. Can you check the ODB contents after the crash? Is the entry “/History/Images/Demo/Storage hours” really there? Debugging without optimization (-O0) would also be helpful.

    Stefan

  2. Thomas Lindner reporter

    Hi Stefan,

    I attach a file that shows the error message and state of the ODB after starting from a completely fresh ODB.

    The file also shows the gdb stack trace with optimization turned off.

  3. Stefan Ritt

    @KO: Sorry for not having informed you. I’m working on the “image history”, where you access any webcam and put the pictures in a system similar to our history system, so you can go to the history page and scroll back into the past to see old images. You can also make a time-lapse movie. To grab the images, I use the CURL library (I did talk to you about that!) and you recommended that I put the http get into a separate thread, so as not to block the logger. So we now have a very simple thread for each camera which just does the http get and writes the result to a file, nothing else. The main logging is completely unaffected.

  4. Stefan Ritt

    @Thomas: Thanks for posting the error log, that really helped. Funnily enough, the bug is not in the new code but in ss_file_find(), which is many years old. If the directory does not exist, the “flist” array does not get allocated in ss_file_find(), and the following free() in the user code crashes on an uninitialized pointer. So now I always allocate some memory. Please update to the latest commit and try again.
