abort() while holding odb or event buffer lock

Issue #324 new
dd1 created an issue

calling abort() while holding the ODB or event buffer lock will cause other midas programs to crash.

if core dump is enabled, writing the dump may take a very long time (minutes). during this time we are still holding the ODB (or the event buffer) semaphore, and other midas programs will time out waiting for this lock. Some programs are killed by the ODB semaphore timeout. odbedit is killed by the SYSMSG semaphore timeout.

if core dump is disabled, abort() finishes quickly and there is no problem.

to make things interesting, abort() called with a semaphore locked (bm_validate_client_index() & co) does not produce interesting core dumps - all they tell us is that after being stuck somewhere for a long time, we finally tried to do something (access ODB or an event buffer) and discovered that we had been kicked out from everywhere by the ODB or event buffer timeout. These stack traces are always very boring.

we could disable the core dump before calling abort(), but then programs will “disappear” (today they at least leave a core dump), and they may or may not be able to write something to midas.log.
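
A minimal sketch of this option, assuming a POSIX system where the core file size is controlled by RLIMIT_CORE. The helper names disable_core_dump() and abort_without_core() are hypothetical and are not existing midas functions; the message goes to stderr only as a stand-in for whatever midas.log reporting is still possible at that point.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper (not a midas function): set the soft core-size
 * limit to 0 so that a following abort() does not spend minutes writing
 * a core file. The hard limit is left unchanged.
 * Returns 0 on success, -1 on failure. */
static int disable_core_dump(void)
{
   struct rlimit rl;

   if (getrlimit(RLIMIT_CORE, &rl) != 0)
      return -1;

   rl.rlim_cur = 0;
   return setrlimit(RLIMIT_CORE, &rl);
}

/* Hypothetical replacement for a bare abort() at the places where we
 * know a semaphore is still held: report first, then die quickly. */
static void abort_without_core(const char *reason)
{
   fprintf(stderr, "aborting without core dump: %s\n", reason);
   fflush(stderr);

   if (disable_core_dump() != 0)
      perror("setrlimit(RLIMIT_CORE)");

   abort();
}
```

The trade-off is the one described above: the program exits fast and releases nothing for minutes on end, but the only trace left is whatever message got written before abort().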

we could try to unlock the semaphore before calling abort(), but this will not work if we hold two semaphores (both the ODB and the event buffer semaphore): we can unlock only one of them…
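
One conceivable way around the "only one of them" limitation, purely as a sketch and not something midas does today: keep a small per-process list of the locks currently held, so that the code calling abort() can release all of them without knowing which ones they are. Everything below is hypothetical (the names, the fixed-size table, the callback type), and it assumes a single thread takes and releases the locks; count_unlock() is only a stand-in for a real semaphore unlock call.

```c
#include <stddef.h>

#define MAX_HELD_LOCKS 8

/* Callback that knows how to release one particular kind of lock. */
typedef void (*unlock_fn)(void *lock);

struct held_lock {
   unlock_fn unlock;
   void *lock;
};

/* Hypothetical per-process table of locks currently held. */
static struct held_lock held[MAX_HELD_LOCKS];
static size_t n_held = 0;

/* Call right after a lock is taken. Returns 0, or -1 if the table is full. */
static int note_lock_taken(unlock_fn unlock, void *lock)
{
   if (n_held >= MAX_HELD_LOCKS)
      return -1;
   held[n_held].unlock = unlock;
   held[n_held].lock = lock;
   n_held++;
   return 0;
}

/* Call right after a normal unlock: forget the most recent entry for this lock. */
static void note_lock_released(void *lock)
{
   for (size_t i = n_held; i > 0; i--) {
      if (held[i - 1].lock == lock) {
         for (size_t j = i - 1; j + 1 < n_held; j++)
            held[j] = held[j + 1];
         n_held--;
         return;
      }
   }
}

/* Call just before abort(): release everything still held, newest first.
 * Returns the number of locks released. */
static size_t release_all_held_locks(void)
{
   size_t released = 0;

   while (n_held > 0) {
      n_held--;
      held[n_held].unlock(held[n_held].lock);
      released++;
   }
   return released;
}

/* Stand-in unlock function for illustration: counts how often it was called. */
static void count_unlock(void *lock)
{
   (*(int *)lock)++;
}
```

In a real program the ODB and event buffer lock/unlock paths would call note_lock_taken() and note_lock_released(), and the abort path would call release_all_held_locks() first. Whether it is safe to release a lock while the shared memory behind it may be in an inconsistent state is a separate question that this sketch does not address.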

K.O.
