frontend crash - buffer read pointer did not move while waiting for bytes
feudp crashed from an old problem that is not logged in this bug tracker: a long-standing bug in the event buffer code. Under an unknown condition, the writer to the event buffer would go into an infinite loop. Years ago I replaced the infinite loop with a crash and the error message shown below. I never figured out the root cause of the problem.
(messages are in reverse order: the last message is shown first)
00:16:34.904 2017/11/15 [feudp,INFO] Program feudp on host alphagdaq stopped
00:16:34.904 2017/11/15 [feudp,ERROR] [mfe.c:1597:receive_trigger_event,ERROR] rpc_send_event error 203
00:16:34.903 2017/11/15 [feudp,ERROR] [midas.c:7310:bm_wait_for_free_space,ERROR] BUG: read pointer did not move while waiting for 98764 bytes, bytes available: 4304, buffer size: 500000000
00:16:34.903 2017/11/15 [feudp,INFO] Corrected read pointer for client 'feudp' on buffer 'BUFUDP' from 74511528 to 74515752
K.O.
Comments (8)
-
reporter -
reporter happened again. not clear why this problem resurfaced now, in the agdaq system... K.O.
-
reporter looked at the core dumps, replaced failing asserts with cm_msg() and error return. most likely this creates an infinite loop of error messages. will see. K.O.
-
reporter -
assigned issue to
-
assigned issue to
-
reporter saw the crash again. one of the asserts replaced by an error message is now an infinite loop. the error messages show crazy event data length - looks like either bad events go into the buffer or they become corrupted in the buffer. Need to write a consistency checker for event buffer content. K.O.
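A consistency checker of the kind mentioned above could look roughly like the sketch below. It is not the MIDAS code: the event layout (an 8-byte header whose first 4 bytes hold the data length, payload padded to 8 bytes), the function names check_ring() and ring_read(), and the convention that rp == wp means "empty" are all assumptions made for illustration. The idea is only to walk the events between the read and write pointers and stop at the first header whose length cannot fit in the occupied part of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, NOT the real MIDAS structures: each event is an
   8-byte header (first 4 bytes = payload length) followed by the payload,
   padded up to a multiple of 8 bytes. */
enum { HDR_SIZE = 8, ALIGN = 8 };

/* Copy n bytes starting at offset pos, wrapping at the end of the ring. */
static void ring_read(const uint8_t *buf, size_t size, size_t pos,
                      void *dst, size_t n)
{
    assert(pos < size && n <= size);
    size_t first = size - pos;
    if (first > n)
        first = n;
    memcpy(dst, buf + pos, first);
    memcpy((uint8_t *)dst + first, buf, n - first);
}

/* Walk all events between read pointer rp and write pointer wp.
   Returns the number of events found, or -1 on the first inconsistency,
   in which case *bad_pos holds the offset of the offending header. */
long check_ring(const uint8_t *buf, size_t size, size_t rp, size_t wp,
                size_t *bad_pos)
{
    if (size == 0 || size % ALIGN != 0 || rp >= size || wp >= size ||
        rp % ALIGN != 0 || wp % ALIGN != 0) {
        *bad_pos = rp;
        return -1;
    }

    /* Bytes occupied between rp and wp; rp == wp is treated as empty. */
    size_t used = (wp >= rp) ? wp - rp : size - rp + wp;
    size_t pos = rp;
    long count = 0;

    while (used > 0) {
        uint32_t len;
        if (used < HDR_SIZE) {          /* not even room for a header */
            *bad_pos = pos;
            return -1;
        }
        ring_read(buf, size, pos, &len, sizeof len);

        /* 64-bit arithmetic so a garbage length cannot overflow. */
        uint64_t total = HDR_SIZE + ((uint64_t)len + ALIGN - 1) / ALIGN * ALIGN;
        if (total > (uint64_t)used) {   /* "crazy" event data length */
            *bad_pos = pos;
            return -1;
        }
        pos = (pos + (size_t)total) % size;
        used -= (size_t)total;
        count++;
    }
    return count;
}
```

Running a check like this on both the writer and the reader side might help tell apart the two cases named above (bad events going in versus events being corrupted while in the buffer), since only the second case would pass on write and fail on read.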
-
reporter the error message in bm_push_event() produces an infinite loop as it is called from cm_yield() and the error status about corrupted event buffer is not propagated there. Replaced this infinite loop with a crash, for now. K.O.
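The failure mode described above (an error is logged deep inside the loop, but the status never reaches the caller, so the same bad event is retried forever) can be illustrated with a small sketch. Everything here is hypothetical: push_one() and yield_once() are stand-ins for the roles of bm_push_event() and cm_yield(), not their real signatures, and the status codes are not the MIDAS constants. The point is only that the loop exits on any status other than "more work pending" and hands that status back to its caller.

```c
/* Hypothetical status codes; not the MIDAS constants. */
enum status { ST_OK = 0, ST_MORE = 1, ST_CORRUPTED = 2 };

/* Toy buffer: 'pending' events remain to be delivered, and delivery hits a
   corrupted event when 'pending' equals 'corrupt_at' (0 = never). */
struct fake_buffer {
    int pending;
    int corrupt_at;
};

/* Stand-in for the per-event delivery step. It reports corruption through
   its return value instead of only logging it. */
static enum status push_one(struct fake_buffer *b)
{
    if (b->pending == 0)
        return ST_OK;
    if (b->pending == b->corrupt_at)
        return ST_CORRUPTED;
    b->pending--;
    return b->pending ? ST_MORE : ST_OK;
}

/* Stand-in for the yield loop. It keeps draining while there is more work,
   but returns the first error to its caller rather than retrying the same
   event forever. */
enum status yield_once(struct fake_buffer *b)
{
    for (;;) {
        enum status s = push_one(b);
        if (s != ST_MORE)
            return s;
    }
}
```

With this shape the caller decides what to do about a corrupted buffer (stop the frontend, reset the buffer, and so on), instead of the loop spinning on error messages.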
-
reporter event buffer code rewritten on branch feature/bm_send and merged into develop. K.O.
-
reporter - changed status to resolved
this bug no longer exists in this form - the event buffer code was rewritten. K.O.
this has happened again several times. K.O.