watchdog timeout during RPC calls

Issue #207 resolved
dd1 created an issue

If the watchdog timeout is shorter than the RPC connect timeout or the RPC reply timeout, programs that make RPC calls may be killed by the watchdog. When this happens, the symptoms are confusing: other programs report “program xxx removed from ODB”, the program itself complains about a PID mismatch in the ODB and the event buffer, etc.
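The condition above can be written as a simple check. This is a minimal sketch in C. The function name and the millisecond parameters are hypothetical and do not come from MIDAS.

```c
#include <assert.h>

/* Hypothetical helper, not part of the MIDAS API: returns 1 when a blocking
   RPC wait could last at least as long as the watchdog timeout, which is the
   situation in which the watchdog may remove the program, and 0 otherwise. */
static int watchdog_may_expire(int watchdog_timeout_ms,
                               int rpc_connect_timeout_ms,
                               int rpc_reply_timeout_ms)
{
    assert(watchdog_timeout_ms > 0);

    /* The longest single wait is the larger of the two RPC timeouts. */
    int longest_rpc_wait_ms = rpc_connect_timeout_ms > rpc_reply_timeout_ms
                                  ? rpc_connect_timeout_ms
                                  : rpc_reply_timeout_ms;

    /* A tie leaves no margin, so it is reported as unsafe too. */
    return watchdog_timeout_ms <= longest_rpc_wait_ms;
}
```

With illustrative values (not MIDAS defaults) of a 10 s watchdog and a 120 s reply timeout, watchdog_may_expire(10000, 10000, 120000) returns 1.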

This usually happens during run transitions, and the program that gets killed is typically one of the single-threaded programs that start or stop runs (msequencer, mlogger, mtransition). Some programs (mhttpd, odbedit) run a separate watchdog thread and do not suffer from this problem.

Since rpc_client_connect(), rpc_call() and rpc_client_call() can wait a very long time for an RPC reply, they must ensure that the watchdog timeout does not expire while they wait.
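One way to meet this requirement, and the approach the comment below settles on, is to make sure the watchdog timeout is longer than the RPC timeout. This is a minimal sketch in C. The function name, the millisecond units and the safety margin are hypothetical assumptions, not MIDAS code, and the sketch does not say where MIDAS applies the adjustment.

```c
#include <assert.h>

/* Hypothetical safety margin; the issue does not say how much longer
   than the RPC timeout the watchdog timeout should be. */
#define WATCHDOG_MARGIN_MS 10000

/* Hypothetical helper, not part of the MIDAS API: returns a watchdog
   timeout that is longer than the longest RPC wait. A requested value
   that already exceeds the longest wait plus the margin is kept as is. */
static int watchdog_timeout_for_rpc(int requested_watchdog_ms,
                                    int rpc_connect_timeout_ms,
                                    int rpc_reply_timeout_ms)
{
    assert(requested_watchdog_ms > 0);

    /* The longest single wait is the larger of the two RPC timeouts. */
    int longest_rpc_wait_ms = rpc_connect_timeout_ms > rpc_reply_timeout_ms
                                  ? rpc_connect_timeout_ms
                                  : rpc_reply_timeout_ms;
    int minimum_safe_ms = longest_rpc_wait_ms + WATCHDOG_MARGIN_MS;

    /* Only ever lengthen the watchdog timeout, never shorten it. */
    return requested_watchdog_ms > minimum_safe_ms ? requested_watchdog_ms
                                                   : minimum_safe_ms;
}
```

With these assumptions, a requested 10 s watchdog combined with a 120 s reply timeout becomes 130 s, while a requested 300 s watchdog is left unchanged. Only lengthening the timeout avoids overriding a user who already chose a generous value.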

K.O.

Comments

  1. dd1 reporter

    I wanted to fix it by calling cm_periodic_tasks() while waiting for the RPC reply. But cm_periodic_tasks() does too much: for example, it checks for alarms, which can trigger a run STOP transition, leading to more RPC calls and hence to recursion.

    Instead, I adjust the watchdog timeout to be longer than the RPC timeout.

    K.O.
