watchdog timeout during RPC calls

If the watchdog timeout is shorter than the RPC connect timeout or the RPC reply timeout, programs that make RPC calls may be killed by the watchdog while they wait for the RPC to complete. When this happens, the symptoms are confusing: other programs report “program xxx removed from ODB”, the program itself complains about ODB and event buffer PID mismatch, etc.
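The condition can be written down as a small check. This is only a sketch: the `Timeouts` struct, both function names and the margin parameter are hypothetical and are not part of the MIDAS API; all values are assumed to be in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical container for the three timeouts discussed above (milliseconds).
struct Timeouts {
    std::uint32_t watchdog_ms;     // watchdog timeout of the calling program
    std::uint32_t rpc_connect_ms;  // RPC connect timeout
    std::uint32_t rpc_reply_ms;    // RPC reply timeout
};

// Returns true if a blocking RPC call could outlast the watchdog timeout,
// i.e. the watchdog timeout is shorter than either RPC timeout.
bool watchdog_may_expire(const Timeouts& t) {
    return t.watchdog_ms < std::max(t.rpc_connect_ms, t.rpc_reply_ms);
}

// Returns the smallest watchdog timeout that is at least the current one and
// exceeds both RPC timeouts by margin_ms. Overflow is not handled here.
std::uint32_t safe_watchdog_ms(const Timeouts& t, std::uint32_t margin_ms) {
    assert(margin_ms > 0);
    return std::max({t.watchdog_ms,
                     t.rpc_connect_ms + margin_ms,
                     t.rpc_reply_ms + margin_ms});
}
```

For example, with a 10 s watchdog timeout, a 10 s connect timeout and a 120 s reply timeout, `watchdog_may_expire()` reports a problem and `safe_watchdog_ms()` with a 1 s margin suggests 121 s.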

This usually happens during run transitions. The program that gets killed is typically one of the single-threaded programs that start or stop runs (msequencer, mlogger, mtransition). Some programs (mhttpd, odbedit) use the watchdog thread and do not suffer from this problem.

Since rpc_client_connect(), rpc_call() and rpc_client_call() can wait for an RPC reply for a very long time, they must ensure that the watchdog timeout does not expire while they wait.
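One possible way to do this is to temporarily raise the watchdog timeout around the blocking call and restore it afterwards. The sketch below is an illustration only, not the actual fix: `get_watchdog_ms()`, `set_watchdog_ms()`, `WatchdogExtender` and `call_with_extended_watchdog()` are hypothetical stand-ins, and the global variable merely simulates the process-wide watchdog setting that a real program would read and write through the library.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the process-wide watchdog timeout in milliseconds
// (hypothetical; a real program would use the library's own setting).
static std::uint32_t g_watchdog_ms = 10000;

std::uint32_t get_watchdog_ms() { return g_watchdog_ms; }
void set_watchdog_ms(std::uint32_t ms) { g_watchdog_ms = ms; }

// RAII guard: on construction, raises the watchdog timeout if it is shorter
// than rpc_timeout_ms + margin_ms; on destruction, restores the saved value.
class WatchdogExtender {
public:
    WatchdogExtender(std::uint32_t rpc_timeout_ms, std::uint32_t margin_ms)
        : saved_ms_(get_watchdog_ms()) {
        assert(margin_ms > 0);
        const std::uint32_t needed = rpc_timeout_ms + margin_ms;
        if (saved_ms_ < needed)
            set_watchdog_ms(needed);
    }
    ~WatchdogExtender() { set_watchdog_ms(saved_ms_); }

    WatchdogExtender(const WatchdogExtender&) = delete;
    WatchdogExtender& operator=(const WatchdogExtender&) = delete;

private:
    std::uint32_t saved_ms_;  // watchdog timeout before the call
};

// Runs a blocking call (for example an RPC) with the watchdog timeout
// extended for its duration, using a fixed 1000 ms margin.
template <typename Call>
auto call_with_extended_watchdog(std::uint32_t rpc_timeout_ms, Call&& call)
    -> decltype(call()) {
    WatchdogExtender guard(rpc_timeout_ms, 1000);
    return call();
}
```

The guard restores the original timeout even if the wrapped call returns early or throws, so a long RPC timeout does not permanently weaken the watchdog for the rest of the program.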

K.O.
