sequencer/mhttpd race condition causes "Stop after current run" button to be temperamental
The Sequencer webpage has a "Stop after current run" button that is very useful. Unfortunately there is a race condition where mhttpd can update "/Sequencer/State/Stop after run", but it is then reset to the original value by the sequencer. (If mhttpd does the update any time between sequencer calling db_get_record
and db_set_record
).
This happens very often, and makes the button either useless, annoying, or misleading, depending on the specific outcome.
The immediate solution is simply to edit the end of sequencer()
in sequencer.cxx
from:
/* get steering parameters, since they might have been changed in between */
SEQUENCER seq1;
size = sizeof(seq1);
db_get_record(hDB, hKeySeq, &seq1, &size, 0);
seq.running = seq1.running;
seq.finished = seq1.finished;
seq.paused = seq1.paused;
/* update current line number */
db_set_record(hDB, hKeySeq, &seq, sizeof(seq), 0);
to include the extra line:
seq.stop_after_run = seq1.stop_after_run;
I've tested the above fix and it fixes the problem.
However, I am sure the same bug can also happen in reverse, where mhttpd reads the state, the sequencer updates ODB with a new line number (for example), then mhttpd updates the paused/stop_after_run flags based on the old state. I would suggest that mhttpd uses db_set_value
instead of db_set_record
to avoid these issues, but perhaps a deeper look is required.
Comments (4)
-
-
"stop after current run" should probably be replaced by a RPC call into msequencer. K.O.
-
I committed Ben's proposal from above. The race window spans now three lines of code, which gives kind of one in a million chance to happen. That's a quick and dirty fix for until I have more time to rework the logic. One possibility would be to split the sequencer parameter set into two sets, once which is only written by the sequencer (such as line number etc.) and one "steering parameter" set which is only written from outside (mhttpd, odb, ...). This way stop_after_run cannot be overwritten by the sequencer.
-
To close this off, need to move following odb entries from “state” to “command”:
- paused
- new file (already in Command)
- stop after run
All other commands are already in “Command”:
- user interface writes to “Command”, msequencer clears “Command”
- msequencer writes to “State”, nobody else writes to “State”.
K.O.
- Log in to comment
the race condition is still there - between db_get_record() and db_set_record() - the race window is smaller than before - which can be bad because now it will happen less often, but when it does happen it will be much harder to reproduce. better solution is to straighten the communications - only one person writes into a data structure. K.O.