- changed status to resolved
Race condition when two SIGHUPs occur back to back
Two SIGHUPs (monit reload) in quick succession can hang the main thread.
A SIGHUP ends up setting the http/engine.c:stopped flag via the main thread, which then waits for the http thread to exit, and then reloads and starts a new http thread.
If another SIGHUP arrives before the new http thread reaches the point where it clears the stopped flag, the main thread again sets the http/engine.c:stopped flag and waits for the thread to exit. However, because the http thread is still starting up, it then CLEARS the stopped flag and never exits, leaving the main thread stuck waiting for it in pthread_join().
If the 2nd SIGHUP comes after the http thread starts, it works fine.
If the 2nd SIGHUP comes before the http thread starts...hang.
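The two orderings above can be replayed deterministically, without real threads, to show why the stop request gets lost. This is only an illustrative sketch: the `stopped` variable stands in for the http/engine.c:stopped flag, and every function name here is hypothetical, not taken from the monit source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the http/engine.c:stopped flag. */
static bool stopped;

/* Pre-fix behaviour: the http thread clears the flag itself while starting up. */
static void http_thread_startup(void) { stopped = false; }

/* SIGHUP handling in the main thread: request a stop (the join would follow). */
static void main_thread_request_stop(void) { stopped = true; }

/* The http loop exits only if it still sees the stop request. */
static bool http_thread_would_exit(void) { return stopped; }

/* 2nd SIGHUP arrives AFTER the http thread has started: the stop is seen. */
bool replay_sighup_after_startup(void) {
    stopped = true;               /* left over from the previous stop */
    http_thread_startup();        /* new thread clears the flag */
    main_thread_request_stop();   /* 2nd SIGHUP */
    return http_thread_would_exit();
}

/* 2nd SIGHUP arrives BEFORE startup has run: the clear wipes out the stop. */
bool replay_sighup_before_startup(void) {
    stopped = true;               /* left over from the previous stop */
    main_thread_request_stop();   /* 2nd SIGHUP, thread not yet running */
    http_thread_startup();        /* late clear discards the request */
    return http_thread_would_exit();
}
```

In the second replay the http loop never sees the stop request, which corresponds to the main thread waiting forever in the join.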
This bash command can often show the issue (although, being a race condition, these things are hard to always trigger!)
# for i in $(seq 1 1000); do monit reload; done
The symptom is that the main monit process is stuck in thread_join(), while the http thread keeps polling away every second.
# strace -fp 13474
strace: Process 13474 attached with 2 threads
[pid 2831] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 13474] futex(0x7f7263c789d0, FUTEX_WAIT, 2831, NULL <unfinished ...>
[pid 2831] <... restart_syscall resumed>) = 0
[pid 2831] poll([{fd=5, events=POLLIN}], 1, 1000) = 0 (Timeout)
[pid 2831] poll([{fd=5, events=POLLIN}], 1, 1000) = 0 (Timeout)
...
Here we see the main thread (pid 13474) stuck in futex() (aka pthread_join), yet the http thread (pid 2831) is happily doing its thing, and not stopping.
Attached is a patch against 5.27.0 that corrects the issue by having the main thread clear the ‘stopped’ flag, and not the http thread.
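The idea behind the fix can be sketched as follows. This is not the attached patch, just a minimal illustration assuming a flag-plus-join design: `engine_start`, `engine_stop`, `http_loop` and `reload_twice` are hypothetical names, and the busy loop stands in for the real poll() loop. Because the main thread clears the flag before creating the thread, and the thread itself never writes to it, a stop request issued at any point after the start cannot be overwritten.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the http/engine.c:stopped flag. */
static atomic_bool stopped;
static pthread_t http_thread;

/* The http thread only READS the flag; it never clears it. */
static void *http_loop(void *arg) {
    (void)arg;
    while (!atomic_load(&stopped)) {
        /* a real loop would poll() the listening socket here */
    }
    return NULL;
}

/* Main thread: clear the flag first, then start the http thread. */
int engine_start(void) {
    atomic_store(&stopped, false);
    return pthread_create(&http_thread, NULL, http_loop, NULL);
}

/* Main thread: request the stop and wait for the http thread to exit. */
int engine_stop(void) {
    atomic_store(&stopped, true);
    return pthread_join(http_thread, NULL);
}

/* Two back-to-back reloads; returns 0 if both joins complete. */
int reload_twice(void) {
    if (engine_start() != 0) return -1;
    if (engine_stop() != 0) return -1;   /* first SIGHUP */
    if (engine_start() != 0) return -1;  /* reload starts a new thread */
    return engine_stop();                /* second SIGHUP right away */
}
```

Calling `reload_twice()` completes both joins even when the second stop is issued before the new thread has executed a single instruction, since the only write that clears the flag happens in the main thread before `pthread_create()`.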
--Andy “Woof!” Spitzer
Comments (2)
repo owner Thank you for patch, merged to the development branch
Fixed: Issue #928: Fixed race condition in HTTP interface reload on fast consecutive SIGHUP. Thanks to Andy Spitzer for patch. → <<cset abf6c4240149>>