monit daemon doesn't start if stale PID file exists
I’m observing an issue (first seen in a Docker image) where the monit daemon does not start after an unclean shutdown. It is caused by a stale PID file at /var/run/monit.pid that still exists even though the process itself no longer does. I have created a simple, Docker-based reproducer here: https://github.com/Austinpayne/monit-pid-reproducer
I’m wondering what the solution is. It seems simple enough for monit, when the PID file exists, to also check whether that PID belongs to a running monit daemon, and then either a) exit (or restart the daemon) if it does, or b) continue starting the daemon if it does not. However, maybe this is the intended behavior? I did not find any mention of the current behavior in the docs or issues here. For my own purposes the workaround in the linked repo is sufficient.
Comments (6)
-
repo owner -
reporter Thanks for the input @Tildeslash . It does appear to be case 1 like you say. Is this behavior documented in the docs somewhere? For what it’s worth, here is a run with debug logging turned on:
$ docker start -a monit
New Monit id: 41b632b21dab57a919a813767e896bb0
Stored in '/var/lib/monit/id'
Runtime constants:
Control file = /etc/monit/monitrc
Log file = /var/log/monit.log
Pid file = /run/monit.pid
Id file = /var/lib/monit/id
State file = /var/lib/monit/state
Debug = True
Log = True
Use syslog = False
Is Daemon = True
Use process engine = True
Limits = {
= programOutput: 512 B
= sendExpectBuffer: 256 B
= fileContentBuffer: 512 B
= httpContentBuffer: 1 MB
= networkTimeout: 5 s
= programTimeout: 5 m
= stopTimeout: 30 s
= startTimeout: 30 s
= restartTimeout: 30 s
= }
On reboot = start
Poll time = 15 seconds with start delay 0 seconds
Event queue = base directory /var/lib/monit/events with 100 slots
Start monit httpd = False
The service list contains the following entries:
System Name = a4ff9d635b20
Monitoring mode = active
On reboot = start
-------------------------------------------------------------------------------
pidfile '/run/monit.pid' does not exist
Starting Monit 5.31.0 daemon
'a4ff9d635b20' Monit 5.31.0 started
Processing postponed events queue
Cannot read proc file '/proc/1/attr/current' -- Invalid argument
$ docker kill --signal KILL monit
$ docker start -a monit
Runtime constants:
Control file = /etc/monit/monitrc
Log file = /var/log/monit.log
Pid file = /run/monit.pid
Id file = /var/lib/monit/id
State file = /var/lib/monit/state
Debug = True
Log = True
Use syslog = False
Is Daemon = True
Use process engine = True
Limits = {
= programOutput: 512 B
= sendExpectBuffer: 256 B
= fileContentBuffer: 512 B
= httpContentBuffer: 1 MB
= networkTimeout: 5 s
= programTimeout: 5 m
= stopTimeout: 30 s
= startTimeout: 30 s
= restartTimeout: 30 s
= }
On reboot = start
Poll time = 15 seconds with start delay 0 seconds
Event queue = base directory /var/lib/monit/events with 100 slots
Start monit httpd = False
The service list contains the following entries:
System Name = a4ff9d635b20
Monitoring mode = active
On reboot = start
-------------------------------------------------------------------------------
Monit daemon with PID 1 awakened
-
repo owner - changed status to resolved
I don’t think it is documented. It could warrant a FAQ entry though. I'm closing this as resolved - you found the solution yourselves :-) Under normal operation Monit creates and deletes its pid file itself. After a hard server shutdown or crash followed by a restart, however, a stale pid file is unlikely but can definitely occur. As a general rule, it is best to simply remove Monit's pid file before Monit is started. Thanks for the great testing, and feel free to add an entry to the wiki FAQ.
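A minimal sketch of that rule as a container entrypoint helper. It assumes the pid file path /run/monit.pid shown in the debug output above and that Monit runs in the foreground via `monit -I`; the function name `remove_pidfile` is hypothetical, and this is not necessarily what the linked repository does:

```shell
#!/bin/sh
# Hypothetical entrypoint helper: remove a leftover Monit pid file before startup.
# The default path is an assumption taken from the debug output in this thread.
remove_pidfile() {
    pidfile="${1:-/run/monit.pid}"
    if [ -e "$pidfile" ]; then
        echo "removing leftover pid file $pidfile"
        rm -f "$pidfile"
    fi
}

# Typical use at the end of an entrypoint script (left commented so the
# helper can be sourced on its own):
# remove_pidfile /run/monit.pid
# exec monit -I
```

This only makes sense where nothing else can legitimately own the pid file at that point, such as the start of a container.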
-
reporter Handling this at the system level seems appropriate, thank you for the input!
-
What you could actually do in containers to avoid that problem is to store PID files on a tmpfs, either with (Docs docker)
docker run --tmpfs /run
or in a docker-compose.yaml with (Docs composer)
tmpfs:
  - /run
This will simply create a tmpfs that is definitely empty when the container starts.
-
repo owner That's a good idea Henning. I forgot about that in my reply. Edit: I took the liberty of adding this to the FAQ. Feel free to edit if you want to elaborate.
There are two things that can do this AFAICS,
In both cases you should be able to see something in the Monit log file if you start Monit in debug mode. (monit -v)
PS: I took a quick look at your repo and the problem is, as expected, the first case above. Monit reads pid = 1 from its pid file, tests whether it belongs to a running process, and if so assumes it is Monit, wakes it up, reports
Monit daemon with PID 1 awakened
and then quits. Unfortunately, pid 1 is no longer Monit but some other process. So your entrypoint workaround is actually what I would recommend. We could inspect the OS’s process table and the actual short command of the pid and check that it is monit, but that is starting to dig a rabbit hole.
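For illustration only, the "short command" check mentioned above could be sketched outside Monit roughly as follows. This is not Monit's code: it assumes Linux, where /proc/<pid>/comm holds the short command name, and the function name `pidfile_matches_command` and its optional third argument (an alternate proc root, useful for testing) are hypothetical:

```shell
#!/bin/sh
# Sketch: succeed only if the pid recorded in a pid file belongs to a process
# whose short command name matches. Linux-only: relies on /proc/<pid>/comm.
# The function name and the optional proc_root argument are hypothetical.
pidfile_matches_command() {
    pidfile="$1"
    expected="$2"
    proc_root="${3:-/proc}"

    [ -r "$pidfile" ] || return 1
    pid="$(tr -d '[:space:]' < "$pidfile")"

    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # No such process, or its name is not readable.
    [ -r "$proc_root/$pid/comm" ] || return 1
    [ "$(cat "$proc_root/$pid/comm")" = "$expected" ]
}
```

An entrypoint script could then run `pidfile_matches_command /run/monit.pid monit || rm -f /run/monit.pid` before starting Monit. A name match alone still cannot tell an old instance from an unrelated process with the same name that reuses the pid, which is part of the rabbit hole.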