Monit doesn't wait x cycles after restarting service.
I'm pretty sure I'm doing this right. This is my current config, though I have tried a number of variants of the times/cycles values:
check process mytask with pidfile /var/run/mytask.pid
    start program = "/etc/init.d/mytask start" with timeout 120 seconds
    stop program = "/etc/init.d/mytask stop"
    if failed host localhost
        port 8080
        with timeout 30 seconds
        3 times in 3 cycles
    then restart
Process monitoring and restarting work fine.
The port monitoring, though, doesn't appear to try three times after a restart. I want it to test the port 3 times before restarting, and that should apply every time, so the failure counter should reset after each restart. Instead, once the check has failed, it only tries once before restarting again, even if the service has recovered in the meantime. Am I missing something here?
Feb 1 05:48:37 testsys monit[5320]: Starting Monit 5.25.1 daemon with http interface at [0.0.0.0]:2812
Feb 1 05:48:37 testsys monit[5322]: 'sm01' Monit 5.25.1 started
Feb 1 05:50:07 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 05:50:38 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 05:51:08 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 05:51:08 testsys monit[5322]: 'mytask' trying to restart
Feb 1 05:51:08 testsys monit[5322]: 'mytask' stop: '/etc/init.d/mytask.sh stop'
Feb 1 05:51:08 testsys monit[5322]: 'mytask' start: '/etc/init.d/mytask.sh start'
Feb 1 05:53:09 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 05:53:09 testsys monit[5322]: 'mytask' trying to restart
Feb 1 05:53:09 testsys monit[5322]: 'mytask' stop: '/etc/init.d/mytask.sh stop'
Feb 1 05:53:09 testsys monit[5322]: 'mytask' start: '/etc/init.d/mytask.sh start'
Feb 1 05:55:09 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 05:55:09 testsys monit[5322]: 'mytask' trying to restart
Feb 1 05:55:09 testsys monit[5322]: 'mytask' stop: '/etc/init.d/mytask.sh stop'
Feb 1 05:55:10 testsys monit[5322]: 'mytask' start: '/etc/init.d/mytask.sh start'
Feb 1 05:57:10 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 05:57:10 testsys monit[5322]: 'mytask' trying to restart
Feb 1 05:57:10 testsys monit[5322]: 'mytask' stop: '/etc/init.d/mytask.sh stop'
Feb 1 05:57:11 testsys monit[5322]: 'mytask' start: '/etc/init.d/mytask.sh start'
Feb 1 05:59:11 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 05:59:11 testsys monit[5322]: 'mytask' trying to restart
Feb 1 05:59:11 testsys monit[5322]: 'mytask' stop: '/etc/init.d/mytask.sh stop'
Feb 1 05:59:11 testsys monit[5322]: 'mytask' start: '/etc/init.d/mytask.sh start'
Feb 1 06:01:12 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 06:01:12 testsys monit[5322]: 'mytask' trying to restart
Feb 1 06:01:12 testsys monit[5322]: 'mytask' stop: '/etc/init.d/mytask.sh stop'
Feb 1 06:01:12 testsys monit[5322]: 'mytask' start: '/etc/init.d/mytask.sh start'
Feb 1 06:03:13 testsys monit[5322]: 'mytask' connection succeeded to [localhost]:8080 [TCP/IP]
Feb 1 06:08:44 testsys monit[5322]: 'mytask' failed protocol test [DEFAULT] at [localhost]:8080 [TCP/IP] -- Connection refused
Feb 1 06:08:44 testsys monit[5322]: 'mytask' trying to restart
Feb 1 06:08:44 testsys monit[5322]: 'mytask' stop: '/etc/init.d/mytask.sh stop'
Feb 1 06:08:45 testsys monit[5322]: 'mytask' start: '/etc/init.d/mytask.sh start'
I'm using 5.25.1 of monit (but can't select that in the version dropdown) on Yocto Linux (morty). I don't think there's any issue with the general setup though.
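To make the expected behaviour concrete, here is a minimal sketch in C of the semantics described above for "3 times in 3 cycles". It is only a model of what I expect, not Monit's code; `port_check_t` and `port_check_cycle` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the expected behaviour; none of these names
 * come from Monit. */
typedef struct {
    int required;   /* consecutive failures needed before restarting */
    int failures;   /* consecutive failures seen so far */
} port_check_t;

/* Feed the result of one cycle. Returns true when a restart should be
 * issued. The counter is cleared on success and after every restart,
 * so each restart needs the full number of failed cycles again. */
static bool port_check_cycle(port_check_t *c, bool failed) {
    assert(c->required > 0);
    if (!failed) {
        c->failures = 0;
        return false;
    }
    if (++c->failures < c->required)
        return false;
    c->failures = 0;   /* restart issued: start counting from scratch */
    return true;
}
```

With `required = 3`, three failed cycles in a row return true once, and the next failed cycle returns false again. In the log above, by contrast, every restart after the first follows a single failed test.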
Comments (6)
-
-
reporter - changed version to 5.25.1
-
reporter I wanted to use monit to restart some services based on CPU and/or memory but really need the x cycles parameter to work properly.
Any chance of taking a look at this one or at least pointing where to look in the source? If you can give me a nudge in the right direction I may be able to fix it myself.
-
reporter - marked as major
-
reporter If a trigger occurs but doesn’t clear, it will keep triggering on every cycle until it clears. This is a bit of a problem for us when restarting a process, but it can be partly overcome with the “timeout” option used when starting the service. Alerts don’t seem to retrigger for some reason, whereas service restarts do.
The logging is a bit confusing, particularly when testing. The first time, the logs are at warning level and the final trigger log is at error level. Thereafter, the lead-up messages are at debug level, so they don’t appear unless you have debug enabled (-v). There aren’t really enough logs to tell whether that’s your issue, @Sebastian.
I’m not quite sure how to resolve this, since in some cases you don’t want it to keep triggering and in others you do (although it should then be after y cycles, or x times in y cycles).
I’m pretty certain there is a bug here that affects any “x times in y cycles” monitoring (where x is 1 for y = 1-8, x is 1-2 for y = 9-16, and so on): it fills the state_map with one set bit in each byte when it should be zeroing it.
https://bitbucket.org/tildeslash/monit/src/c8e3ee8b5b7fa27b8bdc888e1a343b937cb58336/src/event.c#lines-163 should read:
memset(&(E->state_map), 0, sizeof(E->state_map)); // Restart state map on state change, so we'll not flicker on multiple-failures condition (next state change requires full number of cycles to pass)
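The arithmetic behind that claim can be checked with a small sketch. This is not Monit's source: `history_t` and the helper functions are hypothetical, and the sketch assumes the state map is a 64-bit history in which bit 0 is the most recent cycle and a set bit means a failed cycle. `memset` writes its value into every byte, so a fill value of 1 yields 0x01 in each byte rather than an empty history.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an event's cycle history (assumption:
 * bit 0 is the most recent cycle, a set bit means a failed cycle). */
typedef struct {
    uint64_t state_map;
} history_t;

/* Clear the history so the next trigger needs the full failure count. */
static void history_reset(history_t *h) {
    memset(&h->state_map, 0, sizeof h->state_map);
}

/* Record the outcome of one cycle by shifting the history left. */
static void history_record(history_t *h, int failed) {
    h->state_map = (h->state_map << 1) | (failed ? 1u : 0u);
}

/* Count the failures within the most recent `cycles` cycles (1..64). */
static int history_failures(const history_t *h, int cycles) {
    assert(cycles >= 1 && cycles <= 64);
    int count = 0;
    for (int i = 0; i < cycles; i++)
        if ((h->state_map >> i) & 1u)
            count++;
    return count;
}
```

Under these assumptions, a map filled with `memset(&h.state_map, 1, sizeof h.state_map)` already reports one failure in any window of 1 to 8 cycles and two in a window of 9 to 16, before any real check has run. After `history_reset` the count is zero.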
-
Exactly the same problem here with a plain monit 5.16 on Ubuntu 16:
It works the first time: it logs two lines before attempting to restart, and then logs one line saying that the process is within the limit again:
But after the first time it only logs one line and restarts immediately, on the first cycle that matches the memory limit:
Any suggestions?