- changed status to invalid
URL check starts only after the number of consecutive failure cycles has elapsed.
Below is my Monit configuration for the sample app:
check process sample with pidfile /U01/ash/sample-app/sample.pid
start program = "/bin/sh -c 'cd /U01/ash/sample-app && ./start.sh'" as uid 'ash' as gid 'ash' with timeout 220 seconds
stop program = "/bin/sh -c 'cd /U01/ash/sample-app && ./stop.sh'" with timeout 220 seconds
if failed url http://0.0.0.0:5100/health for 8 cycles then restart
My daemon interval is set to 30 seconds.
Ideally, once I run monit start sample, Monit should start monitoring the process and begin performing the health check.
If the URL check fails for 8 consecutive cycles, it should restart the application.
However, after the app was started at 23:50, the status showed "Not Monitored - Start pending".
In the next cycle, this was the output of monit status sample:
Process 'sample'
status OK
monitoring status Monitored
monitoring mode active
on reboot start
pid 2476
parent pid 1
uid 1005
effective uid 1005
gid 1006
uptime 0m
threads 1
children 0
cpu -
cpu total -
memory 1.9% [17.9 MB]
memory total 1.9% [17.9 MB]
security attribute -
filedescriptors 6 [0.6% of 1024 limit]
total filedescriptors 6
read bytes 0 B/s [3.2 MB total]
disk read bytes 0 B/s [10.2 MB total]
disk read operations 0.0 reads/s [807 reads total]
write bytes 0 B/s [51.1 kB total]
disk write bytes 0 B/s [8 kB total]
disk write operations 0.0 writes/s [20 writes total]
port response time -
data collected Sat, 11 Jul 2020 22:24:06
Sat Jul 11 22:24:14 IST 2020
This was the response after the first cycle. As you can see, it has not performed the URL check, because "port response time" is "-".
This remains the same for the next seven cycles.
On the eighth cycle this was the response:
port response time 55.793 ms to 0.0.0.0:5100/health type TCP/IP protocol HTTP
data collected Sat, 11 Jul 2020 22:28:06
Sat Jul 11 22:28:08 IST 2020
This means it did its first URL check after 8 cycles.
After that, it checks every 30 seconds, which is my daemon interval.
- Serving Flask app "sample-app" (lazy loading)
- Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
- Debug mode: off
- Running on http://0.0.0.0:5100/ (Press CTRL+C to quit)
127.0.0.1 - - [11/Jul/2020 22:28:06] "GET /health HTTP/1.1" 200 -
127.0.0.1 - - [11/Jul/2020 22:28:36] "GET /health HTTP/1.1" 200 -
127.0.0.1 - - [11/Jul/2020 22:29:07] "GET /health HTTP/1.1" 200 -
127.0.0.1 - - [11/Jul/2020 22:29:37] "GET /health HTTP/1.1" 200 -
As you can see, the first check the application received was at 22:28:06, and after that one arrived every 30 seconds.
Bug: Monit should start monitoring the URL directly on startup, not after 8 cycles.
Enhancement: There should be a parameter to delay the start of the URL health check. We have a few applications that only become fully available a few seconds after the start script completes, and we want monitoring to begin only then.
For example: check process with pid abc.pid delay 3 cycles
That way, monitoring of the sample application would begin 3 cycles after monit start sample.
Comments (1)
repo owner
Monit starts the URL monitoring immediately, but the connection most probably times out. It doesn't mark the service as failed because it waits for 8 cycles, and there is no response time because the connection didn't succeed.
The "0.0.0.0" address is invalid; use a real address such as "127.0.0.1" in the Monit configuration.
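
For reference, the suggestion above could be applied roughly as follows. This is only a sketch based on the configuration posted in the report: the sole change to the check is the host in the URL, and the `set daemon 30` line is an assumption about how the 30-second interval is configured. With a 30-second cycle, `for 8 cycles` means the check has to fail for about 8 × 30 = 240 seconds before Monit restarts the process, so that tolerance already gives a slow-starting application some time to come up.

```
set daemon 30    # cycle length in seconds (assumed from "my daemon is set to 30")

check process sample with pidfile /U01/ash/sample-app/sample.pid
  start program = "/bin/sh -c 'cd /U01/ash/sample-app && ./start.sh'" as uid 'ash' as gid 'ash' with timeout 220 seconds
  stop program = "/bin/sh -c 'cd /U01/ash/sample-app && ./stop.sh'" with timeout 220 seconds
  # Connect to the loopback address instead of 0.0.0.0
  if failed url http://127.0.0.1:5100/health for 8 cycles then restart
```

The application itself can keep listening on 0.0.0.0 (all interfaces); only the address Monit connects to needs to change.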