After updating monit from 5.12.1 to 5.25.0, a service fails to start -- could not start required services.
Were there any changes to "depends" handling between versions 5.12.1 and 5.25.0? ndc-ndm depends on ndc-celery and ndc-redis. When I start ndc-ndm, ndc-celery starts fine, but ndc-ndm fails. This was working fine with monit version 5.12.1.
ndc-ndm Execution failed | Stat... Program
ndc-redis OK Program
ndc-celery OK Program
Please find the sequence of steps executed:
monit stop ndc-ndm
'ndc-ndm' stop on user request
'ndc-ndm' stop: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-ndm stop'
'ndc-ndm' stop action done
monit stop ndc-celery
'ndc-celery' stop: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-celery stop'
monit start ndc-ndm
'ndc-ndm' start on user request
'ndc-celery' start: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-celery start'
'ndc-ndm' failed to start -- could not start required services: 'ndc-celery'
'ndc-ndm' start action failed
'ndc-ndm' status failed (1) -- NDC-NDM Server is not running
'ndc-celery' start: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-celery start'
'ndc-ndm' failed to start -- could not start required services: 'ndc-celery'
'ndc-ndm' status failed (1) -- NDC-NDM Server is not running
Comments (7)
-
repo owner -
I think that I'm facing a similar issue. I am using the version 5.24.0
Here is my configuration:
set daemon 60
set logfile /var/log/monit.log
set eventqueue basedir /var/monit slots 1000
set mmonit http://monit:monit@mmonit.mydomain.com/collector
set httpd port 2812
    allow localhost
    allow mmonit.mydomain.com
    allow user:password

check process httpd with pidfile "/var/run/httpd/httpd.pid"
    start program = "/usr/bin/systemctl start httpd.service"
    stop program = "/usr/bin/systemctl stop httpd.service"
    restart program = "/usr/bin/systemctl restart httpd.service"
    if cpu is greater than 85% for 10 cycles then restart

check process some-service matching "some-service"
    start program = "/usr/bin/systemctl start some-service.service"
    stop program = "/usr/bin/systemctl stop some-service.service"
    restart program = "/usr/bin/systemctl restart some-service.service"
    if cpu is greater than 85% for 10 cycles then restart

check host some-service-webapp with address localhost
    if failed port 8080 protocol http and request "/healthcheck" with timeout 5 seconds then alert
    depends on some-service

check host some-service.mydomain.com with address some-service.mydomain.com
    if failed port 80 protocol http and request "/healthcheck" with timeout 5 seconds then alert
    depends on some-service-webapp
    depends on httpd
You can see that there is a process listening on port 8080 and a reverse proxy (RP) serving it on port 80.
The issue I'm facing is that on server reboot, since the process some-service takes some time to start listening for incoming requests, the first check of some-service-webapp gets a "Connection failed" (nothing special here). On the next iteration this check goes to OK, but the check some-service.mydomain.com does not: it shows "Execution failed". This one depends on 2 others, as shown in the configuration, but both are OK. The strange thing is that a simple "monit reload" puts everything back in a clean state, with all checks at OK.
Here are the 2 summaries and all the logs I have:
Monit 5.24.0 uptime: 9h 12m
┌─────────────────────────────────┬────────────────────────────┬───────────────┐
│ Service Name                    │ Status                     │ Type          │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ some-service                    │ OK                         │ Process       │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ httpd                           │ OK                         │ Process       │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ some-service-webapp             │ Connection failed          │ Remote Host   │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ some-service.mydomain.com       │ Initializing               │ Remote Host   │
└─────────────────────────────────┴────────────────────────────┴───────────────┘

Monit 5.24.0 uptime: 9h 13m
┌─────────────────────────────────┬────────────────────────────┬───────────────┐
│ Service Name                    │ Status                     │ Type          │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ some-service                    │ OK                         │ Process       │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ httpd                           │ OK                         │ Process       │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ some-service-webapp             │ OK                         │ Remote Host   │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ some-service.mydomain.com       │ Execution failed           │ Remote Host   │
└─────────────────────────────────┴────────────────────────────┴───────────────┘

[CEST Apr 3 15:13:53] error : 'some-service' process is not running
[CEST Apr 3 15:13:53] info : 'some-service' trying to restart
[CEST Apr 3 15:13:53] info : 'some-service' restart: '/usr/bin/systemctl restart some-service.service'
[CEST Apr 3 15:13:54] info : 'some-service' process is running with pid 30742
[CEST Apr 3 15:13:54] error : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
[CEST Apr 3 15:13:54] error : 'some-service.mydomain.com' failed to start -- could not start required services: 'some-service-webapp'
[CEST Apr 3 15:13:54] error : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
[CEST Apr 3 15:14:54] info : 'some-service-webapp' connection succeeded to [localhost]:8080/healthcheck [TCP/IP]
-
Here are my debug logs:
[CEST Apr 3 15:29:54] debug : pidfile '/run/monit.pid' does not exist
[CEST Apr 3 15:29:54] info : Starting Monit 5.24.0 daemon with http interface at [*]:2812
[CEST Apr 3 15:29:54] debug : Starting Monit HTTP server at [*]:2812
[CEST Apr 3 15:29:54] debug : Monit HTTP server started
[CEST Apr 3 15:29:54] info : 'FQDN' Monit 5.24.0 started
[CEST Apr 3 15:29:54] debug : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:29:54] info : M/Monit heartbeat started
[CEST Apr 3 15:29:54] error : 'some-service' process is not running
[CEST Apr 3 15:29:54] debug : M/Monit: status message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:29:54] debug : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:29:54] info : 'some-service' trying to restart
[CEST Apr 3 15:29:54] debug : 'some-service.mydomain.com' stop skipped -- method not defined
[CEST Apr 3 15:29:54] debug : 'some-service-webapp' stop skipped -- method not defined
[CEST Apr 3 15:29:54] info : 'some-service' restart: '/usr/bin/systemctl restart some-service.service'
[CEST Apr 3 15:29:54] debug : 'some-service' restarted
[CEST Apr 3 15:29:54] info : 'some-service' process is running with pid 31725
[CEST Apr 3 15:29:54] debug : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:29:54] debug : 'some-service' zombie check succeeded
[CEST Apr 3 15:29:54] debug : 'some-service' cpu usage check succeeded [current cpu usage = 0.0%]
[CEST Apr 3 15:29:54] debug : 'some-service-webapp' start method not defined
[CEST Apr 3 15:29:54] debug : 'some-service-webapp' monitoring enabled
[CEST Apr 3 15:29:54] debug : pidfile '/var/run/httpd/httpd.pid' does not exist
[CEST Apr 3 15:29:54] info : 'httpd' start: '/usr/bin/systemctl start httpd.service'
[CEST Apr 3 15:29:55] debug : 'httpd' started
[CEST Apr 3 15:29:55] debug : 'httpd' process is running with pid 31736
[CEST Apr 3 15:29:55] debug : 'httpd' zombie check succeeded
[CEST Apr 3 15:29:55] debug : 'httpd' cpu usage check succeeded [current cpu usage = 0.0%]
[CEST Apr 3 15:29:55] debug : 'some-service' process is running with pid 31725
[CEST Apr 3 15:29:55] debug : 'some-service' zombie check succeeded
[CEST Apr 3 15:29:55] debug : 'some-service' cpu usage check succeeded [current cpu usage = 82.9%]
[CEST Apr 3 15:29:55] debug : 'some-service-webapp' start method not defined
[CEST Apr 3 15:29:55] debug : 'some-service-webapp' monitoring enabled
[CEST Apr 3 15:29:55] debug : Socket test failed for [127.0.0.1]:8080 -- Connection refused
[CEST Apr 3 15:29:55] error : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
[CEST Apr 3 15:29:55] debug : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:29:55] error : 'some-service.mydomain.com' failed to start -- could not start required services: 'some-service-webapp'
[CEST Apr 3 15:29:55] debug : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:29:55] debug : Socket test failed for [127.0.0.1]:8080 -- Connection refused
[CEST Apr 3 15:29:55] error : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
[CEST Apr 3 15:29:55] debug : 'httpd' process is running with pid 31736
[CEST Apr 3 15:29:55] debug : 'httpd' zombie check succeeded
[CEST Apr 3 15:29:55] debug : 'httpd' cpu usage check succeeded [current cpu usage = 0.0%]
[CEST Apr 3 15:29:55] debug : Reloading mount information for filesystem '/'
[CEST Apr 3 15:29:55] debug : Reloading mount information for filesystem '/var'
[CEST Apr 3 15:29:55] debug : Reloading mount information for filesystem '/tmp'
[CEST Apr 3 15:29:55] debug : Reloading mount information for filesystem '/var/log'
[CEST Apr 3 15:29:55] debug : 'some-service.mydomain.com' test skipped as required service 'some-service-webapp' has errors
[CEST Apr 3 15:30:54] debug : M/Monit: status message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:30:55] debug : 'some-service' process is running with pid 31725
[CEST Apr 3 15:30:55] debug : 'some-service' zombie check succeeded
[CEST Apr 3 15:30:55] debug : 'some-service' cpu usage check succeeded [current cpu usage = 24.4%]
[CEST Apr 3 15:30:55] debug : 'some-service-webapp' succeeded testing protocol [HTTP] at [localhost]:8080/healthcheck [TCP/IP] [response time 132.609 m]
[CEST Apr 3 15:30:55] info : 'some-service-webapp' connection succeeded to [localhost]:8080/healthcheck [TCP/IP]
[CEST Apr 3 15:30:55] debug : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
[CEST Apr 3 15:30:55] debug : 'httpd' process is running with pid 31736
[CEST Apr 3 15:30:55] debug : 'httpd' zombie check succeeded
[CEST Apr 3 15:30:55] debug : 'httpd' cpu usage check succeeded [current cpu usage = 0.0%]
[CEST Apr 3 15:30:55] debug : 'some-service.mydomain.com' succeeded testing protocol [HTTP] at [some-service.mydomain.com]:80/healthcheck [TCP/IP] [response time 87.674 ms]
[CEST Apr 3 15:30:55] debug : 'some-service.mydomain.com' connection succeeded to [some-service.mydomain.com]:80/healthcheck [TCP/IP]
I have no more information. It seems that the check runs and succeeds, but the result is not taken into account. On each subsequent cycle it is the same: the check seems OK, but the result does not seem to be taken into account.
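For anyone comparing logs like the ones above, a small script can pull out which service was blocked and which required service the log names. This is only a sketch: the function name blocked_services is made up for illustration, the pattern is taken from the single message format visible in this thread, and the way several required services would be listed on one line is an assumption.

```python
import re

# Matches the message format seen in the logs in this thread, e.g.
# ... error : 'svc' failed to start -- could not start required services: 'dep'
BLOCKED = re.compile(
    r"'(?P<service>[^']+)' failed to start -- "
    r"could not start required services: (?P<deps>.*)$"
)


def blocked_services(lines):
    """Map each service that failed to start to the required services named in the log."""
    result = {}
    for line in lines:
        match = BLOCKED.search(line)
        if match:
            # Assumption: every required service appears as a single-quoted name.
            deps = set(re.findall(r"'([^']+)'", match.group("deps")))
            result.setdefault(match.group("service"), set()).update(deps)
    return result


sample = [
    "[CEST Apr 3 15:29:55] error : 'some-service-webapp' failed protocol test "
    "[HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused",
    "[CEST Apr 3 15:29:55] error : 'some-service.mydomain.com' failed to start -- "
    "could not start required services: 'some-service-webapp'",
]
print(blocked_services(sample))
```

Feeding it the lines of the monit log file (for example via open(path) on the file named in "set logfile") gives one entry per blocked service, which makes it easier to see whether the same dependency is reported on every cycle.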
-
Same problem on 5.25.
I traced it to this condition: https://bitbucket.org/tildeslash/monit/src/b12dc2ace690522e7a2c97255a33cfa49ab09d2d/src/control.c#lines-233
When I removed the State_Init check there, it worked as expected, but I think that is the wrong fix. I suppose this is the related commit: https://bitbucket.org/tildeslash/monit/commits/8ab768fe
-
Maybe this is the right place: https://bitbucket.org/tildeslash/monit/src/8ab768fea38fc397391d1e76ec807e262439b93b/src/validate.c#lines-1329
check_program has not actually tried to run the program yet, but the Init state is already set. That Init state then makes doStart fail for all dependent services.
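To make the suspected mechanism concrete, here is a toy model of it. It is not monit's code: the names State, Service and can_start are hypothetical, and the strict_init switch only mimics the effect described above, where a required service that has no check result yet (Init) is treated the same as one that failed.

```python
from enum import Enum


class State(Enum):
    INIT = "init"      # no check result recorded yet
    OK = "ok"
    FAILED = "failed"


class Service:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = list(depends_on)
        self.state = State.INIT


def can_start(service, strict_init=True):
    """Return True if no required service blocks the start.

    strict_init=True models the behaviour suspected in this thread:
    a dependency still in INIT blocks the start just like FAILED.
    strict_init=False models the experiment of ignoring INIT.
    """
    for dep in service.depends_on:
        if dep.state is State.FAILED:
            return False
        if strict_init and dep.state is State.INIT:
            return False
    return True


# ndc-celery has been launched, but its status program has not run yet,
# so in this model it is still in INIT.
celery = Service("ndc-celery")
ndm = Service("ndc-ndm", depends_on=[celery])

print(can_start(ndm))                     # blocked while celery is in INIT
print(can_start(ndm, strict_init=False))  # allowed if INIT is ignored
celery.state = State.OK
print(can_start(ndm))                     # allowed once a check has succeeded
```

In this model the dependent service is only refused while its dependency has no result, which would match the symptom that everything looks fine again after "monit reload". Whether that is really what happens inside doStart is still only a guess.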
-
Hello Jonathan Le Bloas,
are you able to do some testing with monit 5.26.0? I do similar things without any problem, based on monit 5.25.2 and 5.26.0 on macOS and AIX.
Lutz
-
Hello Lutz,
Sorry, I no longer work with monit since I changed teams.
Regards
-
Can you please provide more data?
attach full monit configuration of "ndc-ndm", "ndc-redis" and "ndc-celery" services
stop monit and run it in debug mode:
monit -vI
attach the debug log