After update of monit version from 5.12.1 to 5.25.0, failed to start -- could not start required services.

Issue #728 new
Former user created an issue

Were there any changes to dependency handling ("depends on") between versions 5.12.1 and 5.25.0? ndc-ndm depends on ndc-celery and ndc-redis. When I start ndc-ndm, ndc-celery starts fine, but ndc-ndm fails. This worked fine with monit version 5.12.1.

 ndc-ndm                          Execution failed | Stat...  Program
 ndc-redis                        OK                          Program
 ndc-celery                       OK                          Program

Please find the sequence of steps executed:

monit stop ndc-ndm

'ndc-ndm' stop on user request
'ndc-ndm' stop: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-ndm stop'
'ndc-ndm' stop action done

monit stop ndc-celery

'ndc-celery' stop: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-celery stop'

monit start ndc-ndm

'ndc-ndm' start on user request
'ndc-celery' start: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-celery start'
'ndc-ndm' failed to start -- could not start required services: 'ndc-celery'
'ndc-ndm' start action failed
'ndc-ndm' status failed (1) -- NDC-NDM Server is not running
'ndc-celery' start: '/bin/bash /etc/ndc/ndc-ndm/bin/ndc-celery start'
'ndc-ndm' failed to start -- could not start required services: 'ndc-celery'
'ndc-ndm' status failed (1) -- NDC-NDM Server is not running

Comments (7)

  1. Tildeslash repo owner

    Can you please provide more data?

    1. attach full monit configuration of "ndc-ndm", "ndc-redis" and "ndc-celery" services

    2. stop monit and run it in debug mode:

      monit -vI

    3. attach the debug log

  2. Jonathan Le Bloas

    I think that I'm facing a similar issue. I am using the version 5.24.0

    Here is my configuration:

    set daemon 60
    
    set logfile /var/log/monit.log
    
    set eventqueue basedir /var/monit slots 1000
    set mmonit http://monit:monit@mmonit.mydomain.com/collector
    set httpd port 2812 
        allow localhost
        allow mmonit.mydomain.com
        allow user:password
    
    check process httpd with pidfile "/var/run/httpd/httpd.pid"
        start program = "/usr/bin/systemctl start httpd.service"
        stop program = "/usr/bin/systemctl stop httpd.service"
        restart program = "/usr/bin/systemctl restart httpd.service"
        if cpu is greater than 85% for 10 cycles then restart
    
    check process some-service matching "some-service"
        start program = "/usr/bin/systemctl start some-service.service"
        stop program = "/usr/bin/systemctl stop some-service.service"
        restart program = "/usr/bin/systemctl restart some-service.service"
        if cpu is greater than 85% for 10 cycles then restart
    
    check host some-service-webapp with address localhost
        if failed
            port 8080 protocol http and
            request "/healthcheck"
            with timeout 5 seconds
            then alert
        depends on some-service
    
    check host some-service.mydomain.com with address some-service.mydomain.com
        if failed
            port 80 protocol http and
            request "/healthcheck"
            with timeout 5 seconds
            then alert
        depends on some-service-webapp
        depends on httpd
    

    You can see that there is a process listening on port 8080 and a reverse proxy serving it on port 80.
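
    The dependency chain declared by the "depends on" lines above can be sketched as a small graph. This is only an illustration of the relationships in the configuration, not Monit's own implementation; the helper names `start_order` and `blocked_by` are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, copied from the "depends on" lines
# in the configuration above. The helper functions are hypothetical.
DEPENDS_ON = {
    "httpd": set(),
    "some-service": set(),
    "some-service-webapp": {"some-service"},
    "some-service.mydomain.com": {"some-service-webapp", "httpd"},
}


def start_order(depends_on):
    """Return one order in which every service follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def blocked_by(service, failed, depends_on):
    """Return the failed services that `service` depends on, directly or not."""
    seen = set()
    blockers = set()
    stack = [service]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in failed:
                blockers.add(dep)
            stack.append(dep)
    return blockers


if __name__ == "__main__":
    print(start_order(DEPENDS_ON))
    # While the webapp check is failing, the public host check is blocked by it.
    print(blocked_by("some-service.mydomain.com", {"some-service-webapp"}, DEPENDS_ON))
```

    With no failed services, `blocked_by` returns an empty set, which is the state in which the some-service.mydomain.com check would be expected to run normally.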

    The issue I'm facing is that on server reboot, since the some-service process takes some time to start listening for incoming requests, the first check of some-service-webapp gets a "Connection failed" (nothing special here). On the next iteration this check goes to OK, but the some-service.mydomain.com check does not: it shows "Execution failed". That check depends on two others, as shown in the configuration, but both are OK.

    The strange thing is that a simple monit reload puts everything back in a clean state, with all checks at OK.

    Here are the two summaries and all the logs I have:

    Monit 5.24.0 uptime: 9h 12m
    ┌─────────────────────────────────┬────────────────────────────┬───────────────┐
    │ Service Name                    │ Status                     │ Type          │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ some-service                    │ OK                         │ Process       │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ httpd                           │ OK                         │ Process       │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ some-service-webapp             │ Connection failed          │ Remote Host   │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ some-service.mydomain.com       │ Initializing               │ Remote Host   │
    └─────────────────────────────────┴────────────────────────────┴───────────────┘
    
    Monit 5.24.0 uptime: 9h 13m
    ┌─────────────────────────────────┬────────────────────────────┬───────────────┐
    │ Service Name                    │ Status                     │ Type          │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ some-service                    │ OK                         │ Process       │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ httpd                           │ OK                         │ Process       │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ some-service-webapp             │ OK                         │ Remote Host   │
    ├─────────────────────────────────┼────────────────────────────┼───────────────┤
    │ some-service.mydomain.com       │ Execution failed           │ Remote Host   │
    └─────────────────────────────────┴────────────────────────────┴───────────────┘
    
    [CEST Apr  3 15:13:53] error    : 'some-service' process is not running
    [CEST Apr  3 15:13:53] info     : 'some-service' trying to restart
    [CEST Apr  3 15:13:53] info     : 'some-service' restart: '/usr/bin/systemctl restart some-service.service'
    [CEST Apr  3 15:13:54] info     : 'some-service' process is running with pid 30742
    [CEST Apr  3 15:13:54] error    : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
    [CEST Apr  3 15:13:54] error    : 'some-service.mydomain.com' failed to start -- could not start required services: 'some-service-webapp'
    [CEST Apr  3 15:13:54] error    : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
    
    [CEST Apr  3 15:14:54] info     : 'some-service-webapp' connection succeeded to [localhost]:8080/healthcheck [TCP/IP]
    
  3. Jonathan Le Bloas

    Here are my debug logs:

    [CEST Apr  3 15:29:54] debug    : pidfile '/run/monit.pid' does not exist
    [CEST Apr  3 15:29:54] info     : Starting Monit 5.24.0 daemon with http interface at [*]:2812
    [CEST Apr  3 15:29:54] debug    : Starting Monit HTTP server at [*]:2812
    [CEST Apr  3 15:29:54] debug    : Monit HTTP server started
    [CEST Apr  3 15:29:54] info     : 'FQDN' Monit 5.24.0 started
    [CEST Apr  3 15:29:54] debug    : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:29:54] info     : M/Monit heartbeat started
    [CEST Apr  3 15:29:54] error    : 'some-service' process is not running
    [CEST Apr  3 15:29:54] debug    : M/Monit: status message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:29:54] debug    : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:29:54] info     : 'some-service' trying to restart
    [CEST Apr  3 15:29:54] debug    : 'some-service.mydomain.com' stop skipped -- method not defined
    [CEST Apr  3 15:29:54] debug    : 'some-service-webapp' stop skipped -- method not defined
    [CEST Apr  3 15:29:54] info     : 'some-service' restart: '/usr/bin/systemctl restart some-service.service'
    [CEST Apr  3 15:29:54] debug    : 'some-service' restarted
    [CEST Apr  3 15:29:54] info     : 'some-service' process is running with pid 31725
    [CEST Apr  3 15:29:54] debug    : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:29:54] debug    : 'some-service' zombie check succeeded
    [CEST Apr  3 15:29:54] debug    : 'some-service' cpu usage check succeeded [current cpu usage = 0.0%]
    [CEST Apr  3 15:29:54] debug    : 'some-service-webapp' start method not defined
    [CEST Apr  3 15:29:54] debug    : 'some-service-webapp' monitoring enabled
    [CEST Apr  3 15:29:54] debug    : pidfile '/var/run/httpd/httpd.pid' does not exist
    [CEST Apr  3 15:29:54] info     : 'httpd' start: '/usr/bin/systemctl start httpd.service'
    [CEST Apr  3 15:29:55] debug    : 'httpd' started
    [CEST Apr  3 15:29:55] debug    : 'httpd' process is running with pid 31736
    [CEST Apr  3 15:29:55] debug    : 'httpd' zombie check succeeded
    [CEST Apr  3 15:29:55] debug    : 'httpd' cpu usage check succeeded [current cpu usage = 0.0%]
    [CEST Apr  3 15:29:55] debug    : 'some-service' process is running with pid 31725
    [CEST Apr  3 15:29:55] debug    : 'some-service' zombie check succeeded
    [CEST Apr  3 15:29:55] debug    : 'some-service' cpu usage check succeeded [current cpu usage = 82.9%]
    [CEST Apr  3 15:29:55] debug    : 'some-service-webapp' start method not defined
    [CEST Apr  3 15:29:55] debug    : 'some-service-webapp' monitoring enabled
    [CEST Apr  3 15:29:55] debug    : Socket test failed for [127.0.0.1]:8080 -- Connection refused
    [CEST Apr  3 15:29:55] error    : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
    [CEST Apr  3 15:29:55] debug    : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:29:55] error    : 'some-service.mydomain.com' failed to start -- could not start required services: 'some-service-webapp'
    [CEST Apr  3 15:29:55] debug    : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:29:55] debug    : Socket test failed for [127.0.0.1]:8080 -- Connection refused
    [CEST Apr  3 15:29:55] error    : 'some-service-webapp' failed protocol test [HTTP] at [localhost]:8080/healthcheck [TCP/IP] -- Connection refused
    [CEST Apr  3 15:29:55] debug    : 'httpd' process is running with pid 31736
    [CEST Apr  3 15:29:55] debug    : 'httpd' zombie check succeeded
    [CEST Apr  3 15:29:55] debug    : 'httpd' cpu usage check succeeded [current cpu usage = 0.0%]
    [CEST Apr  3 15:29:55] debug    : Reloading mount information for filesystem '/'
    [CEST Apr  3 15:29:55] debug    : Reloading mount information for filesystem '/var'
    [CEST Apr  3 15:29:55] debug    : Reloading mount information for filesystem '/tmp'
    [CEST Apr  3 15:29:55] debug    : Reloading mount information for filesystem '/var/log'
    [CEST Apr  3 15:29:55] debug    : 'some-service.mydomain.com' test skipped as required service 'some-service-webapp' has errors
    
    [CEST Apr  3 15:30:54] debug    : M/Monit: status message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:30:55] debug    : 'some-service' process is running with pid 31725
    [CEST Apr  3 15:30:55] debug    : 'some-service' zombie check succeeded
    [CEST Apr  3 15:30:55] debug    : 'some-service' cpu usage check succeeded [current cpu usage = 24.4%]
    [CEST Apr  3 15:30:55] debug    : 'some-service-webapp' succeeded testing protocol [HTTP] at [localhost]:8080/healthcheck [TCP/IP] [response time 132.609 m]
    [CEST Apr  3 15:30:55] info     : 'some-service-webapp' connection succeeded to [localhost]:8080/healthcheck [TCP/IP]
    [CEST Apr  3 15:30:55] debug    : M/Monit: event message sent to http://[mmonit.mydomain.com]:80/collector
    [CEST Apr  3 15:30:55] debug    : 'httpd' process is running with pid 31736
    [CEST Apr  3 15:30:55] debug    : 'httpd' zombie check succeeded
    [CEST Apr  3 15:30:55] debug    : 'httpd' cpu usage check succeeded [current cpu usage = 0.0%]
    [CEST Apr  3 15:30:55] debug    : 'some-service.mydomain.com' succeeded testing protocol [HTTP] at [some-service.mydomain.com]:80/healthcheck [TCP/IP] [response time 87.674 ms]
    [CEST Apr  3 15:30:55] debug    : 'some-service.mydomain.com' connection succeeded to [some-service.mydomain.com]:80/healthcheck [TCP/IP]
    

    I have no more information. It seems that the check runs and succeeds, but the result is not taken into account. On each subsequent cycle the result is the same: the check seems OK, but the answer does not seem to be taken into account.
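
    To make the dependency failures easier to spot in a long debug log, the lines can be filtered with a short script. This is a minimal sketch that assumes the log format shown in the excerpts above; the function name `dependency_failures` and the regular expressions are illustrative and not part of Monit.

```python
import re

# Matches lines such as:
# [CEST Apr  3 15:29:55] error    : 'svc' failed to start -- ...
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+)\s*:\s+(?P<message>.*)$"
)

# Matches the dependency failure message seen in this thread.
DEP_FAIL_RE = re.compile(
    r"^'(?P<service>[^']+)' failed to start -- "
    r"could not start required services: '(?P<required>[^']+)'"
)


def dependency_failures(log_text):
    """Return (timestamp, service, required_service) for each dependency failure."""
    results = []
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if line is None:
            continue  # blank line or a format this sketch does not handle
        failure = DEP_FAIL_RE.match(line.group("message"))
        if failure is not None:
            results.append(
                (
                    line.group("timestamp"),
                    failure.group("service"),
                    failure.group("required"),
                )
            )
    return results


# Two lines copied from the debug log above.
SAMPLE = """
[CEST Apr  3 15:29:55] error    : 'some-service.mydomain.com' failed to start -- could not start required services: 'some-service-webapp'
[CEST Apr  3 15:30:55] info     : 'some-service-webapp' connection succeeded to [localhost]:8080/healthcheck [TCP/IP]
"""

if __name__ == "__main__":
    for timestamp, service, required in dependency_failures(SAMPLE):
        print(f"{timestamp}: {service} blocked by {required}")
```

    Running it over the full log would show whether the "could not start required services" event ever recurs after the first cycle, or whether only the stale status remains.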

  4. Lutz Mader

    Hello Jonathan Le Bloas,
    are you able to do some testing with monit 5.26.0?

    I do similar things without any problem with monit 5.25.2 and 5.26.0 on macOS and AIX.

    Lutz
