Monit (5.17.1) program checks freeze in the initializing state for more than 45 minutes after stopping and starting them

Issue #494 closed
Former user created an issue

We have a few processes and programs grouped into a common Monit group. We stopped and started this group, and all the program checks were stuck in the "initializing" state for almost an hour. It looks like Monit froze, because the cluster status came up after quitting and starting Monit again.

Note: With very verbose logging, we see several errors like this in the logs:

[PDT Oct 31 11:52:55] debug    : Cannot open proc file /proc/18422/stat -- No such file or directory
[PDT Oct 31 11:52:55] debug    : system statistic error -- cannot read /proc/18422/stat
[PDT Oct 31 11:55:33] debug    : Cannot open proc file /proc/19030/stat -- No such file or directory
[PDT Oct 31 11:55:33] debug    : system statistic error -- cannot read /proc/19030/stat

Comments (7)

  1. Tildeslash repo owner

    Please send the monit configuration of the whole group.

    Note that if there are service dependencies, Monit 5.15 or later will wait with the check until the parent service is up. See issue #249. This behaviour is not a bug, but a result of the service dependency.

    If your "check program" is the root of the whole dependency group, Monit won't start subsequent services until its state is OK. If you run the check program less frequently, all child services will have to wait for it.

  2. nshubhy

    Hi Team,

    I have sent the required data to support@mmonit.com. No child services depend on these programs; however, these programs depend on 2 other programs, which were running fine and whose Monit state was OK. So the parent dependency was already satisfied.

    Thanks

  3. Tildeslash repo owner

    Hello, thanks for the data.

    It is a configuration problem:

    1. high start timeout for "jboss" service (start timeout 10 minutes)
    2. missing dependency of "jboss" service on ejabberd

    The dependency chart for stuck services:

    mysql-status                    Status ok
        jboss                       Initializing                    (start and stop timeout 600s)
            jboss-status            Initializing - start pending
                mediator            Initializing - start pending    (start and stop timeout 600s) 
                    mediator-status Initializing - start pending
    

    The jboss service failed to start (as the start timeout is set to 600 seconds, each start attempt times out after 10 minutes):

    [PDT Oct 31 11:17:28] info     : 'jboss-status' start on user request
    [PDT Oct 31 11:17:28] info     : 'jboss' start on user request
    [PDT Oct 31 11:17:28] info     : 'jboss' start: /opt/vsd/sysmon/jbossStart.sh
    [PDT Oct 31 11:17:29] debug    : JBoss can't start without Ejabberd. Please ensure ejabberd passes and try again.
    [PDT Oct 31 11:27:36] error    : 'jboss' failed to start (exit status 1) -- /opt/vsd/sysmon/jbossStart.sh: node 'vsd-kmurthy-set1-node1.mv.nuagenetworks.net' ...
    [PDT Oct 31 11:27:36] debug    : 'jboss' monitoring enabled
    [PDT Oct 31 11:27:36] info     : 'jboss' start action failed
    

    -> "JBoss can't start without Ejabberd" ... timeout is set to 600 seconds, so the start attempt lasted 11:17-11:27

    jboss-status start triggers start of prerequisite jboss, which is still not running:

    [PDT Oct 31 11:27:36] info     : 'jboss' start: /opt/vsd/sysmon/jbossStart.sh
    [PDT Oct 31 11:27:36] debug    : JBoss can't start without Ejabberd. Please ensure ejabberd passes and try again.
    [PDT Oct 31 11:37:44] error    : 'jboss' failed to start (exit status 1) -- /opt/vsd/sysmon/jbossStart.sh: node 'vsd-kmurthy-set1-node1.mv.nuagenetworks.net' ...
    [PDT Oct 31 11:37:44] error    : 'jboss-status' failed to start -- could not start required services: 'jboss'
    

    -> "JBoss can't start without Ejabberd" ... timeout is set to 600 seconds, so the start attempt lasted 11:27-11:37

    ejbca-status start triggers start of prerequisite jboss, which is still not running:

    [PDT Oct 31 11:37:44] info     : 'jboss' start: /opt/vsd/sysmon/jbossStart.sh
    [PDT Oct 31 11:37:44] debug    : JBoss can't start without Ejabberd. Please ensure ejabberd passes and try again.
    [PDT Oct 31 11:47:52] error    : 'jboss' failed to start (exit status 1) -- /opt/vsd/sysmon/jbossStart.sh: node 'vsd-kmurthy-set1-node1.mv.nuagenetworks.net' ...
    [PDT Oct 31 11:47:52] error    : 'ejbca-status' failed to start -- could not start required services: 'jboss'
    

    -> "JBoss can't start without Ejabberd" ... timeout is set to 600 seconds, so the start attempt lasted 11:37-11:47

    jboss-status start triggers start of prerequisite jboss, which is still not running:

    [PDT Oct 31 11:47:52] info     : 'jboss' start: /opt/vsd/sysmon/jbossStart.sh
    [PDT Oct 31 11:47:52] debug    : JBoss can't start without Ejabberd. Please ensure ejabberd passes and try again.
    [PDT Oct 31 11:58:00] error    : 'jboss' failed to start (exit status 1) -- /opt/vsd/sysmon/jbossStart.sh: node 'vsd-kmurthy-set1-node1.mv.nuagenetworks.net' ...
    [PDT Oct 31 11:58:00] error    : 'jboss-status' failed to start -- could not start required services: 'jboss'
    
    [PDT Oct 31 11:58:00] error    : 'mediator' failed to start -- could not start required services: 'jboss-status'
    [PDT Oct 31 11:58:00] info     : 'mediator' start action failed
    

    -> "JBoss can't start without Ejabberd" ... timeout is set to 600 seconds, so the start attempt lasted 11:47-11:58

    Summary:

    1.) reduce the jboss start timeout ... it is set to 600 seconds and, on error, blocks the testing for 10 minutes. The default is 30 seconds; if that is not enough for jboss to start, adjust the limit, but base it on the real jboss startup time rather than using 10 minutes as a rule of thumb.

    2.) fix the jboss dependency ... it seems that jboss refuses to start if ejabberd is not running

    Best regards, The M/Monit team
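
The four start attempts quoted above can be added up to see how long the testing was blocked in total. The following is a minimal sketch in Python, not part of Monit: the timestamps are copied from the log excerpts in this comment, and the helper names (`seconds_between`, `attempt_durations`, `total_blocked`) are made up for illustration.

```python
from datetime import datetime

# (start, failure) timestamps of the four 'jboss' start attempts,
# copied from the log excerpts quoted in this thread.
ATTEMPTS = [
    ("11:17:28", "11:27:36"),
    ("11:27:36", "11:37:44"),
    ("11:37:44", "11:47:52"),
    ("11:47:52", "11:58:00"),
]


def seconds_between(start, end):
    """Return the number of seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def attempt_durations(attempts):
    """Return the duration in seconds of every (start, end) attempt."""
    return [seconds_between(start, end) for start, end in attempts]


def total_blocked(attempts):
    """Return the summed duration of all attempts in seconds."""
    return sum(attempt_durations(attempts))


if __name__ == "__main__":
    for (start, end), duration in zip(ATTEMPTS, attempt_durations(ATTEMPTS)):
        print(f"{start} -> {end}: {duration} s")
    total = total_blocked(ATTEMPTS)
    print(f"total: {total} s ({total // 60} min {total % 60} s)")
```

Each attempt lasts slightly longer than the configured 600 seconds, and the four attempts together cover the whole window from 11:17:28 to 11:58:00, which is in line with the roughly 45 minutes reported in the issue title.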

  4. nshubhy

    Hi,

    Thanks for your response. Jboss, jboss-status, mediator, mediator-status and ejbca-status are expected to fail when ejabberd is not there, but the programs mysql-cluster-status, ejabberd-cluster-status and zookeeper-cluster-status are not expected to remain in the “initializing” state for about an hour. If we run these programs manually, they return output in less than a minute.

    Could you please let us know the reason we are seeing mysql-cluster-status, ejabberd-cluster-status and zookeeper-cluster-status stuck in “initializing”?

    Thanks

  5. nshubhy

    mysql-cluster-status, ejabberd-cluster-status and zookeeper-cluster-status are NOT dependent on Jboss. These programs depend on ntp-status and dns-status. Ntp-status and dns-status are running fine and their Monit status is OK.

  6. nshubhy

    I understand the problem is that testing is serial: when jboss blocks, it blocks all other tests.

    Thanks, team.
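
The effect of serial testing described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not Monit's actual scheduler: the function `serial_cycle` is hypothetical, the order of the queue is assumed, and the 30 seconds per status program is an assumed figure (the thread only says they finish in less than a minute when run manually). The 608 seconds for jboss is the length of one failed start attempt in the logs above.

```python
def serial_cycle(services):
    """Run service actions one after another and record when each one completes.

    `services` is a list of (name, seconds_blocked) pairs in execution order.
    Returns a dict mapping each name to its completion time in seconds,
    measured from the start of the cycle.
    """
    clock = 0
    finished = {}
    for name, blocked in services:
        # Nothing else runs while this action is blocking.
        clock += blocked
        finished[name] = clock
    return finished


# Assumed order and durations, for illustration only.
queue = [
    ("jboss", 608),                    # one failed start attempt (from the log)
    ("mysql-cluster-status", 30),      # assumed runtime
    ("ejabberd-cluster-status", 30),   # assumed runtime
    ("zookeeper-cluster-status", 30),  # assumed runtime
]

if __name__ == "__main__":
    for name, done_at in serial_cycle(queue).items():
        print(f"{name}: finished after {done_at} s")
```

In this model the status programs do not depend on jboss at all, yet none of them can complete before the jboss start attempt has timed out, because they are queued behind it.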
