monit does not block commands even when there is an action pending

Issue #273 resolved
Harivardhan Pyaram created an issue

If a process has an action pending against it, Monit 5.14 would block the user with "Action failed - please try again later!". This doesn't happen in Monit 5.15.

[root@vsd-danny-mvdclnx162-vm ~]# monit start mediator-status
[root@vsd-danny-mvdclnx162-vm ~]# monit stop mysql
[root@vsd-danny-mvdclnx162-vm ~]# monit summary
The Monit daemon 5.15 uptime: 16h 11m

Program 'ntp-status'                Status ok
Process 'mysql'                     Running - stop pending
Program 'mysql-status'              Not monitored
Process 'mediator'                  Not monitored
Program 'mediator-status'           Not monitored
Process 'jboss'                     Not monitored
Program 'jboss-status'              Not monitored
Process 'ejabberd'                  Running
Program 'ejabberd-status'           Status ok
System 'vsd-danny-mvdclnx162-vm.cnaqa.eng.timetra.com' Running
[root@vsd-danny-mvdclnx162-vm ~]# monit start mediator-status
[root@vsd-danny-mvdclnx162-vm ~]# monit stop mysql
[root@vsd-danny-mvdclnx162-vm ~]# monit start mediator-status
[root@vsd-danny-mvdclnx162-vm ~]# monit summary
The Monit daemon 5.15 uptime: 16h 11m

Program 'ntp-status'                Status ok
Process 'mysql'                     Not monitored
Program 'mysql-status'              Not monitored
Process 'mediator'                  Not monitored
Program 'mediator-status'           Not monitored
Process 'jboss'                     Not monitored
Program 'jboss-status'              Not monitored
Process 'ejabberd'                  Running
Program 'ejabberd-status'           Status ok
System 'vsd-danny-mvdclnx162-vm.cnaqa.eng.timetra.com' Running
[root@vsd-danny-mvdclnx162-vm ~]# monit summary
The Monit daemon 5.15 uptime: 16h 12m

Program 'ntp-status'                Status ok
Process 'mysql'                     Not monitored
Program 'mysql-status'              Not monitored
Process 'mediator'                  Not monitored
Program 'mediator-status'           Not monitored
Process 'jboss'                     Not monitored
Program 'jboss-status'              Not monitored
Process 'ejabberd'                  Running
Program 'ejabberd-status'           Status ok
System 'vsd-danny-mvdclnx162-vm.cnaqa.eng.timetra.com' Running

It just logs like this:

[PDT Oct 27 10:21:02] info     : 'mediator-status' start on user request
[PDT Oct 27 10:21:02] info     : Monit daemon with PID 9987 awakened
[PDT Oct 27 10:21:02] info     : Awakened by User defined signal 1
[PDT Oct 27 10:21:02] debug    : Cannot open proc file /proc/14201/stat -- No such file or directory
[PDT Oct 27 10:21:02] debug    : system statistic error -- cannot read /proc/14201/stat
[PDT Oct 27 10:21:02] debug    : 'mysql-status' start method not defined
[PDT Oct 27 10:21:02] debug    : 'mysql-status' monitoring enabled
[PDT Oct 27 10:21:02] debug    : 'mysql-status' monitoring enabled
[PDT Oct 27 10:21:02] debug    : 'mysql-status' status succeeded [status=0] -- Percona XtraDB Status: PASS
[PDT Oct 27 10:21:02] debug    : 'mysql-status' program started
[PDT Oct 27 10:21:03] debug    : 'mysql-status' status succeeded [status=0] -- Percona XtraDB Status: PASS
[PDT Oct 27 10:21:03] debug    : 'mysql-status' program started
[PDT Oct 27 10:21:03] debug    : pidfile '/var/run/vsd/pid/jboss.pid' does not exist
[PDT Oct 27 10:21:03] info     : 'jboss' start: /opt/vsd/sysmon/jbossStart.sh
[PDT Oct 27 10:21:03] debug    : pidfile '/var/run/vsd/pid/jboss.pid' does not exist
[PDT Oct 27 10:21:11] info     : 'mysql' stop on user request
[PDT Oct 27 10:21:11] info     : Monit daemon with PID 9987 awakened
[PDT Oct 27 10:21:17] info     : 'mediator-status' start on user request
[PDT Oct 27 10:21:17] info     : Monit daemon with PID 9987 awakened
[PDT Oct 27 10:21:26] info     : 'mysql' stop on user request
[PDT Oct 27 10:21:26] info     : Monit daemon with PID 9987 awakened
[PDT Oct 27 10:21:31] info     : 'mediator-status' start on user request
[PDT Oct 27 10:21:31] info     : Monit daemon with PID 9987 awakened

Comments (8)

  1. Tildeslash repo owner

    When you request an action via the CLI or GUI, monit's HTTP thread marks the action on the service but doesn't perform it in its own context. Instead, it wakes up the main thread, which is responsible for performing these actions and all tests.
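    To make the hand-off above concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Monit's implementation: the `Service` and `Monitor` classes, the `pending_action` field, and the use of `threading.Event` to stand in for "waking up the main thread" are all hypothetical. The request path only marks the action and signals. The main-thread cycle does the actual work, and it handles actions before tests.

```python
import threading


class Service:
    """A monitored service with one pending-action flag (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.pending_action = None  # set by the HTTP thread, cleared by main


class Monitor:
    """Toy model of the HTTP-thread/main-thread hand-off; not Monit's code."""

    def __init__(self, services):
        self.services = {s.name: s for s in services}
        self.lock = threading.Lock()     # guards the pending_action flags
        self.wakeup = threading.Event()  # stands in for waking the main thread
        self.performed = []              # (service, action) run by main thread

    def request(self, name, action):
        """HTTP thread (CLI/GUI request): mark the action and wake main, don't run it."""
        with self.lock:
            self.services[name].pending_action = action
        self.wakeup.set()

    def main_cycle(self, timeout=None):
        """One main-thread iteration: sleep until woken, then perform marked actions."""
        self.wakeup.wait(timeout)
        self.wakeup.clear()
        for svc in self.services.values():
            with self.lock:
                action, svc.pending_action = svc.pending_action, None
            if action is not None:
                self.performed.append((svc.name, action))  # actions come first
        # ... the regular tests for every service would run here ...
```

    For example, start `main_cycle` in a background thread, then call `request("mysql", "stop")` from the caller's thread. The caller returns immediately, and the stop is recorded only when the main cycle wakes up.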

    Actions have priority over tests, but if an action is requested during a test cycle, it may be delayed. During this window, monit <= 5.14 refused to schedule any other action, even though technically the previous action hadn't started yet and was only queued. If you wanted to change the action before it started, that wasn't possible: you had to wait until the first action finished. This limitation isn't really necessary, and it is more flexible to allow changing or cancelling an action that hasn't been performed yet, so we allowed it in monit 5.15.
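    The behaviour change between 5.14 and 5.15 can be sketched as a single per-service action slot with two policies. This is a hypothetical illustration, not Monit source. The class name and the `allow_override` switch are invented. The rejection text is the message quoted in the original report. With the override disabled (roughly 5.14), a second request is refused while one is pending. With it enabled (roughly 5.15), the newer request replaces the queued one.

```python
class PendingActionSlot:
    """Hypothetical single action slot per service."""

    def __init__(self, allow_override):
        # allow_override=False approximates monit <= 5.14, True approximates 5.15
        self.allow_override = allow_override
        self.pending = None

    def request(self, action):
        """Return an error message if the request is refused, or None if accepted."""
        if self.pending is not None and not self.allow_override:
            return "Action failed - please try again later!"
        self.pending = action  # 5.15-style: replaces an action not yet started
        return None

    def take(self):
        """Main thread picks up the pending action (if any) and clears the slot."""
        action, self.pending = self.pending, None
        return action


old = PendingActionSlot(allow_override=False)
old.request("stop")
print(old.request("start"))  # refused while "stop" is still pending

new = PendingActionSlot(allow_override=True)
new.request("stop")
new.request("start")
print(new.take())  # start
```

    The 5.15 policy never rejects a request. Its trade-off is that an earlier request that hadn't started yet is silently replaced.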

    If you have a scenario where this causes problems, can you please provide more details?

  2. Harivardhan Pyaram reporter

    This is causing problems when the user issues multiple commands. I think it fills up the queue or something, and monit freezes. I am attaching a few logs (with verbose-level logging) and a screenshot.

  3. Tildeslash repo owner

    Thank you for the logs.

    Could you please also send the monit configuration to support@mmonit.com so we can see the relationships between services? (We need to see the checks with their "depends on" statements and "start program" settings.)

    There is literally no command queue: when you request a service action, a flag is set in the service context, and the main thread performs the action corresponding to that flag. If you change the action before it is processed, you just change the flag (i.e. the action to perform). The new action overrides the old one, which hasn't started yet.
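    Because each service has one flag rather than a queue, repeated commands cannot pile up. The sketch below replays the request sequence typed in the original report as if none had been processed yet. It is a hypothetical model: `collapse_requests` is not a Monit function, and a dict keyed by service name stands in for the per-service flags. However many requests arrive, there is at most one pending action per service, and the latest request for a service wins.

```python
def collapse_requests(requests):
    """Replay (service, action) requests issued before the main thread wakes up.

    Returns the pending flags the main thread would see: at most one action per
    service, with the latest request for a service overriding earlier ones.
    """
    pending = {}
    for service, action in requests:
        pending[service] = action  # overwrite the flag; nothing is appended
    return pending


# The command sequence from the transcript in the original report.
requests = [
    ("mediator-status", "start"),
    ("mysql", "stop"),
    ("mediator-status", "start"),
    ("mysql", "stop"),
    ("mediator-status", "start"),
]
print(collapse_requests(requests))  # {'mediator-status': 'start', 'mysql': 'stop'}
```

    Five requests collapse to two flags. Under this model, a slow cycle can delay the work, but extra commands cannot fill a queue.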

    The (temporary) freeze may be related to the following MySQL log entries, which appear in all the logs. In this case monit waited 10 minutes for MySQL to start, from 12:25:10 to 12:35:10:

    [PST Nov  4 12:25:10] info     : 'mysql' start: /opt/vsd/sysmon/perconaStart.sh
    [PST Nov  4 12:25:10] debug    : pidfile '/var/run/mysql/mysql.pid' does not exist
    [PST Nov  4 12:25:13] debug    : Warning: Permanently added 'vsdha-2.dc.nuagedemo.net' (RSA) to the list of known hosts.
    Warning: Permanently added 'vsdha-3.dc.nuagedemo.net' (RSA) to the list of known hosts.
    sudo: sorry, you must have a tty to run sudo
    [PST Nov  4 12:25:13] debug    : NTP Status: PASS
    VSD DNS check: PASS
    Percona XtraDB Status: FAIL Percona XtraDB Test Fail. MySQL Percona XtraDB is not running...
    mysql cluster check failed
    [PST Nov  4 12:25:13] debug    : pidfile '/var/run/mysql/mysql.pid' does not exist
    ...
    [PST Nov  4 12:35:09] debug    : pidfile '/var/run/mysql/mysql.pid' does not exist
    [PST Nov  4 12:35:10] error    : 'mysql' failed to start (exit status 1) -- /opt/vsd/sysmon/perconaStart.sh: Warning:    Permanently added 'vsdha-2.dc.nuagedemo.net' (RSA) to the list of known hosts.
    Warning: Permanently added 'vsdha-3.dc.nuagedemo.net' (RSA) to the list of known hosts.
    sudo: sorry, you must have a tty to run sudo
    

    It seems that the MySQL start program "/opt/vsd/sysmon/perconaStart.sh" uses "sudo" during startup, which fails because the program doesn't have a controlling terminal (tty). If monit is running as root, the program doesn't need "sudo". If switching to another user is necessary, please use either monit's "as uid <user>" option or "su" instead of "sudo". Once the MySQL start program is fixed, action processing should be almost instant and there won't be any freeze.

    Note regarding the freeze: when the start program exits, monit doesn't assume anything from its exit status and waits for MySQL to start, up to the configured timeout. The wait is skipped only if the start program could not be executed at all.
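    The wait logic described above can be sketched as follows. This is a hypothetical model, not Monit's code. The function name and its return values are invented. "Service is up" is approximated by the pidfile appearing, whereas real monit checks the process itself. The start program's exit status is deliberately ignored. The wait is skipped only when the program cannot be executed (an `OSError` from `subprocess.run`). Otherwise the function polls until the service appears or the timeout expires.

```python
import os
import subprocess
import time


def start_and_wait(start_cmd, pidfile, timeout=30.0, poll=0.5):
    """Run a start program, then wait up to `timeout` seconds for the service.

    Returns "exec-failed", "started" or "timeout" (hypothetical model).
    """
    deadline = time.monotonic() + timeout
    try:
        # Exit status is ignored on purpose, mirroring the behaviour described.
        subprocess.run(start_cmd, stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       timeout=timeout)
    except subprocess.TimeoutExpired:
        pass  # program killed after hanging; fall through to the wait loop
    except OSError:
        return "exec-failed"  # could not execute at all: skip the wait
    while time.monotonic() < deadline:
        if os.path.exists(pidfile):  # stand-in for "process is running"
            return "started"
        time.sleep(poll)
    return "timeout"
```

    With `timeout=600`, a start program that never brings the service up blocks the caller for the full 10 minutes seen in the logs. With a 30-second timeout, the same failure costs only 30 seconds.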

  4. Tildeslash repo owner

    One more note: it may be better to use a shorter start timeout too. MySQL should be able to start much faster than in 10 minutes. The default of 30s should be sufficient, and if the MySQL start fails, monit won't freeze for too long.

    (note however that the MySQL program's "sudo" issue still needs to be fixed)

  5. Harivardhan Pyaram reporter

    Marking this as resolved, since we changed our implementation to work with the new Monit behavior.
