Stop monit command kills my processes

Issue #106 resolved
Alexander Litvak created an issue

I couldn't find any information on this, but it looks like when I change the configuration via puppet, the module restarts monit, i.e.

/sbin/stop monit
/sbin/start monit

However crude that might be, I think it produces an undesirable effect on my system which needs to be avoided.

Monit kills some monitored processes during exit and starts them again. This is of course not intended, especially when it happens in production.

I noticed that if those processes are started while monit is running in the foreground, they are somehow not killed when a simple kill is issued to the monit process. A subsequent restart through upstart has no effect on them either.

I monitor rsyslogd, sshd, mysql, ntpd, winbindd, and our application x. Only rsyslogd and application x are affected, i.e. killed on /sbin/stop monit.

Thanks,

Comments (18)

  1. Tildeslash repo owner

    This doesn't look like a Monit issue: Monit doesn't stop any processes when it quits. To stop the processes, Monit must be explicitly instructed to do so (for example by "monit stop all"), so it seems that the service stop is externally driven (by puppet too?).

    Please check your Monit logs (you can enable logging using the "set logfile" statement).

  2. Alexander Litvak reporter

    Monit doesn't say much in the logs:

    Oct 22 02:03:56 node1cl2-chi monit[4403]: Shutting down monit HTTP server
    Oct 22 02:03:56 node1cl2-chi rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="4452" x-info="http://www.rsyslog.com"] exiting on signal 15.
    Oct 22 02:05:29 node1cl2-chi rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="6296" x-info="http://www.rsyslog.com"] start
    Oct 22 02:05:44 node1cl2-chi monit[6257]: 'xbroker' process is running with pid 6282
    Oct 22 02:05:44 node1cl2-chi monit[6257]: 'rsyslogd' process is running with pid 6296
    

    I have CentOS 6.5

    This is my upstart file

    # File managed by puppet on node1cl2-chi.siptalk.com.
    # Changes made to this file outside of puppet will be lost on the next puppet run!
    # xcast::roles::xbroker::inittab
    #
    
    
    # To install disable the old way of doing things:
    #
    #   /etc/init.d/monit stop && update-rc.d -f monit remove
    #
    # then put this script here:
    #
    #   /etc/init/monit.conf
    #
    # and reload upstart configuration:
    #
    #   initctl reload-configuration
    #
    # You can manually start and stop monit like this:
    # 
    # start monit
    # stop monit
    #
    
    description "Monit service manager"
    
    limit core unlimited unlimited
    limit nofile 131072 196608
    
    start on runlevel [2345]
    stop on runlevel [!2345]
    
    expect daemon
    respawn
    
    exec /usr/bin/monit -c /etc/monitrc
    
    pre-stop exec /usr/bin/monit -c /etc/monitrc quit
    

    I can reproduce the issue just by running stop monit / start monit from the command line. I don't need puppet to trigger it.

  3. Alexander Litvak reporter

    It looks like if monit starts the processes below, they are killed with signal 15 when monit is stopped.

    Oct 22 02:54:43 node1cl2-chi monit[12202]: Shutting down monit HTTP server
    Oct 22 02:54:43 node1cl2-chi rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="12241" x-info="http://www.rsyslog.com"] exiting on signal 15.
    Oct 22 02:56:19 node1cl2-chi rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="12554" x-info="http://www.rsyslog.com"] start
    Oct 22 02:56:34 node1cl2-chi monit[12515]: 'xbroker' process is running with pid 12540
    Oct 22 02:56:34 node1cl2-chi monit[12515]: 'rsyslogd' process is running with pid 12554
    

    I ran strace -ff on the stop monit command, but I don't see anything special or related to external processes. However, notice how rsyslogd exits on signal 15 as soon as monit shuts down (xbroker does the same thing, it just doesn't write to the local syslog).

    Unfortunately monit -v doesn't produce more detailed logs.
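    The correlation in the excerpt above can be checked mechanically. The following is a minimal sketch, not part of Monit: it scans syslog lines for "exiting on signal" entries logged within a couple of seconds of a "Shutting down monit" entry. The function names, the 2-second window, and the regular expression are assumptions based only on the line format shown in this thread.

```python
import re
from datetime import datetime

# Line shape seen in this thread: "Oct 22 02:54:43 host program[pid]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+([\w./-]+)(?:\[\d+\])?:\s+(.*)$"
)

# Sample lines copied from the log excerpt above.
SAMPLE = [
    "Oct 22 02:54:43 node1cl2-chi monit[12202]: Shutting down monit HTTP server",
    'Oct 22 02:54:43 node1cl2-chi rsyslogd: [origin software="rsyslogd" '
    'swVersion="5.8.10" x-pid="12241" x-info="http://www.rsyslog.com"] '
    "exiting on signal 15.",
    "Oct 22 02:56:34 node1cl2-chi monit[12515]: 'rsyslogd' process is running "
    "with pid 12554",
]


def parse(line):
    """Return (timestamp, program, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # Syslog omits the year; a fixed placeholder year is enough to compare
    # timestamps that come from the same log.
    ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S")
    return ts, m.group(2), m.group(3)


def signals_near_monit_stop(lines, window_seconds=2):
    """List (program, message) for 'exiting on signal' entries that were
    logged within window_seconds of a 'Shutting down monit' entry."""
    events = [e for e in map(parse, lines) if e]
    stops = [ts for ts, prog, msg in events
             if prog == "monit" and "Shutting down monit" in msg]
    hits = []
    for ts, prog, msg in events:
        if "exiting on signal" not in msg:
            continue
        if any(abs((ts - s).total_seconds()) <= window_seconds for s in stops):
            hits.append((prog, msg))
    return hits
```

    Running signals_near_monit_stop() over the full log would list every daemon that logged an exit right when monit shut down. Processes that do not log to syslog, such as xbroker here, would not appear.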

  4. Tildeslash repo owner

    Can you please provide output of the following command:

     ps -ef | egrep "(rsyslogd|monit)"
    

    Which Monit version is it?

  5. Alexander Litvak reporter

    ps -ef | egrep "(rsyslogd|monit)"
    root     15433     1  0 03:19 ?        00:00:40 /usr/bin/monit -vvv -c /etc/monitrc
    root     15465     1  0 03:19 ?        00:00:02 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
    
    monit -V
    This is Monit version 5.9
    
  6. Alexander Litvak reporter

    If you have a debugging version of monit I can run it to reproduce the issue and send you logs.

  7. Alexander Litvak reporter

    I updated monit to the latest version, but the issue remains. If a process was started outside of monit (from a shell via the command line), then stopping and starting monit has no effect on it. However, if the process was started by monit, stopping monit causes the process to exit on signal 15.
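    One difference between the two cases that could be compared is the process group and session of monit and of the affected process. The sketch below is only a diagnostic aid: whether the process group or session plays any role here is an assumption, not something established in this thread. The helper names are hypothetical; os.getpgid and os.getsid are standard library calls.

```python
import os


def group_and_session(pid):
    """Return (process group ID, session ID) for pid."""
    return os.getpgid(pid), os.getsid(pid)


def shared_ids(pid_a, pid_b):
    """Report which of process group and session two PIDs have in common."""
    pgid_a, sid_a = group_and_session(pid_a)
    pgid_b, sid_b = group_and_session(pid_b)
    return {"same_group": pgid_a == pgid_b, "same_session": sid_a == sid_b}
```

    Calling shared_ids() with the monit PID and the rsyslogd PID, once when rsyslogd was started from a shell and once when it was started by monit, would show whether the two cases differ in this respect.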

  8. Alexander Litvak reporter

    I looked at my scripts and upstart config and I see nothing wrong. Could monit's state have anything to do with the issue? I am providing the configs just in case.

    /etc/monitrc
    
    set daemon  15              # check services at 15-sec intervals
    #
    set logfile syslog facility log_daemon
    set idfile /var/run/.monit.id
    set mailserver  localhost
    set eventqueue
         basedir /var/monit  # set the base directory where events will be stored
         slots 100           # optionally limit the queue size
    set alert xxxxxxxx not on { instance, action }
    ## Monit has an embedded web server which can be used to view status of 
    ## services monitored and manage services from a web interface. See the
    ## Monit Wiki if you want to enable SSL for the web server. 
    #
    set httpd port 2812 and
        #use address localhost  # only accept connection from localhost
        allow localhost        # allow localhost to connect to the server and
        allow x.x.x.x/24       # allow x.x.x.x/24 to connect to the server and
        allow w.w.w.w/24        # allow w.w.w.w/24 to connect to the server and
        allow y.y.y.y/24        # allow y.y.y.y/24 to connect to the server and
        allow admin:monit      # require user 'admin' with password xxxxx
        allow @xxxx           # allow users of group 'monit' to connect (rw)
        allow @users readonly  # allow users of group 'users' to connect readonly
    ## Check general system resources such as load average, cpu and memory
    ## usage. Each test specifies a resource, conditions and the action to be
    ## performed should a test fail.
    #
    check system sbc11n2-la.siptalk.com
        if loadavg (1min) > 12 then alert
        if loadavg (5min) > 10 then alert
        if memory usage > 89% then alert
        if swap usage > 0% then alert
    #    if cpu usage (user) > 70% then alert
    #    if cpu usage (system) > 30% then alert
    #    if cpu usage (wait) > 20% then alert
        if cpu usage (user) > 60% then alert
        if cpu usage (system) > 60% then alert
    # Includes
    include /etc/monit.d/*
    
    /etc/monit.d/xbroker
    
    check process xbroker matching "/usr/local/registrator/bin/xbroker -c"
        start program = "/usr/local/registrator/bin/xbroker_ctl start"
            as uid xcast and gid xcast
        stop program = "/usr/local/registrator/bin/xbroker_ctl stop"
            as uid xcast and gid xcast
        alert xxxxxx not on { instance, action }
        alert xxxxxz not on { instance, action }
        if failed
                    host hostxxxx port 5060 type udp protocol sip
                    with target test@hostxxxx and maxforward 0
                    with timeout 5 seconds
                    retry 2
    
    /etc/monit.d/system
    
    check process ntpd with pidfile "/var/run/ntpd.pid"
        every 4 cycles
       start program = "/etc/init.d/ntpd start"
       stop  program = "/etc/init.d/ntpd stop"
       if 5 restarts within 20 cycles then timeout
    check process rsyslogd with pidfile "/var/run/syslogd.pid"
       start program = "/etc/init.d/rsyslog start"
       stop  program = "/etc/init.d/rsyslog stop"
       if 5 restarts within 5 cycles then timeout
    check process sshd with pidfile "/var/run/sshd.pid"
       start program = "/etc/init.d/sshd start"
       stop  program = "/etc/init.d/sshd stop" sync
       #restart program = "/etc/init.d/sshd restart"
       if failed port 22 protocol ssh then restart
       if 5 restarts within 5 cycles then timeout
    check process mysql with pidfile "/var/lib/mysql/hostnamexxxx.pid"
        every 4 cycles
       start program = "/etc/init.d/mysql start"
       stop  program = "/etc/init.d/mysql stop"
       if failed host 127.0.0.1 port 3306 protocol mysql then alert
       if 5 restarts within 20 cycles then timeout
    check process winbindd with pidfile "/var/run/samba/winbindd.pid"
       start program = "/etc/init.d/winbind start"
       stop  program = "/etc/init.d/winbind stop"
       if 5 restarts within 5 cycles then timeout
    
  9. Alexander Litvak reporter

    And the upstart config:

    /etc/init/monit.conf
    
    # To install disable the old way of doing things:
    #
    #   /etc/init.d/monit stop && update-rc.d -f monit remove
    #
    # then put this script here:
    #
    #   /etc/init/monit.conf
    #
    # and reload upstart configuration:
    #
    #   initctl reload-configuration
    #
    # You can manually start and stop monit like this:
    # 
    # start monit
    # stop monit
    #
    
    description "Monit service manager"
    
    limit core unlimited unlimited
    limit nofile 131072 196608
    
    start on runlevel [2345]
    stop on runlevel [!2345]
    
    expect daemon
    respawn
    
    exec /usr/bin/monit -c /etc/monitrc
    
    pre-stop script 
        exec /usr/bin/monit -c /etc/monitrc quit
    end script
    
  10. Tildeslash repo owner

    It is very strange ... the "ps" output shows that monit and rsyslogd (started via monit) are independent (PPID is 1/init in both cases). When monit stops, it doesn't send any signals to the monitored processes itself, and since it's not the parent of rsyslogd (rsyslogd runs as a daemon too), its exit won't trigger any signal to rsyslogd.
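    The parent check described above can be sketched as a small parser for "ps -ef" output. This is an illustration only: the function names are hypothetical and the sample lines are the two rows posted earlier in this thread. Note that a daemonized process is re-parented to init, so PPID 1 shows who the current parent is, not who originally started the process.

```python
def parse_ps_ef(output):
    """Map PID -> (PPID, command) from `ps -ef` rows
    (columns: UID PID PPID C STIME TTY TIME CMD)."""
    procs = {}
    for line in output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header row and blank lines
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def is_ancestor(procs, ancestor, pid):
    """True if `ancestor` appears in the parent chain of `pid`."""
    seen = set()
    while pid in procs and pid not in seen:
        seen.add(pid)
        pid = procs[pid][0]  # step up to the parent
        if pid == ancestor:
            return True
    return False


# The two rows posted earlier in this thread.
SAMPLE = """\
root     15433     1  0 03:19 ?        00:00:40 /usr/bin/monit -vvv -c /etc/monitrc
root     15465     1  0 03:19 ?        00:00:02 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
"""
```

    With the sample rows, is_ancestor(parse_ps_ef(SAMPLE), 15433, 15465) is False, which matches the observation that monit is not the parent of rsyslogd.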

    I have tried to replicate the issue on CentOS 6.5 - added monit to upstart using simple configuration, tried to restart rsyslogd via monit and then stopped monit ... works fine, rsyslogd keeps running.

    set daemon 5
    set httpd port 2812 allow monit:monit
    set logfile syslog
    
    check process rsyslogd with pidfile "/var/run/syslogd.pid"
       start program = "/etc/init.d/rsyslog start"
       stop  program = "/etc/init.d/rsyslog stop"
       if 5 restarts within 5 cycles then timeout
    

    The stop seems to be driven externally, by some third-party software (puppet?).

    Can you try starting Monit outside of upstart control?

    1. /sbin/stop monit #stop monit via upstart
    2. /usr/bin/monit #start monit manually
    3. /usr/bin/monit restart rsyslogd #restart rsyslogd process via monit
    4. /usr/bin/monit quit #stop monit
    5. check if rsyslogd was restarted at the same time or after monit stopped
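    The five steps above can be sketched as a script so that the PID comparison in step 5 is not done by eye. This is a rough sketch under stated assumptions: the sleep durations are guesses, the helper names are hypothetical, and the script only runs the commands when called with --run as root. The command paths and the pidfile path are the ones used in this thread.

```python
import os
import subprocess
import sys
import time
from pathlib import Path

PIDFILE = "/var/run/syslogd.pid"  # pidfile from the rsyslogd check above


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if missing or malformed."""
    try:
        return int(Path(pidfile).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def is_alive(pid):
    """True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def pid_changed(before, after):
    """A missing PID or a different PID both mean the original is gone."""
    return after is None or after != before


def main():
    subprocess.run(["/sbin/stop", "monit"], check=False)       # step 1
    subprocess.run(["/usr/bin/monit"], check=True)             # step 2
    time.sleep(5)    # assumption: time for the monit daemon to come up
    subprocess.run(["/usr/bin/monit", "restart", "rsyslogd"], check=True)  # step 3
    time.sleep(30)   # assumption: time for the restart to complete
    before = read_pid(PIDFILE)
    subprocess.run(["/usr/bin/monit", "quit"], check=True)     # step 4
    time.sleep(10)   # assumption: time for any side effect to show up
    after = read_pid(PIDFILE)                                  # step 5
    gone = before is None or not is_alive(before) or pid_changed(before, after)
    print(f"rsyslogd pid before={before} after={after} affected={gone}")


# Only touch the system when explicitly asked to.
if __name__ == "__main__" and sys.argv[1:] == ["--run"]:
    main()
```

    If "affected" is reported as True with monit started manually, upstart is unlikely to be the only factor. If it is False, the upstart job definition deserves a closer look.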
  11. Alexander Litvak reporter

    Sorry for not updating this. It is still a mystery. I have tried everything you suggested and couldn't see anything abnormal. The problem was intermittent, to say the least. Since I have not seen it recur lately, I suggest you close the issue. I can revisit it when it happens again, hopefully with better information next time.

  12. chelskyboy@gmail.com

    Hi,

    I had this issue when using Monit on CentOS 7. When I stop or restart Monit with the command "service monit stop|restart", my monitored applications are killed.
