Monit in LXC containers

Issue #310 resolved
Alexander Litvak
created an issue

I am running monit in an LXC container, and no matter which service I monitor, the port check is reported as FAILED.

When running monit status I get

Process 'mysql'
  status                            Running
  monitoring status                 Monitored
  pid                               1939
  parent pid                        1160
  uid                               497
  effective uid                     497
  gid                               497
  uptime                            
  children                          0
  memory                            489.2 MB
  memory total                      489.2 MB
  memory percent                    5.9%
  memory percent total              5.9%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                FAILED to [127.0.0.1]:3306 type TCP/IP protocol MYSQL
  data collected                    Thu, 07 Jan 2016 02:22:19

or for sshd

Process 'sshd'
  status                            Running
  monitoring status                 Monitored
  pid                               1135
  parent pid                        1
  uid                               0
  effective uid                     0
  gid                               0
  uptime                            
  children                          0
  memory                            2.7 MB
  memory total                      2.7 MB
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                FAILED to [localhost]:22 type TCP/IP protocol SSH
  data collected                    Thu, 07 Jan 2016 02:22:19

or sip

Process 'xbroker'
  status                            Running
  monitoring status                 Monitored
  pid                               3480
  parent pid                        1
  uid                               500
  effective uid                     0
  gid                               500
  uptime                            
  children                          0
  memory                            110.0 MB
  memory total                      110.0 MB
  memory percent                    1.3%
  memory percent total              1.3%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                FAILED to [xbroker8-chi.siptalk.com]:5060 type UDP/IP protocol SIP
  data collected                    Thu, 07 Jan 2016 02:22:19

Running in debug mode shows nothing relevant.

Comments (12)

  1. Alexander Litvak reporter

    I have limited monit to monitoring only sshd:

    monit -vvI
    Adding host allow 'localhost'
    Adding credentials for user 'admin'
    Runtime constants:
     Control file       = /etc/monit.conf
     Log file           = /var/log/monit
     Pid file           = /var/run/monit.pid
     Id file            = /root/.monit.id
     State file         = /root/.monit.state
     Debug              = True
     Log                = True
     Use syslog         = False
     Is Daemon          = True
     Use process engine = True
     Poll time          = 60 seconds with start delay 0 seconds
     Expect buffer      = 256 bytes
     Mail server(s)     = localhost:25 with timeout 30 seconds
     Mail from          = (not defined)
     Mail subject       = (not defined)
     Mail message       = (not defined)
     Start monit httpd  = True
     httpd bind address = localhost
     httpd portnumber   = 2812
     httpd ssl          = Disabled
     httpd signature    = Enabled
     httpd auth. style  = Basic Authentication and Host/Net allow list
     Alert mail to      = alexander.v.litvak@gmail.com
       Alert on         = All events
    
    The service list contains the following entries:
    
    Process Name          = sshd
     Pid file             = /var/run/sshd.pid
     Monitoring mode      = active
     Start program        = '/etc/init.d/sshd start' timeout 30 second(s)
     Stop program         = '/etc/init.d/sshd stop' timeout 30 second(s)
     Existence            = if does not exist then restart
     Port                 = if failed [localhost]:22 type TCP/IP protocol SSH with timeout 5 seconds then alert
     Timeout              = If restarted 5 times within 5 cycle(s) then unmonitor
    
    System Name           = tth212.teletownhall.us
     Monitoring mode      = active
    
    -------------------------------------------------------------------------------
    pidfile '/var/run/monit.pid' does not exist
    Starting Monit 5.14 daemon with http interface at [localhost]:2812
    Starting Monit HTTP server at [localhost]:2812
    Monit HTTP server started
    'tth212.teletownhall.us' Monit 5.14 started
    Sending Monit instance changed notification to alexander.v.litvak@gmail.com
    'sshd' process is running with pid 758
    'sshd' zombie check succeeded
    'sshd' process is running with pid 758
    'sshd' zombie check succeeded
    

    So basically this is what happens. Nothing changes after that; it keeps repeating the same two lines (process running and zombie check). Nothing looks abnormal in the output above. However, on the status screen I see:

    The Monit daemon 5.14 uptime: 
    
    Process 'sshd'
      status                            Running
      monitoring status                 Monitored
      pid                               758
      parent pid                        1
      uid                               0
      effective uid                     0
      gid                               0
      uptime                            
      children                          31
      memory                            2.6 MB
      memory total                      166.9 MB
      memory percent                    0.0%
      memory percent total              0.2%
      cpu percent                       0.0%
      cpu percent total                 0.0%
      port response time                FAILED to [localhost]:22 type TCP/IP protocol SSH
      data collected                    Thu, 07 Jan 2016 22:30:20
    
    System 'tth212.teletownhall.us'
      status                            Running
      monitoring status                 Monitored
      load average                      [0.24] [0.13] [0.09]
      cpu                               0.0%us 0.0%sy 0.0%wa
      memory usage                      19.1 GB [34.1%]
      swap usage                        6.7 MB [0.0%]
      data collected                    Thu, 07 Jan 2016 22:30:20
    

    No uptime is displayed either. The log shows the same messages as standard output, with no additional information. I am using an LXC 1.1.5 container running CentOS 6.

  2. Tildeslash repo owner

    It seems that reading the process uptime failed: the status shows an empty uptime, which causes the connection test to be skipped, because monit delays the connection test until the "start program" timeout has passed (to allow slow-starting processes to start listening).
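
    The skip described above can be sketched roughly as follows. This is only an illustration of the reasoning, not Monit's actual code: the function name connection_test_due and its parameters are hypothetical, and it assumes a failed uptime read is represented as a negative value.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical helper (not taken from the Monit sources): decide whether
     * the connection test should run in this cycle.
     * process_uptime_s: process uptime in seconds, negative if it could not be read
     * start_timeout_s:  the "start program" timeout in seconds */
    bool connection_test_due(long process_uptime_s, long start_timeout_s) {
        assert(start_timeout_s >= 0);
        /* An unreadable uptime cannot prove the process is past its start-up window. */
        if (process_uptime_s < 0) {
            return false;
        }
        /* Run the test only once the process has been up for the whole timeout. */
        return process_uptime_s >= start_timeout_s;
    }
    ```

    Under these assumptions, a process whose uptime always reads as empty or zero never gets past the 30-second start timeout, so the port check would be skipped on every cycle.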

    Could you check the monit log for errors? (Logging is enabled with the "set logfile <path>" or "set logfile syslog" statement.)

    Could you also try the following command?

    cat /proc/uptime
    

    and for the process uptime:

    cat /proc/`cat /var/run/sshd.pid`/stat
    

    Would it be possible to get access to an LXC container so we can test? We don't currently have an LXC environment installed, and it would help us if you could prepare a test environment.

  3. Tildeslash repo owner

    We have prepared a test version that uses the sysinfo() API to get the system uptime instead of /proc/uptime (which seems to fail in LXC).
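
    For anyone who wants to compare the two sources by hand, here is a minimal Linux-only sketch. It is not the code from the test release; the function names parse_proc_uptime, read_proc_uptime and sysinfo_uptime are made up for illustration. Only sysinfo(2) and the /proc/uptime file are real interfaces.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <sys/sysinfo.h>

    /* Parse the first field of a /proc/uptime-style line ("<uptime> <idle>").
     * Returns 0 on success, -1 if no number could be parsed. */
    int parse_proc_uptime(const char *line, double *seconds) {
        assert(line != NULL && seconds != NULL);
        return sscanf(line, "%lf", seconds) == 1 ? 0 : -1;
    }

    /* Read the system uptime from /proc/uptime.
     * Returns 0 on success, -1 if the file cannot be opened or parsed. */
    int read_proc_uptime(double *seconds) {
        char line[128];
        FILE *f = fopen("/proc/uptime", "r");
        if (f == NULL) {
            return -1;
        }
        char *ok = fgets(line, sizeof line, f);
        fclose(f);
        return ok != NULL ? parse_proc_uptime(line, seconds) : -1;
    }

    /* Ask the kernel directly via sysinfo(2).
     * Returns seconds since boot, or -1 if the call fails. */
    long sysinfo_uptime(void) {
        struct sysinfo info;
        if (sysinfo(&info) != 0) {
            return -1;
        }
        return info.uptime;
    }
    ```

    Printing the results of read_proc_uptime() and sysinfo_uptime() side by side inside the container should show whether the two sources disagree there.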

    Could you test it? You can get the release here: https://mmonit.com/tmp/monit-5.16_issue310.tar.gz

    You can compile it either as RPM:

    rpmbuild -tb monit-5.16_issue310.tar.gz
    

    Or manually:

    tar -xzf monit-5.16_issue310.tar.gz
    cd monit-5.16_issue310
    ./bootstrap
    ./configure
    make
    
  4. Alexander Litvak reporter

    Built from master:

    monit status
    The Monit daemon 5.16 uptime: 4m 
    
    Process 'sshd'
      status                            Running
      monitoring status                 Monitored
      pid                               758
      parent pid                        1
      uid                               0
      effective uid                     0
      gid                               0
      uptime                            12d 2h 19m 
      children                          24
      memory                            2.6 MB
      memory total                      138.0 MB
      memory percent                    0.0%
      memory percent total              0.2%
      cpu percent                       0.0%
      cpu percent total                 0.0%
      port response time                8.811 ms to [localhost]:22 type TCP/IP protocol SSH
      data collected                    Fri, 08 Jan 2016 13:11:36
    
    System 'tth212.teletownhall.us'
      status                            Running
      monitoring status                 Monitored
      load average                      [0.12] [0.10] [0.15]
      cpu                               0.1%us 0.1%sy 0.0%wa
      memory usage                      19.1 GB [34.1%]
      swap usage                        6.7 MB [0.0%]
      data collected                    Fri, 08 Jan 2016 13:11:36
    

    I see that in the LXC container /proc/uptime reports zeros:

    cat /proc/uptime
    0.0 0.0
