Monit in LXC containers

Issue #310 resolved

Alexander Litvak created an issue 2016-01-07

I am running Monit in an LXC container, and no matter which service I monitor, the check doesn't work.

When running monit status I get:

Process 'mysql'
  status                            Running
  monitoring status                 Monitored
  pid                               1939
  parent pid                        1160
  uid                               497
  effective uid                     497
  gid                               497
  uptime                            
  children                          0
  memory                            489.2 MB
  memory total                      489.2 MB
  memory percent                    5.9%
  memory percent total              5.9%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                FAILED to [127.0.0.1]:3306 type TCP/IP protocol MYSQL
  data collected                    Thu, 07 Jan 2016 02:22:19

or for sshd

Process 'sshd'
  status                            Running
  monitoring status                 Monitored
  pid                               1135
  parent pid                        1
  uid                               0
  effective uid                     0
  gid                               0
  uptime                            
  children                          0
  memory                            2.7 MB
  memory total                      2.7 MB
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                FAILED to [localhost]:22 type TCP/IP protocol SSH
  data collected                    Thu, 07 Jan 2016 02:22:19

or for SIP

Process 'xbroker'
  status                            Running
  monitoring status                 Monitored
  pid                               3480
  parent pid                        1
  uid                               500
  effective uid                     0
  gid                               500
  uptime                            
  children                          0
  memory                            110.0 MB
  memory total                      110.0 MB
  memory percent                    1.3%
  memory percent total              1.3%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                FAILED to [xbroker8-chi.siptalk.com]:5060 type UDP/IP protocol SIP
  data collected                    Thu, 07 Jan 2016 02:22:19

Running in debug mode shows nothing relevant.

Comments (12)

Alexander Litvak reporter
- edited description
- 2016-01-07T07:28:05+00:00
Alexander Litvak reporter
- assigned issue to Tildeslash
- 2016-01-07T08:55:51+00:00
Alexander Litvak reporter
An older version shows the monitoring as successful, but I see no traffic when I watch with tcpdump.
- 2016-01-07T09:48:50+00:00
Alexander Litvak reporter
Any updates on this one?
- 2016-01-07T22:25:24+00:00
Tildeslash repo owner
Could you please stop Monit, start it in debug mode, and send the output?
```
monit -vI
```
- 2016-01-08T01:03:17+00:00

Alexander Litvak reporter

I have limited Monit to monitoring sshd only:

monit -vvI
Adding host allow 'localhost'
Adding credentials for user 'admin'
Runtime constants:
 Control file       = /etc/monit.conf
 Log file           = /var/log/monit
 Pid file           = /var/run/monit.pid
 Id file            = /root/.monit.id
 State file         = /root/.monit.state
 Debug              = True
 Log                = True
 Use syslog         = False
 Is Daemon          = True
 Use process engine = True
 Poll time          = 60 seconds with start delay 0 seconds
 Expect buffer      = 256 bytes
 Mail server(s)     = localhost:25 with timeout 30 seconds
 Mail from          = (not defined)
 Mail subject       = (not defined)
 Mail message       = (not defined)
 Start monit httpd  = True
 httpd bind address = localhost
 httpd portnumber   = 2812
 httpd ssl          = Disabled
 httpd signature    = Enabled
 httpd auth. style  = Basic Authentication and Host/Net allow list
 Alert mail to      = alexander.v.litvak@gmail.com
   Alert on         = All events

The service list contains the following entries:

Process Name          = sshd
 Pid file             = /var/run/sshd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/sshd start' timeout 30 second(s)
 Stop program         = '/etc/init.d/sshd stop' timeout 30 second(s)
 Existence            = if does not exist then restart
 Port                 = if failed [localhost]:22 type TCP/IP protocol SSH with timeout 5 seconds then alert
 Timeout              = If restarted 5 times within 5 cycle(s) then unmonitor

System Name           = tth212.teletownhall.us
 Monitoring mode      = active

-------------------------------------------------------------------------------
pidfile '/var/run/monit.pid' does not exist
Starting Monit 5.14 daemon with http interface at [localhost]:2812
Starting Monit HTTP server at [localhost]:2812
Monit HTTP server started
'tth212.teletownhall.us' Monit 5.14 started
Sending Monit instance changed notification to alexander.v.litvak@gmail.com
'sshd' process is running with pid 758
'sshd' zombie check succeeded
'sshd' process is running with pid 758
'sshd' zombie check succeeded

So basically this is what happens. Nothing changes after that; it keeps repeating the same two lines (process running and zombie check). Nothing looks abnormal in the debug output. However, on the status screen I see:

The Monit daemon 5.14 uptime: 

Process 'sshd'
  status                            Running
  monitoring status                 Monitored
  pid                               758
  parent pid                        1
  uid                               0
  effective uid                     0
  gid                               0
  uptime                            
  children                          31
  memory                            2.6 MB
  memory total                      166.9 MB
  memory percent                    0.0%
  memory percent total              0.2%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                FAILED to [localhost]:22 type TCP/IP protocol SSH
  data collected                    Thu, 07 Jan 2016 22:30:20

System 'tth212.teletownhall.us'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.24] [0.13] [0.09]
  cpu                               0.0%us 0.0%sy 0.0%wa
  memory usage                      19.1 GB [34.1%]
  swap usage                        6.7 MB [0.0%]
  data collected                    Thu, 07 Jan 2016 22:30:20

No uptime is displayed either. The log shows the same messages as standard output, with no additional information. I am using an LXC 1.1.5 container running CentOS 6.

2016-01-08T03:33:13+00:00

Tildeslash repo owner
- changed version to 5.14
- 2016-01-08T09:05:29+00:00
Tildeslash repo owner
It seems that reading the process uptime failed: the status shows an empty uptime, which causes the connection test to be skipped, because Monit delays the connection test until the "start program" timeout has passed (to give slow-starting processes time to begin listening).
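
The delay described above can be pictured with a small Python sketch. This is only an illustration of the behaviour as explained in this comment, not Monit's actual code; the function name `connection_test_due` and its parameters are hypothetical.

```python
from typing import Optional


def connection_test_due(process_uptime_s: Optional[float],
                        start_timeout_s: float) -> bool:
    """Return True when the port check should run in this cycle.

    Hypothetical helper: models a grace period equal to the
    "start program" timeout, as described in the comment above.
    """
    if process_uptime_s is None:
        # The uptime could not be read, so the grace period can never
        # be shown to have ended and the check is skipped every cycle.
        return False
    # Run the check only once the process has been up at least as long
    # as the start timeout.
    return process_uptime_s >= start_timeout_s


# A process that started 5 seconds ago with a 30 second start timeout
# is still inside its grace period.
print(connection_test_due(5, 30))
# A missing uptime keeps the check switched off indefinitely.
print(connection_test_due(None, 30))
```

Under this model, an uptime that is missing (or never grows past the timeout) would keep the port check from ever running, which would match the symptoms reported here.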

Could you please check the Monit log for errors? (Logging is enabled with a "set logfile <path>" or "set logfile syslog" statement.)

Please can you try the following command?:
```
cat /proc/uptime
```
and for the process uptime:
```
cat /proc/`cat /var/run/sshd.pid`/stat
```
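
For context, here is a rough Python sketch of how a process uptime can be derived from the two files requested above. It assumes the usual Linux layout, where the first value in /proc/uptime is the system uptime in seconds and field 22 of /proc/<pid>/stat is the process start time in clock ticks since boot. The function names and the default of 100 ticks per second are assumptions for illustration; this is not Monit's implementation.

```python
def parse_system_uptime(proc_uptime_text: str) -> float:
    """First number in /proc/uptime: seconds since boot."""
    return float(proc_uptime_text.split()[0])


def parse_start_ticks(stat_text: str) -> int:
    """Field 22 (starttime) of /proc/<pid>/stat, in clock ticks."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def process_uptime(proc_uptime_text: str, stat_text: str,
                   ticks_per_second: int = 100) -> float:
    """System uptime minus the process start time, in seconds.

    ticks_per_second is assumed to be 100 here; on a real system it
    should come from os.sysconf("SC_CLK_TCK").
    """
    started_at = parse_start_ticks(stat_text) / ticks_per_second
    return parse_system_uptime(proc_uptime_text) - started_at
```

If /proc/uptime were to report `0.0 0.0`, this calculation would yield a zero or negative process uptime, which is one plausible way the uptime could end up unusable.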
Would it be possible to get access to an LXC container so that we can test? We don't currently have an LXC environment installed, and it would help us if you could prepare a test environment.
- 2016-01-08T13:05:34+00:00
Tildeslash repo owner
We have prepared a test version that uses the sysinfo() API to get the system uptime instead of reading /proc/uptime (which seems to fail in LXC).

Could you please test it? You can get the release here: https://mmonit.com/tmp/monit-5.16_issue310.tar.gz

You can build it either as an RPM:
```
rpmbuild -tb monit-5.16_issue310.tar.gz
```
Or manually:
```
tar -xzf monit-5.16_issue310.tar.gz
cd monit-5.16_issue310
./bootstrap
./configure
make
```
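
To compare what sysinfo() reports with /proc/uptime inside the container, a small Python sketch along these lines may help. It calls the Linux sysinfo(2) function through ctypes and relies on the fact that `struct sysinfo` begins with a `long uptime` field. The function name `sysinfo_uptime_seconds` is made up for this example, the code is Linux-only, and it is not taken from the Monit test build.

```python
import ctypes
import os


def sysinfo_uptime_seconds() -> int:
    """Seconds since boot as reported by the Linux sysinfo(2) call."""
    libc = ctypes.CDLL(None, use_errno=True)
    # struct sysinfo starts with `long uptime`. A 256-byte buffer is
    # larger than the struct on 32- and 64-bit Linux, so only the first
    # field needs to be decoded here.
    buf = ctypes.create_string_buffer(256)
    if libc.sysinfo(buf) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ctypes.c_long.from_buffer(buf).value


if __name__ == "__main__":
    print("sysinfo uptime:", sysinfo_uptime_seconds())
    with open("/proc/uptime") as fh:
        print("/proc/uptime  :", fh.read().strip())
```

Running it inside the affected container should show whether the two sources disagree.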
- 2016-01-08T13:52:12+00:00

Alexander Litvak reporter

Built from master:

monit status
The Monit daemon 5.16 uptime: 4m 

Process 'sshd'
  status                            Running
  monitoring status                 Monitored
  pid                               758
  parent pid                        1
  uid                               0
  effective uid                     0
  gid                               0
  uptime                            12d 2h 19m 
  children                          24
  memory                            2.6 MB
  memory total                      138.0 MB
  memory percent                    0.0%
  memory percent total              0.2%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                8.811 ms to [localhost]:22 type TCP/IP protocol SSH
  data collected                    Fri, 08 Jan 2016 13:11:36

System 'tth212.teletownhall.us'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.12] [0.10] [0.15]
  cpu                               0.1%us 0.1%sy 0.0%wa
  memory usage                      19.1 GB [34.1%]
  swap usage                        6.7 MB [0.0%]
  data collected                    Fri, 08 Jan 2016 13:11:36

I see that inside the LXC container /proc/uptime shows:

cat /proc/uptime
0.0 0.0

2016-01-08T18:12:58+00:00

Tildeslash repo owner
- changed status to resolved
Fix Issue ~~#310~~: Linux LXC container: the connection test was skipped and port status showed a failure (problem fixed by https://bitbucket.org/tildeslash/monit/commits/44464d60a812db2e1e16c75b64aada70ab9afea4)

→ <<cset 49c183b4939c>>
- 2016-01-08T19:31:44+00:00
Tildeslash repo owner
- removed version
Removing version: 5.14 (automated comment)
- 2016-06-19T18:47:48+00:00

Assignee: Tildeslash

Type: bug

Priority: major

Status: resolved

Component: Monit

Version: –

Votes: 0

Watchers: 1