Monit in LXC containers
I am running monit in an LXC container, and no matter which service I monitor, the port check does not work.
When running monit status I get
Process 'mysql'
  status                Running
  monitoring status     Monitored
  pid                   1939
  parent pid            1160
  uid                   497
  effective uid         497
  gid                   497
  uptime
  children              0
  memory                489.2 MB
  memory total          489.2 MB
  memory percent        5.9%
  memory percent total  5.9%
  cpu percent           0.0%
  cpu percent total     0.0%
  port response time    FAILED to [127.0.0.1]:3306 type TCP/IP protocol MYSQL
  data collected        Thu, 07 Jan 2016 02:22:19
or for sshd
Process 'sshd'
  status                Running
  monitoring status     Monitored
  pid                   1135
  parent pid            1
  uid                   0
  effective uid         0
  gid                   0
  uptime
  children              0
  memory                2.7 MB
  memory total          2.7 MB
  memory percent        0.0%
  memory percent total  0.0%
  cpu percent           0.0%
  cpu percent total     0.0%
  port response time    FAILED to [localhost]:22 type TCP/IP protocol SSH
  data collected        Thu, 07 Jan 2016 02:22:19
or sip
Process 'xbroker'
  status                Running
  monitoring status     Monitored
  pid                   3480
  parent pid            1
  uid                   500
  effective uid         0
  gid                   500
  uptime
  children              0
  memory                110.0 MB
  memory total          110.0 MB
  memory percent        1.3%
  memory percent total  1.3%
  cpu percent           0.0%
  cpu percent total     0.0%
  port response time    FAILED to [xbroker8-chi.siptalk.com]:5060 type UDP/IP protocol SIP
  data collected        Thu, 07 Jan 2016 02:22:19
Running in debug mode shows nothing relevant.
Comments (12)
-
reporter An older version shows the monitoring as successful, but I see no traffic when I watch with tcpdump.
-
reporter Any updates on this one?
-
repo owner Could you please stop monit, start it in debug mode, and send the output?
monit -vI
-
reporter I have limited monit to monitoring sshd only.
monit -vvI
Adding host allow 'localhost'
Adding credentials for user 'admin'
Runtime constants:
 Control file = /etc/monit.conf
 Log file = /var/log/monit
 Pid file = /var/run/monit.pid
 Id file = /root/.monit.id
 State file = /root/.monit.state
 Debug = True
 Log = True
 Use syslog = False
 Is Daemon = True
 Use process engine = True
 Poll time = 60 seconds with start delay 0 seconds
 Expect buffer = 256 bytes
 Mail server(s) = localhost:25 with timeout 30 seconds
 Mail from = (not defined)
 Mail subject = (not defined)
 Mail message = (not defined)
 Start monit httpd = True
 httpd bind address = localhost
 httpd portnumber = 2812
 httpd ssl = Disabled
 httpd signature = Enabled
 httpd auth. style = Basic Authentication and Host/Net allow list
 Alert mail to = alexander.v.litvak@gmail.com
 Alert on = All events
The service list contains the following entries:
Process Name = sshd
 Pid file = /var/run/sshd.pid
 Monitoring mode = active
 Start program = '/etc/init.d/sshd start' timeout 30 second(s)
 Stop program = '/etc/init.d/sshd stop' timeout 30 second(s)
 Existence = if does not exist then restart
 Port = if failed [localhost]:22 type TCP/IP protocol SSH with timeout 5 seconds then alert
 Timeout = If restarted 5 times within 5 cycle(s) then unmonitor
System Name = tth212.teletownhall.us
 Monitoring mode = active
-------------------------------------------------------------------------------
pidfile '/var/run/monit.pid' does not exist
Starting Monit 5.14 daemon with http interface at [localhost]:2812
Starting Monit HTTP server at [localhost]:2812
Monit HTTP server started
'tth212.teletownhall.us' Monit 5.14 started
Sending Monit instance changed notification to alexander.v.litvak@gmail.com
'sshd' process is running with pid 758
'sshd' zombie check succeeded
'sshd' process is running with pid 758
'sshd' zombie check succeeded
That is all that happens. Nothing changes after that; it keeps repeating the "process is running" and "zombie check succeeded" messages. Nothing looks abnormal in that output. However, on the status screen I see:
The Monit daemon 5.14 uptime:
Process 'sshd'
  status                Running
  monitoring status     Monitored
  pid                   758
  parent pid            1
  uid                   0
  effective uid         0
  gid                   0
  uptime
  children              31
  memory                2.6 MB
  memory total          166.9 MB
  memory percent        0.0%
  memory percent total  0.2%
  cpu percent           0.0%
  cpu percent total     0.0%
  port response time    FAILED to [localhost]:22 type TCP/IP protocol SSH
  data collected        Thu, 07 Jan 2016 22:30:20
System 'tth212.teletownhall.us'
  status                Running
  monitoring status     Monitored
  load average          [0.24] [0.13] [0.09]
  cpu                   0.0%us 0.0%sy 0.0%wa
  memory usage          19.1 GB [34.1%]
  swap usage            6.7 MB [0.0%]
  data collected        Thu, 07 Jan 2016 22:30:20
No uptime is displayed either. The log shows the same messages as standard output, with no additional information. I am using an LXC 1.1.5 container running CentOS 6.
-
repo owner - changed version to 5.14
-
repo owner It seems that reading the process uptime failed: the status shows an empty uptime, which causes the connection test to be skipped, because monit delays the connection test until the "start program" timeout has passed (to allow slow-starting processes to begin listening).
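The delay described above can be pictured with a minimal sketch. This is not Monit's actual code: the function name `connection_test_due` and the choice to treat an unreadable uptime as "still starting" are assumptions made only to model the symptom reported in this issue.

```python
from typing import Optional


def connection_test_due(process_uptime_s: Optional[float], start_timeout_s: float) -> bool:
    """Return True once the process has been up at least as long as the start timeout.

    Hypothetical helper: an uptime of None stands for "could not be read"
    and is treated as if the process were still starting.
    """
    if process_uptime_s is None:
        return False
    return process_uptime_s >= start_timeout_s
```

With the 30-second start timeout from the sshd configuration in this thread, an sshd that has been up for days would pass the gate, while one whose uptime cannot be determined would never reach the port check.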
Could you check the monit log for errors? (Logging is enabled with a "set logfile <path>" or "set logfile syslog" statement.)
Could you also try the following command?
cat /proc/uptime
and for the process uptime:
cat /proc/`cat /var/run/sshd.pid`/stat
Would it be possible to get access to an LXC container so that we can test? We do not currently have an LXC environment installed, and it would help us if you could prepare a test environment.
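For readers following along, here is a rough sketch of how the two files requested above are commonly combined into a process uptime. It is an illustration, not Monit's implementation: the function names are made up, and the default of 100 clock ticks per second is an assumption (on a real system the value would come from `os.sysconf("SC_CLK_TCK")`).

```python
def parse_system_uptime(proc_uptime_text: str) -> float:
    """Return the first field of /proc/uptime: seconds since boot."""
    return float(proc_uptime_text.split()[0])


def parse_start_ticks(proc_stat_text: str) -> int:
    """Return field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split only what follows the last ')'.
    after_comm = proc_stat_text[proc_stat_text.rindex(")") + 1:].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 19.
    return int(after_comm[19])


def process_uptime_seconds(proc_uptime_text: str, proc_stat_text: str,
                           ticks_per_second: int = 100) -> float:
    """System uptime minus the process start time, both in seconds."""
    started_at = parse_start_ticks(proc_stat_text) / ticks_per_second
    return parse_system_uptime(proc_uptime_text) - started_at
```

One consequence of this arithmetic is that a system uptime of zero yields a process uptime that is zero or negative, whatever the process start time is.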
-
repo owner We have prepared a test version, which uses the sysinfo() API to get the system uptime instead of /proc/uptime (which seems to fail in LXC).
Please can you test it? You can get the release here: https://mmonit.com/tmp/monit-5.16_issue310.tar.gz
You can compile it either as RPM:
rpmbuild -tb monit-5.16_issue310.tar.gz
Or manually:
tar -xzf monit-5.16_issue310.tar.gz
cd monit-5.16_issue310
./bootstrap
./configure
make
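To compare what sysinfo() reports with `cat /proc/uptime` inside a container without building anything, a small Linux-only sketch such as the following may help. It calls the libc `sysinfo()` function through ctypes and reads only the leading `long uptime` field of `struct sysinfo`; the helper names and the oversized 256-byte buffer are choices made for this example, not anything taken from the Monit patch.

```python
import ctypes
import struct

# struct sysinfo is smaller than this on common Linux ABIs; the oversized
# buffer avoids depending on the exact layout of the later fields.
SYSINFO_BUFFER_SIZE = 256


def uptime_from_sysinfo_bytes(raw: bytes) -> int:
    """Decode the leading `long uptime` field from a struct sysinfo image."""
    (uptime,) = struct.unpack_from("l", raw, 0)
    return uptime


def read_sysinfo_uptime() -> int:
    """Call sysinfo(2) via libc and return seconds since boot (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    buf = ctypes.create_string_buffer(SYSINFO_BUFFER_SIZE)
    if libc.sysinfo(buf) != 0:
        raise OSError(ctypes.get_errno(), "sysinfo() failed")
    return uptime_from_sysinfo_bytes(buf.raw)
```

Running `print(read_sysinfo_uptime())` next to `cat /proc/uptime` in the same container shows whether the two sources agree.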
-
reporter Built from master:
monit status
The Monit daemon 5.16 uptime: 4m
Process 'sshd'
  status                Running
  monitoring status     Monitored
  pid                   758
  parent pid            1
  uid                   0
  effective uid         0
  gid                   0
  uptime                12d 2h 19m
  children              24
  memory                2.6 MB
  memory total          138.0 MB
  memory percent        0.0%
  memory percent total  0.2%
  cpu percent           0.0%
  cpu percent total     0.0%
  port response time    8.811 ms to [localhost]:22 type TCP/IP protocol SSH
  data collected        Fri, 08 Jan 2016 13:11:36
System 'tth212.teletownhall.us'
  status                Running
  monitoring status     Monitored
  load average          [0.12] [0.10] [0.15]
  cpu                   0.1%us 0.1%sy 0.0%wa
  memory usage          19.1 GB [34.1%]
  swap usage            6.7 MB [0.0%]
  data collected        Fri, 08 Jan 2016 13:11:36
I see that inside the LXC container, /proc/uptime shows:
cat /proc/uptime
0.0 0.0
-
repo owner - changed status to resolved
Fix Issue #310: Linux LXC container: the connection test was skipped and port status showed a failure (problem fixed by https://bitbucket.org/tildeslash/monit/commits/44464d60a812db2e1e16c75b64aada70ab9afea4) → <<cset 49c183b4939c>>
-
repo owner - removed version
Removing version: 5.14 (automated comment)