The problem with checking Unix-socket - Monit
Hello. We have found a problem in Monit. Between versions 5.8 and 5.8.1, this commit ( https://bitbucket.org/tildeslash/monit/commits/5aa93749973e632371cb5632c6c435f246331dcc ) made Monit check the uptime of the process before it tests socket availability. The relevant code in src/validate.c looks like this:
if (s->portlist) {
        /* skip further tests during startup timeout */
        if (s->start)
                if (s->inf->priv.process.uptime < s->start->timeout) return TRUE;
        for (pp = s->portlist; pp; pp = pp->next)
                check_connection(s, pp);
}
The intent of this check is to skip the socket tests while the start timeout has not yet elapsed, so that sockets are not checked too early during process startup. However, for unknown reasons, inside an LXC container (presumably only LXC 2.x) Monit cannot get the uptime of the process. The output of monit status looks like this:
root@debian-8:~/monit-5.9/src# monit status
The Monit daemon 5.9 uptime: 20m
Process 'dockerd'
status Running
monitoring status Monitored
pid 224
parent pid 1
uid 0
effective uid 0
gid 0
uptime <<< No uptime info
children 1
memory kilobytes 15.9 MB
memory kilobytes total 20.7 MB
memory percent 1.5%
memory percent total 2.0%
cpu percent 0.0%
cpu percent total 0.0%
unix socket response time 0.000s to /var/run/docker.sock [HTTP]
data collected Mon, 27 Feb 2017 22:43:07
We used a test configuration:
check process dockerd with pidfile /var/run/docker.pid
start program = "/bin/systemctl start docker"
stop program = "/bin/systemctl stop docker"
if failed unixsocket /var/run/docker.sock protocol HTTP request "/version" then alert
Running «monit -c /etc/monit/monitrc -vv -I» shows how Monit parses the config:
Process Name = dockerd
Pid file = /var/run/docker.pid
Monitoring mode = active
Start program = '/bin/systemctl start docker' timeout 30 second(s)
Stop program = '/bin/systemctl stop docker' timeout 30 second(s)
Existence = if does not exist then restart
Pid = if changed then alert
PPid = if changed then alert
Unix Socket = if failed [/var/run/docker.sock [protocol HTTP] with timeout 5000 seconds] then restart
That is, by default Monit waits 30 seconds for the process to start. But because it cannot determine the uptime, the uptime always compares as less than those 30 seconds, so check_connection() is never called.
The problem was found on the following systems:
The issue does not occur on Ubuntu 14.04 with Monit 5.6 or on Debian 7 with Monit 5.4. CentOS 6, however, has Monit 5.14, and the bug is present there too. By trial and error we found that Monit 5.18 is able to get the uptime (5.17 did not work for us), but with it we discovered another problem. Given this check:
if failed unixsocket /var/run/docker.sock protocol HTTP request "/version" then alert
Version 5.18 performs a HEAD request (instead of the usual GET). The Docker daemon API does not support HEAD for this endpoint (although it understands GET /version) and returns 404.
Comments (4)
-
repo owner -
repo owner We have implemented a fallback to the GET method if HEAD fails in the next Monit release, plus support for overriding the automatic method selection, for example:
if failed
    unixsocket /var/run/docker.sock
    protocol HTTP
    method GET   # note: this is new for monit 5.22.0 or later, defaults to HEAD if not used (and no content/checksum test is enabled)
    request "/version"
then alert
The next Monit release should work even without the "method GET" option due to the automatic fallback.
If you want to test it, you can get development snapshot:
wget https://bitbucket.org/tildeslash/monit/get/master.tar.gz
tar -xzf master.tar.gz
cd tildeslash*
./bootstrap
./configure
make
-
repo owner Update in cset 8584ce1f0a2a: we dropped the automatic HEAD->GET fallback that was added in cset e1c01a39af2e, because it could break some scenarios, for instance when the user configures a test for the HEAD method that is expected to fail (to make sure HEAD is not supported). The method can be set explicitly instead, for example:
if failed unixsocket /var/run/docker.sock protocol HTTP method GET request "/version" then alert
-
repo owner changed status to duplicate
Duplicate of #500.
Hello,
can you please test with the latest Monit version (5.21.0)? A lot of things have changed since Monit 5.8.1.
Regarding the HEAD method: it's true that Monit now prefers the HEAD method to save bandwidth. It switches to GET if either the response content test or the checksum test is enabled. For example:
We will add support for changing the request method in the future.