Host checks take too long

Issue #254 resolved
Alexey Elfman created an issue

I'm using monit to check the uptime of several websites.

My sample config is:

check host with address
    if failed
    port 80 protocol http and content = ".*example.*"
    with timeout 2 seconds
    for 2 times within 3 cycles
    then alert

It looks like monit takes 5 to 10 seconds for each host. Here are the Apache logs:

5.9.xx.xx - - [23/Sep/2015:18:42:22 +0200] "GET / HTTP/1.1" 200 13296 "-" "Monit/5.14" 0s 726995us
5.9.xx.xx - - [23/Sep/2015:18:42:31 +0200] "GET / HTTP/1.1" 200 14844 "-" "Monit/5.14" 0s 549957us
5.9.xx.xx - - [23/Sep/2015:18:42:41 +0200] "GET / HTTP/1.1" 200 14827 "-" "Monit/5.14" 0s 501825us
5.9.xx.xx - - [23/Sep/2015:18:42:50 +0200] "GET / HTTP/1.1" 200 15221 "-" "Monit/5.14" 0s 855563us
5.9.xx.xx - - [23/Sep/2015:18:43:00 +0200] "GET / HTTP/1.1" 200 15366 "-" "Monit/5.14" 0s 817877us
5.9.xx.xx - - [23/Sep/2015:18:43:09 +0200] "GET / HTTP/1.1" 200 14877 "-" "Monit/5.14" 0s 981314us
5.9.xx.xx - - [23/Sep/2015:18:43:19 +0200] "GET / HTTP/1.1" 200 14800 "-" "Monit/5.14" 0s 391129us
5.9.xx.xx - - [23/Sep/2015:18:43:23 +0200] "GET / HTTP/1.1" 200 14553 "-" "Monit/5.14" 0s 312863us
5.9.xx.xx - - [23/Sep/2015:18:43:30 +0200] "GET / HTTP/1.1" 200 12018 "-" "Monit/5.14" 0s 514986us

The last two columns are the page generation time (seconds and microseconds). All pages are generated in roughly 0.3 to 1 second, but the delays between page loads are 8-10 seconds.
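To make the gap concrete, here is a small Python sketch that computes the time between consecutive requests. The timestamps are copied from the Apache log above; the variable names are only illustrative.

```python
from datetime import datetime

# Request timestamps copied from the Apache log above (all on 23/Sep/2015).
stamps = [
    "18:42:22", "18:42:31", "18:42:41", "18:42:50", "18:43:00",
    "18:43:09", "18:43:19", "18:43:23", "18:43:30",
]

times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]

# Seconds elapsed between each request and the next one.
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

print(gaps)  # prints [9, 10, 9, 10, 9, 10, 4, 7]
```

Most gaps are 9 to 10 seconds, while each page was generated in under one second.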

What is monit doing the rest of the time?

It looks like monit is busy with the website checks. A reload or restart only happens after all websites have been checked, so on my server a reload takes 60-80 seconds to be applied.

The server is not busy at the moment. It's a 4-core i7 with 64 GB of RAM and almost no CPU or IO load. Monit is at the latest version, 5.14.

It looks like there is a bad sleep somewhere in the source code.

Comments (5)

  1. Tildeslash repo owner

    Could you please provide the following data?

    1. Run monit in debug mode ("monit -vI") and send the output to
    2. If the target web servers are reachable from the internet, please send their list (or the monit configuration) to, so we can try to reproduce the issue
  2. Tildeslash repo owner

    Thanks for the data. The problem is related to chunked transfer encoding, which the HTTP protocol test does not currently implement. When reading the response, monit doesn't know how large the body is, so it waits for the read timeout after the last byte has been received. We'll fix it.

  3. Tildeslash repo owner

    Fix Issue #254: The HTTP protocol test pauses monit for a few seconds when a content match is used and the server sends the response using chunked encoding (note: this is a workaround for the read timeout; the final solution will come with refactoring, which will use an input stream with chunked encoding support).

    → <<cset 6aeafddb594a>>

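For readers unfamiliar with the diagnosis in comment 2: a chunked response carries no Content-Length, but each chunk is prefixed with its size in hexadecimal, and a zero-size chunk marks the end of the body. A reader that understands this framing can stop at the terminator instead of waiting for a read timeout. The following is a minimal Python sketch of that idea only. It is not monit's code (monit is written in C) and not the workaround from the changeset above, and `decode_chunked` is a hypothetical name.

```python
def decode_chunked(raw: bytes):
    """Decode a chunked HTTP body held in memory.

    Returns (body, consumed), where consumed is the number of bytes of
    `raw` that belong to the chunked message. Raises ValueError if the
    data is incomplete or malformed.
    """
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line is hexadecimal; drop any ";extension" after it.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            # Last chunk: skip optional trailer lines up to the blank line.
            while True:
                line_end = raw.index(b"\r\n", pos)
                is_blank = line_end == pos
                pos = line_end + 2
                if is_blank:
                    return bytes(body), pos
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


# Illustrative input: two chunks followed by the zero-size terminator.
wire = b"8\r\n<p>examp\r\n6\r\nle</p>\r\n0\r\n\r\n"
body, consumed = decode_chunked(wire)
print(body)  # prints b'<p>example</p>'
```

Because the end of the body is known as soon as the `0` chunk and the blank line arrive, a content match such as ".*example.*" can run immediately rather than after the timeout expires.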