Host checks take too long

Issue #254 resolved
Alexey Elfman created an issue

I'm using monit to check the uptime of several websites.

My sample config is:

check host example.com with address example.com
    if failed
    port 80 protocol http and content = ".*example.*"
    with timeout 2 seconds
    for 2 times within 3 cycles
    then alert

It looks like monit takes 5 to 10 seconds for each host. Here are the Apache logs:

example1.com 5.9.xx.xx - - [23/Sep/2015:18:42:22 +0200] "GET / HTTP/1.1" 200 13296 "-" "Monit/5.14" 0s 726995us
example2.com 5.9.xx.xx - - [23/Sep/2015:18:42:31 +0200] "GET / HTTP/1.1" 200 14844 "-" "Monit/5.14" 0s 549957us
example3.com 5.9.xx.xx - - [23/Sep/2015:18:42:41 +0200] "GET / HTTP/1.1" 200 14827 "-" "Monit/5.14" 0s 501825us
example4.com 5.9.xx.xx - - [23/Sep/2015:18:42:50 +0200] "GET / HTTP/1.1" 200 15221 "-" "Monit/5.14" 0s 855563us
example5.com 5.9.xx.xx - - [23/Sep/2015:18:43:00 +0200] "GET / HTTP/1.1" 200 15366 "-" "Monit/5.14" 0s 817877us
example6.com 5.9.xx.xx - - [23/Sep/2015:18:43:09 +0200] "GET / HTTP/1.1" 200 14877 "-" "Monit/5.14" 0s 981314us
example7.com 5.9.xx.xx - - [23/Sep/2015:18:43:19 +0200] "GET / HTTP/1.1" 200 14800 "-" "Monit/5.14" 0s 391129us
example8.com 5.9.xx.xx - - [23/Sep/2015:18:43:23 +0200] "GET / HTTP/1.1" 200 14553 "-" "Monit/5.14" 0s 312863us
example9.com 5.9.xx.xx - - [23/Sep/2015:18:43:30 +0200] "GET / HTTP/1.1" 200 12018 "-" "Monit/5.14" 0s 514986us

The last two columns are the page generation time (seconds and microseconds). All pages are generated in 0.3 to 1.0 seconds, but the gaps between successive requests are mostly 9-10 seconds.
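The gaps can be checked directly from the timestamps in the log lines above. This is only a quick sketch: the list of timestamps is copied by hand from the log, and the variable names are made up for illustration.

```python
from datetime import datetime

# Request timestamps copied from the Apache log lines above (timezone omitted,
# since all entries share the same +0200 offset).
stamps = [
    "23/Sep/2015:18:42:22",
    "23/Sep/2015:18:42:31",
    "23/Sep/2015:18:42:41",
    "23/Sep/2015:18:42:50",
    "23/Sep/2015:18:43:00",
    "23/Sep/2015:18:43:09",
    "23/Sep/2015:18:43:19",
    "23/Sep/2015:18:43:23",
    "23/Sep/2015:18:43:30",
]

# Note: %b parses English month abbreviations under the default C locale.
times = [datetime.strptime(s, "%d/%b/%Y:%H:%M:%S") for s in stamps]

# Seconds between each request and the one before it.
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
print(gaps)  # [9, 10, 9, 10, 9, 10, 4, 7]
```

Each request itself finishes in under a second, so almost all of each gap is spent outside page generation.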

What is monit doing the rest of the time?

It looks like monit is blocked while it runs the website checks. A reload or restart only takes effect after all websites have been checked, so on my server a reload is applied only after 60-80 seconds.

The server is not busy at the moment. It's a 4-core i7 with 64 GB of RAM and almost no CPU or I/O load. Monit is the latest version, 5.14.

It looks like there are unnecessary sleeps somewhere in the source code.

Comments (5)

  1. Tildeslash repo owner

    Can you please provide the following data?

    1. Run monit in debug mode ("monit -vI") and send the output to support@mmonit.com.
    2. If the target web servers can be reached from the internet, please send their list (or your monit configuration) to support@mmonit.com so we can try to reproduce the issue.
  2. Tildeslash repo owner

    Thanks for the data. The problem is related to chunked transfer encoding, which is currently not implemented in the HTTP protocol test. When reading the response, monit doesn't know how large the body is, so it waits for the read timeout after the last byte has been received. We'll fix it.

  3. Tildeslash repo owner

    Fix Issue #254: The HTTP protocol test pauses monit for a few seconds when content match is used and the server sends the response using chunked encoding (note: this is a workaround for the read timeout; the final solution will come with refactoring, which will use an input stream with chunked encoding support).

    → <<cset 6aeafddb594a>>
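For readers unfamiliar with chunked transfer encoding, here is a minimal sketch of how a client can find the end of a chunked body without waiting for a read timeout. It follows the HTTP/1.1 chunked framing (hex size line, data, CRLF, and a final zero-size chunk). It is not monit's code or the fix from the changeset above; the function name `read_chunked_body` and the sample bytes are made up for illustration.

```python
import io

def read_chunked_body(stream):
    """Decode an HTTP/1.1 chunked body from a binary file-like object."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating chunk")
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        stream.read(2)  # consume the CRLF that follows the chunk data
    # Skip optional trailer headers up to the blank line that ends the message.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)

# Hypothetical response body: two chunks followed by the terminating chunk.
raw = b"5\r\nhello\r\n7\r\n, monit\r\n0\r\n\r\n"
stream = io.BytesIO(raw)
body = read_chunked_body(stream)
print(body)  # b'hello, monit'
```

Because the zero-size chunk signals the end explicitly, the reader can return as soon as it arrives instead of blocking until the read timeout expires.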
