Update for issue #115
It seems issue #115 isn't completely resolved; more information is in its latest comment.
Comments (26)
-
repo owner - changed status to resolved
-
reporter Excuse me, but it's not only about reporting values in syslog.
When I try to use non-default values for timeout (or count), the ping check simply doesn't work as expected.
-
repo owner What exactly is the problem with non-default values?
Tried the following configuration, works fine:
check host myhost with address 127.0.0.1
    if failed ping count 5 with timeout 10 seconds then alert
-
reporter Have you tried checking an unavailable host, e.g. 127.0.0.2? It doesn't alert for me.
-
reporter Oh, 127.0.0.2 is not such a good example. 155.155.155.155 should be fine.
-
repo owner yes, it works:
Ping response for 127.0.0.2 1/5 timed out -- no response within 10 seconds
Ping response for 127.0.0.2 2/5 timed out -- no response within 10 seconds
Ping response for 127.0.0.2 3/5 timed out -- no response within 10 seconds
Ping response for 127.0.0.2 4/5 timed out -- no response within 10 seconds
Ping response for 127.0.0.2 5/5 timed out -- no response within 10 seconds
'localhost' ping test failed
'localhost' icmp ping failed, skipping any port connection tests
-
reporter too bad, but for me it doesn't:
Remote Host Name = myhost
Address = 155.155.155.155
Monitoring mode = active
Existence = if does not exist then restart
Ping = if failed [count 5 with timeout 10 seconds] then alert
version 5.11
-
reporter Build info:
Monit has been configured with the following options:

    PAM support: DISABLED
    SSL support: ENABLED
    Large files support: ENABLED
    Optimized: ENABLED

Architecture: LINUX
SSL include directory: /usr/include
SSL library directory: /usr/lib
Compiler flags: -Wno-address -Wno-pointer-sign -march=core2 -fPIC -DPIC -pipe -Os -O3 -Wall -Wunused -Wno-unused-label -funsigned-char -D_GNU_SOURCE -std=c99 -D _REENTRANT -I/usr/include
Linker flags: -lpthread -lcrypt -lresolv -lnsl -L/usr/lib -lssl -lcrypto
pid file location: /run
Install directory: /usr
-
repo owner That's just a configuration dump, not an error. Could you please run Monit in debug mode and post the output related to the ping tests?
monit -vI
-
reporter sure (/sbin/monit -c /etc/sysconfig/monitrc -p /var/run/monit.pid -vI):
-
repo owner Are you running Monit as root or as a regular user? Some platforms require root privileges to create a raw socket, which is used for ping.
Please also run Monit in debug mode for at least 1 minute ... each (unsuccessful) ping attempt will block for 10s before producing any output.
-
reporter Yes, it is running as root.
I waited for more than 10 minutes.
I previously posted the build info. Maybe the optimized build is the cause? In my experience, it sometimes is.
-
reporter Hah, it had been running for more than an hour, and what do I see in the logs:
Jan 06 14:40:00 [monit] Starting Monit HTTP server at [localhost:2812]_
Jan 06 14:40:00 [monit] Monit HTTP server started_
Jan 06 14:40:00 [monit] 'a' Monit started_
Jan 06 15:04:02 [monit] Ping response for 155.155.155.155 1/5 timed out -- no response within 10000 seconds_
Jan 06 15:18:02 [monit] Ping response for 155.155.155.155 2/5 timed out -- no response within 10000 seconds_
Jan 06 15:33:52 [monit] Ping response for 155.155.155.155 3/5 timed out -- no response within 10000 seconds_
So it's definitely about the seconds conversion.
-
repo owner Thanks for the update. The seconds conversion is fixed in the development version.
-
reporter Full log this time:
Jan 06 15:47:32 [monit] Starting Monit 5.11 daemon with http interface at [localhost:2812]_
Jan 06 15:47:32 [monit] Starting Monit HTTP server at [localhost:2812]_
Jan 06 15:47:32 [monit] Monit HTTP server started_
Jan 06 15:47:32 [monit] 'a' Monit started_
Jan 06 16:15:32 [monit] Ping response for 155.155.155.155 1/5 timed out -- no response within 10000 seconds_
Jan 06 16:22:41 [monit] Ping response for 155.155.155.155 2/5 failed -- received 28 bytes, expected at least 48 bytes_
Jan 06 16:39:54 [monit] Ping response for 155.155.155.155 3/5 failed -- received 40 bytes, expected at least 48 bytes_
Jan 06 16:52:22 [monit] Ping response for 155.155.155.155 4/5 timed out -- no response within 10000 seconds_
Jan 06 17:08:42 [monit] Ping response for 155.155.155.155 5/5 timed out -- no response within 10000 seconds_
Jan 06 17:08:42 [monit] 'myhost' ping test failed_
-
reporter Oh, awesome, let me apply the patch to 5.11 and test it.
-
repo owner you can get the development version:
git clone git@bitbucket.org:tildeslash/monit.git --recursive
cd monit
./bootstrap
./configure
make
-
reporter Thanks,
but could you point me to a commit between 5.11 and devel that fixes the issue? I still cannot find one, apart from those that only fix the displayed timeout values.
-
reporter Oh, I thought you had just fixed it in devel. I've cloned the devel branch and recompiled Monit with the same configure options. The timeout values displayed in syslog may be fixed, but the ping test still doesn't work.
-
reporter Here's yet another log for 5.12-devel, from startup until the first ping test failure e-mail:
Jan 06 17:48:54 [monit] Starting Monit 5.12_devel daemon with http interface at [localhost:2812]_
Jan 06 17:48:54 [monit] Starting Monit HTTP server at [localhost:2812]_
Jan 06 17:48:54 [monit] Monit HTTP server started_
Jan 06 17:48:54 [monit] 'a' Monit started_
Jan 06 18:00:42 [monit] Ping response for 155.155.155.155 1/5 timed out -- no response within 10 seconds_
Jan 06 18:19:59 [monit] Ping response for 155.155.155.155 2/5 failed -- received 28 bytes, expected at least 48 bytes_
Jan 06 18:39:02 [monit] Ping response for 155.155.155.155 3/5 timed out -- no response within 10 seconds_
Jan 06 18:54:22 [monit] Ping response for 155.155.155.155 4/5 timed out -- no response within 10 seconds_
Jan 06 19:14:02 [monit] Ping response for 155.155.155.155 5/5 timed out -- no response within 10 seconds_
Jan 06 19:14:02 [monit] 'myhost' ping test failed_
notice the timestamps
-
reporter And you can see what happens once I remove "count 5 with timeout 10 seconds" from the config:
Jan 06 19:59:04 [monit] Starting Monit 5.12_devel daemon with http interface at [localhost:2812]_
Jan 06 19:59:04 [monit] Starting Monit HTTP server at [localhost:2812]_
Jan 06 19:59:04 [monit] Monit HTTP server started_
Jan 06 19:59:04 [monit] 'a' Monit started_
Jan 06 19:59:10 [monit] Ping response for 155.155.155.155 1/3 timed out -- no response within 5 seconds_
Jan 06 19:59:18 [monit] Ping response for 155.155.155.155 2/3 timed out -- no response within 5 seconds_
Jan 06 19:59:28 [monit] Ping response for 155.155.155.155 3/3 timed out -- no response within 5 seconds_
Jan 06 19:59:28 [monit] 'myhost' ping test failed_
notice the timestamps as well
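To make the timestamp difference concrete, here is a small sketch that computes the gaps between consecutive ping log lines. The times are copied from the two 5.12_devel logs above; the helper name gaps_seconds is made up for this illustration and is not part of Monit.

```python
from datetime import datetime

def gaps_seconds(times):
    """Return the gaps, in seconds, between consecutive HH:MM:SS timestamps.

    Hypothetical helper for illustration; assumes all times fall on the same day.
    """
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

# Ping lines with "count 5 with timeout 10 seconds" configured
with_timeout = ["18:00:42", "18:19:59", "18:39:02", "18:54:22", "19:14:02"]

# Ping lines with the defaults (3 attempts, 5 second timeout)
defaults = ["19:59:10", "19:59:18", "19:59:28"]

print("explicit timeout:", gaps_seconds(with_timeout))
print("defaults:        ", gaps_seconds(defaults))
```

With the explicit timeout the attempts are roughly 15 to 20 minutes apart, while with the defaults they are 8 to 10 seconds apart.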
-
reporter Yes, it's only about specifying timeout. If I add only "count 5":
check host myhost with address 155.155.155.155
    if failed ping count 5 # with timeout 10 seconds
    then alert
everything works fine.
-
repo owner There was a problem with the timeout subtraction on read (seconds were not converted to milliseconds). The following patch fixes it: https://bitbucket.org/tildeslash/monit/commits/3ee50311b87b/
Please can you test it? I'm unable to replicate the problem.
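For readers following along, here is a rough sketch of the kind of unit mismatch described above. It is not Monit's actual code: the function names, the one-second wake-up interval, and the loop structure are all assumptions made purely for illustration.

```python
def remaining_ms_buggy(budget_ms, elapsed_s):
    # Unit mismatch: a value in seconds is subtracted from a budget in milliseconds.
    return budget_ms - elapsed_s

def remaining_ms_fixed(budget_ms, elapsed_s):
    # Convert the elapsed seconds to milliseconds before subtracting.
    return budget_ms - elapsed_s * 1000

def simulated_wait_seconds(remaining_fn, timeout_ms, wake_interval_s=1):
    """Simulate a read loop that wakes every wake_interval_s seconds
    (an assumed value) and keeps waiting until the budget is used up."""
    waited_s = 0
    budget_ms = timeout_ms
    while budget_ms > 0:
        waited_s += wake_interval_s
        budget_ms = remaining_fn(budget_ms, wake_interval_s)
    return waited_s

print(simulated_wait_seconds(remaining_ms_fixed, 10_000))
print(simulated_wait_seconds(remaining_ms_buggy, 10_000))
```

In this toy model the mismatched subtraction makes a 10 second timeout last a thousand times longer than intended. The real delays in the logs are shorter than that, so the sketch only shows the general effect, not the exact behaviour.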
-
reporter Yes, indeed, the patch above fixes the problem.
Thanks a lot for your cooperation!
-
repo owner - removed version
Removing version: 5.11 (automated comment)
thanks for report, fixed