Connection testing: RETRY has no effect in case of failure

Issue #211 resolved

Vovodroid created an issue 2015-06-06

Documentation says:
retry: RETRY number. Optionally specifies the number of consecutive retries within the same testing cycle in the case that the connection failed. The default is fail on first error.

But it seems that the RETRY option takes effect only on a connection timeout. If the connection test fails for any other reason (a TCP reject or a bad status response), the test is considered failed immediately and the ACTION is performed.

This can interfere with auto-restarting services. If a service (daemon) is being restarted (by the user or the system), monit detects the connection test failure and issues a new stop/start command.

Steps to reproduce: assume the following monit.rc:

```
...
set daemon 5
...
check process nginx with pidfile /var/run/nginx.pid
  start program = "/usr/bin/systemctl start nginx"
  stop  program = "/usr/bin/systemctl stop nginx"
  every 12 cycles
  if failed port 80 protocol http retry 10 then alert
...
```

With a 5-second daemon cycle and "every 12 cycles", nginx is checked once per minute.

nginx configuration:

```
server {
    listen 80;
    return 200;
}
```

If you change the listen port, or change the return code to 500, and run nginx -s reload, monit recognizes the failure at the very next check, without giving nginx any chance to recover. I expected the test to be tried RETRY times with some delay between the attempts.

I suggest applying the TIMEOUT option to failed tests as well: when an attempt fails, wait until TIMEOUT has elapsed since the beginning of that attempt, then try again, up to RETRY attempts. Only after TIMEOUT * RETRY has passed should the test be considered failed.
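The suggested behaviour could look roughly like the sketch below. This is a small Python model of the proposal, not Monit code: `check_with_retries`, `probe`, and the injectable `clock` and `sleep` parameters are hypothetical names introduced only for illustration. It assumes every failed attempt, including the last one, is padded out to the full timeout, so that a test with instant failures is declared failed only after TIMEOUT * RETRY.

```python
import time
from typing import Callable


def check_with_retries(
    probe: Callable[[], bool],
    retries: int,
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on the first successful attempt, False after `retries` failures.

    Hypothetical model of the proposal: a failed attempt (timeout, TCP reject,
    or bad status alike) waits until `timeout` seconds have passed since the
    attempt began before the next attempt starts.
    """
    for _ in range(retries):
        started = clock()
        if probe():
            return True
        # Pad the failed attempt so that it always costs `timeout` seconds.
        remaining = timeout - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return False
```

With `retries=10` and `timeout=5.0`, a service that comes back within 50 seconds of the first failed attempt would pass the test in the same cycle instead of triggering the action.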

Regards.

Comments (6)

Tildeslash repo owner
- changed status to resolved
Hello,

If you want to retry with a delay, use "for X cycles", for example:
```
if failed port 80 protocol http for 10 cycles then alert
```
The RETRY option retries the connection only within the same cycle, with no delay between the attempts.
- 2015-06-08T10:44:07+00:00
Vovodroid reporter
> The RETRY option allows to retry the connection only in the same cycle with no delay between the attempts.

That's exactly what I'm complaining about )))

> use "for X cycles"

Well, it's not the same. For example, one might want to test a service once per hour and restart it if it doesn't respond within five minutes (i.e. within the same cycle, but still with some fault tolerance), not within two hours.
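To put numbers on the difference, here is a rough Python calculation. The concrete values (a 60-second daemon cycle, `every 60 cycles`, `for 2 cycles`, and five in-cycle retries spaced 60 seconds apart) are illustrative assumptions, not settings taken from a real configuration, and the function names are hypothetical. It also assumes that "for X cycles" counts the service's own checks rather than global daemon cycles.

```python
def check_interval(daemon_seconds: int, every_cycles: int) -> int:
    """Seconds between two checks of a service that uses `every N cycles`."""
    return daemon_seconds * every_cycles


def confirm_delay_for_cycles(daemon_seconds: int, every_cycles: int, x_cycles: int) -> int:
    """Seconds from the first failed check until `for X cycles` is satisfied.

    Assumes the X failures must be seen on X consecutive service checks.
    """
    return (x_cycles - 1) * check_interval(daemon_seconds, every_cycles)


def confirm_delay_in_cycle(retries: int, delay_seconds: int) -> int:
    """Seconds from the first failed attempt until delayed in-cycle retries run out."""
    return retries * delay_seconds


hourly = check_interval(60, 60)
print(hourly)                                         # 3600: one check per hour
print(confirm_delay_for_cycles(60, 60, 2))            # 3600: second failure seen an hour later
print(hourly + confirm_delay_for_cycles(60, 60, 2))   # 7200: worst case from outage to action
print(confirm_delay_in_cycle(5, 60))                  # 300: five minutes with delayed retries
```

Under these assumptions, an outage that begins just after a successful check leads to an action only two hours later with "for 2 cycles", while delayed retries inside one cycle would settle the question five minutes after the first failed attempt.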
- 2015-06-08T11:00:47+00:00
Tildeslash repo owner
Currently the pause between checks is determined only by the cycle length and the "every" statement. The RETRY option was designed to retry within the same cycle and doesn't have its own retry scheduler.

We're about to start work on a new test scheduler, which will be more flexible than the current cycle+every scheduling.
- 2015-06-08T11:17:28+00:00
Tildeslash repo owner
- removed version
Removing version: 5.13 (automated comment)
- 2016-06-19T18:47:46+00:00

Vovodroid reporter

It seems that "for X cycles" counts not the global daemon cycle but the effective service cycle (global period * "every" value).
Config:

```
set daemon 60

check host example.com address 127.0.0.1
  every 10 cycles
  start  = "/usr/bin/docker start example.com"
  stop   = "/usr/bin/docker kill   example.com"
  if failed host example.com port 443 protocol https for 1 cycles then restart
```

Result:

```
[UTC Jul 16 11:08:32] error    : 'example.com' failed protocol test [HTTP] at [example.com]:443 [TCP/IP SSL] -- Connection refused
[UTC Jul 16 11:08:32] info     : 'example.com' trying to restart
[UTC Jul 16 11:08:32] info     : 'example.com' stop: /usr/bin/docker
[UTC Jul 16 11:08:32] info     : 'example.com' start: /usr/bin/docker
[UTC Jul 16 11:18:34] info     : 'example.com' connection succeeded to [example.com]:443 [TCP/IP SSL]
```

So it took ten minutes to discover that the service was alive after it had been started successfully. Would it be worth using the global cycle for "for X cycles"?
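As a quick cross-check of the ten-minute figure, the gap between the two log timestamps can be compared with the service's effective check interval. This is a small Python sketch; `seconds_between` is a hypothetical helper, and the only inputs are the `set daemon 60` and `every 10 cycles` settings and the two timestamps from the log.

```python
from datetime import datetime

DAEMON_SECONDS = 60  # from "set daemon 60"
EVERY_CYCLES = 10    # from "every 10 cycles"


def seconds_between(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


expected_interval = DAEMON_SECONDS * EVERY_CYCLES
observed_gap = seconds_between("11:08:32", "11:18:34")

print(expected_interval)  # 600
print(observed_gap)       # 602
```

The observed gap matches one full service cycle (60 s * 10) plus about two seconds, which is consistent with the recovery being noticed only at the service's next scheduled check rather than at the next daemon cycle.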

2016-07-16T11:35:45+00:00

Tildeslash repo owner
@to_vova yes, there is a standalone task to clarify how "for X cycles" behaves in combination with the "every" statement: https://bitbucket.org/tildeslash/monit/issues/174/the-for-x-cycles-is-confusing-if-the-test
- 2016-08-03T08:36:18+00:00

Assignee: Tildeslash

Type: bug

Priority: major

Status: resolved

Component: Monit

Version: –

Votes: 0

Watchers: 2