A while ago, a deployment triggered a client stampede: multiple servers stopped responding to health checks while they serviced clients, and monit put them in a reboot loop.

While we are now writing more forgiving health checks, exponential or linear backoff on checks or restarts would be quite useful, with parameters for the minimum and maximum period.
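
To make the request concrete, here is a minimal sketch of the schedule such a feature might produce. It is plain Python, not monit configuration; `backoff_periods` and its parameters (`min_period`, `max_period`, `mode`, `step`) are hypothetical names chosen for illustration, and doubling is assumed as the exponential factor.

```python
from itertools import islice


def backoff_periods(min_period, max_period, mode="exponential", step=None):
    """Yield the wait (in seconds) before each successive check or restart.

    Hypothetical helper: starts at min_period and grows until it is
    capped at max_period, so checks slow down but never stop.
    """
    if min_period <= 0 or max_period < min_period:
        raise ValueError("need 0 < min_period <= max_period")
    if mode not in ("exponential", "linear"):
        raise ValueError("mode must be 'exponential' or 'linear'")

    # For linear backoff, default to growing by min_period each time.
    increment = step if step is not None else min_period

    period = min_period
    while True:
        yield period
        if mode == "exponential":
            period = min(period * 2, max_period)  # double, then cap
        else:
            period = min(period + increment, max_period)  # add, then cap


# First six waits with a 30 s minimum and a 300 s maximum.
print(list(islice(backoff_periods(30, 300), 6)))
print(list(islice(backoff_periods(30, 300, mode="linear"), 6)))
```

Because the period is capped rather than unbounded, a host that keeps failing is still checked every `max_period` seconds instead of being dropped from monitoring.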

    Hey, thanks for the reply. I'm aware of this option (and we use it now). I just think backoff would be a more graceful way to handle it, so that monitoring doesn't get completely shut off (in case the failure was due to, say, a local or remote dependency).

