Regression: failed link check generates superfluous alerts
With a config containing:
check network External
interface eth0
if failed link then alert
If I then remove the Ethernet cable from eth0, I receive many emails:
Link down Service External
Date: Thu, 25 Jul 2019 09:10:53
Action: alert
Host: pbx4
Description: link down
Link up Service External
Date: Thu, 25 Jul 2019 09:11:53
Action: alert
Host: pbx4
Description: link data collection succeeded
Link down Service External
Date: Thu, 25 Jul 2019 09:11:54
Action: alert
Host: pbx4
Description: link down
Link up Service External
Date: Thu, 25 Jul 2019 09:12:54
Action: alert
Host: pbx4
Description: link data collection succeeded
(Each entry is a separate email alert.) This continues the whole time the eth0 interface has no cable connected.
BTW, I tried "changed" instead of "failed" and saw the same issue.
I'm currently using 5.26.0; 5.25.3 has the same issue, but I'm not sure when the regression was introduced.
Since I compile from source, I can easily test a patch fix.
Comments (3)
Account Deleted (reporter): I tested this patch, and it fixes this issue.
--- monit-5.26.0/src/validate.c.orig 2019-07-25 14:34:01.725453914 -0500
+++ monit-5.26.0/src/validate.c 2019-07-25 14:34:54.548704707 -0500
@@ -1762,9 +1762,6 @@
         END_TRY;
         if (! havedata)
                 return State_Failed; // Terminate test if no data are available
-        for (LinkStatus_T link = s->linkstatuslist; link; link = link->next) {
-                Event_post(s, Event_Link, State_Succeeded, link->action, "link data collection succeeded");
-        }
         // State
         if (! Link_getState(s->inf.net->stats)) {
                 for (LinkStatus_T link = s->linkstatuslist; link; link = link->next)
Before commit 5dc268139, this for loop generated an invalid event, so "fixing" the event type is actually what causes the problem. Possibly this code was added for debugging at one time but was never removed?
Repo owner changed status to resolved.
Fixed: Issue #840: Network check: superfluous alerts when the link is down. Thanks to Lonnie Abelbeck. → changeset f3bea23a52db
More info:
This regression was introduced between versions 5.25.1 and 5.25.2.
I’m thinking this commit may be the culprit:
https://bitbucket.org/tildeslash/monit/commits/5dc268139ca2f4cb68b9aff55fe8d0dea0e95070
Possibly this needed to be fixed (Event_Size → Event_Link), but shouldn't there be some test for State_Succeeded, or is this for loop not needed at all?