check network

Issue #137 closed
Lonnie Abelbeck created an issue

Hi,

Thanks much for the new "check network", very useful in 5.11, but a few minor issues...

1) Need a "check network" example in the sample monitrc

2) The examples on your web site are missing a label (I assume "with" is not the desired label :-) )

check network with interface eth0

should be something like...

check network WAN with interface eth0

or I prefer the format...

check network WAN
  interface eth0

3) The "check network" link test is implicit; I would suggest it should not be. As it stands, an interface can't be monitored without the global email alert firing for the link status. You fixed the "check process" pid/ppid in 5.11 for the same reason.

It is confusing if there are implicit alerts and not all are specified in the config.

Comments (22)

  1. Tildeslash repo owner

    Hi,

    thanks for feedback and suggestion, we have added the 'check network' example to sample monitrc.

    The missing service name in the Monit presentation is fixed.
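
    As a rough sketch of what a labelled entry could look like (the service name WAN is taken from the report above; the text actually added to the sample monitrc is not quoted in this thread, so this is an illustration only):

    # "WAN" is the service name (label), eth0 the monitored interface
    check network WAN with interface eth0
      if failed link then alert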

    Regarding the implicit link failure test ... the "existence" test is always implicit even if not defined in the configuration file for each service type, for example:

    1.) 'check process' fires an alert if the process doesn't exist
    2.) 'check file' fires an alert if the file doesn't exist

    We thought the link status test would always be interesting and added it by default along the same lines. If real-world usage does not match, we'll remove the implicit link status test.

    Best regards, The Monit team

  2. Lonnie Abelbeck reporter

    Thanks for the quick resolution.

    As for item 3) and implicit tests... there are situations where all that is desired is to use Monit to monitor and display status info in the web interface, with no external alerts. The "hidden" implicit tests make this difficult. Prior to 5.11 I wanted to simply monitor a process, say dnsmasq:

    check process dnsmasq
      pidfile /var/run/dnsmasq.pid
    

    but that generated an alert every time dnsmasq was restarted (prior to 5.11)

    so I tried...

    check process dnsmasq
      pidfile /var/run/dnsmasq.pid
      noalert
    

    and...

    check process dnsmasq
      pidfile /var/run/dnsmasq.pid
      noalert alert
    

    Neither of the two above disables all global alerts; "noalert" can only be used if you know what the actual global alert value(s) are.

    Another possible solution would be to add a global keyword that would enable/disable all implicit tests, defaulting to the current for backward compatibility.

  3. Tildeslash repo owner

    The "noalert" statement requires the email-address which shouldn't receive alerts, correct syntax is:

    check process dnsmasq
        pidfile /var/run/dnsmasq.pid
        noalert foo@bar
    
  4. Lonnie Abelbeck reporter

    Yes, I understand that, but that requires knowing what the global alert(s) are in order to disable them. Additionally, if the global alert(s) were changed, a lot of work would be required to update all the "noalert" entries.

    If either only "noalert" or "noalert alert" would remove all the global alert(s) for that check, that would be very useful.

    Though, optionally being able to disable all implicit tests would be the most useful solution, then only the alerts specified would be used.

  5. Lonnie Abelbeck reporter

    That doesn't seem to work, with:

    set alert foo@bar not on { action, instance, nonexist }
    

    and only...

    check network internel_1
      interface eth1
    

    when the eth1 link is down I still get emails because of the implicit tests of "check network".

  6. Lonnie Abelbeck reporter

    Yes, I know how to ignore all "link" events for a global alert, but that is not this issue...

    I want to specify one "check network" to "if failed link then alert" and another "check network" to not alert.

    check network External
      interface eth0
      if failed link then alert
    
    check network Internal
      interface eth1
    

    The automatic implicit tests are getting in the way.

    I understand that if I know what the global alert(s) are I can disable them manually, but that breaks when the global alert(s) change.

    Being able to (optionally) disable the automatic implicit tests seems like a good solution to me.

  7. Tildeslash repo owner

    you can do it using the same number of lines with current syntax:

    # set global alert target
    set alert foo@bar
    
    # will notify foo@bar on link down (implicit)
    check network External
      interface eth0
    
    # will not notify foo@bar (suppressed)
    check network Internal
      interface eth1
      noalert foo@bar
    

    If the implicit link failure test turns out to be a problem for more users, we can change the behaviour, but so far I think the link up/down test is simpler for the majority of users, and it can be suppressed if needed as explained above. The need to suppress the link up/down test currently seems to me to be a corner case.

  8. Lonnie Abelbeck reporter

    Again, when foo@bar is changed to foo2@bar, all of the noalert's have to be changed, this is a big problem.

    I think we have discussed this completely. :-)

  9. Tildeslash repo owner

    The automatic (implicit) checks are old and originally come from the check process statement and the wish to write a simple statement to monitor and get an alert:

    check process apache with pidfile /var/run/httpd.pid
    

    We do see your point and I think it is a good one. As more checks are added (with automatic checks, such as the recently added check network), it will be less surprising (which is important) and clearer if you only get an alert when you explicitly ask for it with if failed X then alert; otherwise the event is only noted in the GUI and in the logs. As you correctly noted, we have more or less started this refactoring by removing the automatic tests for PID changes, and we should be consistent. We'll discuss and think about this.

  10. Tildeslash repo owner

    We have discussed the automatic checks. The conclusion is that the automatic link up/down test is the correct behaviour: when a "check network ..." statement is added, the user is interested in testing the network interface, and link up/down is a key indicator, similar to "check process", which enables the process existence test.

    If the user doesn't want to check for link up/down, then probably the "check network" for this interface is not needed at all.

    elRadix: the bonded interface monitoring is fixed in the development version, will be part of next Monit release.

  11. Lonnie Abelbeck reporter

    The question is all about email alerts.

    What if up/down link is not interesting at all (annoying actually) but "if total upload > 10 GB in last hour then alert" is important.

    Give users a choice, don't force alerts in a secret, hidden fashion.
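
    A sketch of that scenario, assuming "link" is accepted as an event name in the "not on" filter (the form mirrors the "set alert ... not on { ... }" statement used earlier in this thread; the upload test is copied from the sentence above and has not been checked against the 5.11 grammar):

    # global recipient, with link up/down events filtered out (assumed event name: link)
    set alert foo@bar not on { link }

    # only the explicit upload test should reach foo@bar
    check network External
      interface eth0
      if total upload > 10 GB in last hour then alert

    Because the filter is global, it hides link events for every service, so it does not cover the case where one interface should alert on link failure and another should not.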

  12. Tildeslash repo owner

    In Monit, check <T> is a full statement in itself, for instance

      check file http_log with path /var/log/http_log
    
      check process apache with pidfile /var/run/httpd.pid
    

    or for this discussion

    check network eth0 with address 10.0.1.3 
    

    A check-statement is a basic check: it tests whether the object in question exists and alerts if it does not. I think this behaviour is both reasonable and expected. The opposite, a single check statement that did nothing, would be more surprising. The basic existence test associated with a check-statement can then be further refined using

    if not exist for 2 cycles then alert
    

    We believe that this, together with the ability to filter out alerts for basic checks using set alert foo@bar not on { nonexist } makes for a coherent and orthogonal system. It could maybe be better documented and more concise. For example, check host does not include this basic exist check, but we'll fix this. If you have a better idea or suggestion, please share. The premise though, unless there are good arguments against, is to keep the ability to use check <T> as a complete check-statement by itself.
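
    As a minimal sketch, the two mechanisms described above could be written as follows, using only statements already quoted in this thread. They are shown as alternatives; how a global filter interacts with an explicit test for the same recipient is not spelled out here, so that combination is left out:

    # Alternative 1: keep the implicit existence alert, but only after 2 failed cycles
    check process apache with pidfile /var/run/httpd.pid
      if not exist for 2 cycles then alert

    # Alternative 2: filter existence alerts out for the global recipient
    set alert foo@bar not on { nonexist }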

  13. Lonnie Abelbeck reporter

    I guess we have to agree to disagree :-)

    Getting "check network" to work on 32-bit systems is probably more pressing as per issue #138

  14. Tildeslash repo owner

    Getting "check network" to work on 32-bit systems is probably more pressing as per issue #138

    Yes, we're looking into this. Anyway, thank you for the discussion; we'll keep this in mind as we work on refactoring the config language.
