Alert when Filesystem is unavailable

Issue #468 resolved
Drew OConnor
created an issue

I was monitoring a mounted file system. The file system was forcefully detached and Monit never alerted. I sort of get it: the disk space usage was not technically higher than 80%, but the file system was gone, which is also an issue :)

Monit 5.19.0 uptime: 4d 23h 13m
┌─────────────────────────────────┬────────────────────────────┬───────────────┐
│ Service Name                    │ Status                     │ Type          │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ vert-backup.x.net               │ Running                    │ System        │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ root                            │ Accessible                 │ Filesystem    │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ backup_node_1_storage           │ Data access error          │ Filesystem    │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ backup_node_2_storage           │ Data access error          │ Filesystem    │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ backup_node_3_storage           │ Data access error          │ Filesystem    │
└─────────────────────────────────┴────────────────────────────┴───────────────┘

Is there a way to alert if disk space is too low OR the file system vanishes entirely?

Thanks, Drew

Comments (10)

  1. Drew OConnor reporter
    # {{ ansible_managed }}
    
    set daemon {{ monit_cycle }}
    set logfile {{ monit_log_destination }}
    set statefile {{ monit_state_file }}
    set idfile {{ monit_id_file }}
    {% if monit_eventqueue_dir is defined %}
    set eventqueue
      basedir {{ monit_eventqueue_dir | default('/var/lib/monit/events') }}
      slots {{ monit_eventqueue_slots | default(1000) }}
    {% endif %}
    
    set mmonit xxxx
    set httpd port 2812
    allow localhost
    allow xxx
    allow xxx
    
    check filesystem root with path /dev/xvda1
        if SPACE usage > 80% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles else if succeeded then exec "/etc/monit/slack_notifications.sh good"
    
    check filesystem backup_node_1_storage with path /dev/xvdb
        if SPACE usage > 95% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles else if succeeded then exec "/etc/monit/slack_notifications.sh good"
    
    check filesystem backup_node_2_storage with path /dev/xvdc
        if SPACE usage > 95% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles else if succeeded then exec "/etc/monit/slack_notifications.sh good"
    
    check filesystem backup_node_3_storage with path /dev/xvdd
        if SPACE usage > 95% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles else if succeeded then exec "/etc/monit/slack_notifications.sh good"
    

    Thanks @Tildeslash !

  2. Tildeslash repo owner

    Thanks. The problem is that the space usage test can trigger only the "resource" event type (when the space usage limit is exceeded).

    If the filesystem vanishes, Monit generates a different event type ("data"), for which you don't have a rule in the configuration. Unfortunately, Monit has only an implicit rule for that event, with an alert action that sends the event by mail to all "set alert <address>" and "alert <address>" targets. As there's no "[set] alert" statement in your Monit configuration (alerts are sent via the exec action only), the alert is not sent.

    We can add the possibility to set a custom action for the vanished filesystem event in the next Monit release.

    As a workaround, you can add "set alert <address>" and "set mailserver ..." so that you get the notification by mail.
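
    A minimal sketch of that workaround, assuming a reachable SMTP relay. The host smtp.example.com, port 25, and the address ops@example.com are placeholders, not values from this report:

    ```
    # Assumed SMTP relay; replace the host and port with your own.
    set mailserver smtp.example.com port 25

    # Global recipient for Monit's built-in alert action. This is the path
    # the "data" event takes when a monitored filesystem cannot be read.
    set alert ops@example.com
    ```

    The existing exec rules for space usage keep working unchanged. Note that a global "set alert" recipient by default also receives mail for other event types, unless an event filter is added.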

  3. Drew OConnor reporter

    Thanks @Tildeslash -- makes sense. There are a lot of networking/firewall challenges that make it hard for me to add mail servers in my environment.

    Instead of adding a custom action for the vanished filesystem event, could we add the ability to override the implicit rule so that I can catch all the other stuff that might use the implicit rule?

    All I would need is the ability to call a script versus sending the email in the implicit rule.

    BTW - I'm also purchasing MMONIT soon. Perhaps there is something at that level that can alert when a system has fewer than all of its services available? That would also have caught the issue, as MMONIT was displaying this system with fewer than all services available, which is what tipped me off to the issue in the first place.

    Thanks again, Drew
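
    Until such an override exists, one possible script-based approach is a separate program check that tests for the device node and reuses the existing notification script. This is only a sketch: the service name backup_node_1_device is made up, the location /usr/bin/test is an assumption about the host, and the check only detects the device node disappearing, which may not match every way a volume can be detached:

    ```
    # Hypothetical extra check: "test -b" exits non-zero when /dev/xvdb
    # is no longer present as a block device.
    check program backup_node_1_device with path "/usr/bin/test -b /dev/xvdb"
        if status != 0 for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles else if succeeded then exec "/etc/monit/slack_notifications.sh good"
    ```

    The space usage rule on backup_node_1_storage stays as it is; this check only adds a second, exec-based signal for the missing device.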

  4. Drew OConnor reporter

    Thanks @Tildeslash !

    I think it would be really useful if, instead of adding a custom action for the vanished filesystem event, we could add the ability to override the implicit rule, so that I can catch all the other events that might use the implicit rule.

    All we would need is the ability to call a script versus sending the email in the implicit rule...

    Thanks again, Drew
