Alert when Filesystem is unavailable
I was monitoring a mounted file system. The file system was forcefully detached and Monit never alerted. I sort of get it: the disk usage was not technically above 80%, but the file system was gone, which is also an issue :)
Monit 5.19.0 uptime: 4d 23h 13m
┌─────────────────────────────────┬────────────────────────────┬───────────────┐
│ Service Name │ Status │ Type │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ vert-backup.x.net │ Running │ System │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ root │ Accessible │ Filesystem │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ backup_node_1_storage │ Data access error │ Filesystem │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ backup_node_2_storage │ Data access error │ Filesystem │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ backup_node_3_storage │ Data access error │ Filesystem │
└─────────────────────────────────┴────────────────────────────┴───────────────┘
Is there a way to alert if disk space is too low OR the file system vanishes entirely?
Thanks, Drew
Comments (10)
-
repo owner Please can you send your monit configuration for this filesystem check?
-
reporter
# {{ ansible_managed }}
set daemon {{ monit_cycle }}
set logfile {{ monit_log_destination }}
set statefile {{ monit_state_file }}
set idfile {{ monit_id_file }}
{% if monit_eventqueue_dir is defined %}
set eventqueue
    basedir {{ monit_eventqueue_dir | default('/var/lib/monit/events') }}
    slots {{ monit_eventqueue_slots | default(1000) }}
{% endif %}
set mmonit xxxx
set httpd port 2812
    allow localhost
    allow xxx
    allow xxx

check filesystem root with path /dev/xvda1
    if SPACE usage > 80% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles
    else if succeeded then exec "/etc/monit/slack_notifications.sh good"

check filesystem backup_node_1_storage with path /dev/xvdb
    if SPACE usage > 95% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles
    else if succeeded then exec "/etc/monit/slack_notifications.sh good"

check filesystem backup_node_2_storage with path /dev/xvdc
    if SPACE usage > 95% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles
    else if succeeded then exec "/etc/monit/slack_notifications.sh good"

check filesystem backup_node_3_storage with path /dev/xvdd
    if SPACE usage > 95% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles
    else if succeeded then exec "/etc/monit/slack_notifications.sh good"
Thanks @tildeslash !
-
repo owner Thanks. The problem is that the space usage test can only trigger the "resource" event type (when the space usage limit is exceeded).
If the filesystem vanishes, Monit generates a different event type ("data"), for which you don't have a rule in the configuration. Unfortunately, Monit has only an implicit rule for it, with an alert action that sends the event by mail to all "set alert <address>" and "alert <address>" targets. As there's no "[set] alert" statement in your Monit configuration (alerts are sent via the exec action only), no alert is sent.
We can add the possibility to set a custom action for the vanished filesystem event in the next Monit release.
As a workaround, you can add "set alert <address>" and "set mailserver ..." so that you get notifications by mail.
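For illustration, the mail workaround might look roughly like the fragment below. This is a minimal sketch: the mail server host, port, and recipient address are placeholders that do not come from this thread and would need to be replaced with values valid in your environment.

```
# Placeholder SMTP relay; replace the host and port with your own.
set mailserver smtp.example.com port 25

# Global recipient. The implicit alert action, including the one for
# the "data" event raised when a filesystem vanishes, mails this address.
set alert admin@example.com
```

With a global "set alert" in place, events that have no explicit rule in a check are still delivered by mail, while the existing exec rules continue to run unchanged.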
-
reporter Thanks @tildeslash -- makes sense. There are a lot of networking/firewall challenges that make it hard for me to add mail servers in my environment.
Instead of adding a custom action for the vanished filesystem event, could we add the ability to override the implicit rule so that I can catch all the other stuff that might use the implicit rule?
All I would need is the ability to call a script versus sending the email in the implicit rule.
BTW - I'm also purchasing MMONIT soon, perhaps there is something at that level that can alert when a system has less than all its services available? This would have also caught the issue, as MMONIT was displaying this system with less than all services available -- which is what turned me onto the issue in the first place.
Thanks again, Drew
-
repo owner Yes, M/Monit will solve the problem - you can set up an alert rule (Admin -> Alerts -> Rules) that will execute a script for any event type.
-
reporter Thanks @tildeslash !
I think it would be really useful if, instead of adding a custom action for the vanished filesystem event, we could add the ability to override the implicit rule, so that I can catch all the other stuff that might use the implicit rule.
All we would need is the ability to call a script versus sending the email in the implicit rule...
Thanks again, Drew
-
repo owner - changed status to resolved
Fixed: Issue #468: If the filesystem doesn't exist, Monit now triggers a "nonexist" event instead of a "data" event, so it's possible to override the default action using "if does not exist then <action>". → <<cset 5c5b82d6a563>>
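Applied to one of the checks from the configuration posted earlier in this thread, the new rule could be used roughly as follows. This is a sketch that assumes a Monit build containing the changeset above; the existence rule reuses the reporter's notification script, and how that script handles the event is not shown in this thread.

```
check filesystem backup_node_1_storage with path /dev/xvdb
    # Runs when the filesystem is missing ("nonexist" event),
    # replacing the default mail-only alert action.
    if does not exist then exec "/etc/monit/slack_notifications.sh"
    # Existing space usage rule, unchanged.
    if SPACE usage > 95% for 2 cycles then exec "/etc/monit/slack_notifications.sh" repeat every 2 cycles
    else if succeeded then exec "/etc/monit/slack_notifications.sh good"
```

The same "if does not exist" line would need to be added to each filesystem check that should notify through the script rather than by mail.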
-
reporter You rock @tildeslash !!