Incorrect (way too low) filesystem service time when NFS failed

I had configured a “check filesystem” entry with the rule “if service time > 50 milliseconds for 3 times within 5 cycles then alert”.
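
For reference, the full entry would look roughly like the sketch below. Only the "if service time" line is taken from my setup as quoted above; the check name "nfsdata" and the path "/mnt/nfs" are placeholders for illustration, not the real values.

```
# Sketch of the check; "nfsdata" and "/mnt/nfs" are hypothetical placeholders
check filesystem nfsdata with path /mnt/nfs
    # Rule as quoted in this report
    if service time > 50 milliseconds for 3 times within 5 cycles then alert
```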

As it happened, we had an NFS failure this morning that lasted over 1000 seconds (that is, there was no response for more than 1000 seconds).

However, the monit log reported “… service time 1.380 s/operation matches resource limit [service time > 50 ms/operation]”.

I think there is a big difference between 1.3 seconds and 1000 seconds. Also, and more importantly, the issue was logged only when the NFS server responded again, not when it began failing.
