incorrect (way too low) filesystem service time when NFS failed

Issue #1039 new
Ulrich Windl created an issue

I had configured a “check filesystem” with “if service time > 50 milliseconds for 3 times within 5 cycles then alert”.

By chance, we had an NFS failure this morning that lasted over 1000 seconds (that is, there was no response for more than 1000 seconds).

However, the Monit log recorded “… service time 1.380 s/operation matches resource limit [service time > 50 ms/operation]”

I think there is a big difference between 1.3 seconds and 1000 seconds. Also, more importantly, the issue was logged when the NFS server responded again, not when it had begun failing.

Comments (3)

  1. Lutz Mader

    Hello Ulrich,
    the output is sometimes formatted with "%.3f ms/operation". Therefore the "." in 1.380 may also be a thousands separator (I think).

    See the service time format handling in src/validate.c, which uses "Convert_time2str(serviceTime", and in addition "Convert_time2str" in libmonit/src/util/Convert.c, which chooses a useful time format.

    I think,
    Lutz
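
To make the formatting question above concrete, here is a minimal sketch in C of a converter that scales a millisecond value to a larger unit before printing it with "%.3f". It is not Monit's actual Convert_time2str: the function name format_duration_ms and the unit table are assumptions made only for illustration. If the logged value was produced along these lines, "1.380 s" would mean 1.38 seconds, with "." as the decimal point of the default C locale.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (NOT Monit's Convert_time2str): format a duration
 * given in milliseconds, scaling it up to s, m, h or d as it grows. */
static char *format_duration_ms(double ms, char *buf, size_t len) {
    /* factor = how many of this unit make one of the next larger unit */
    static const struct { double factor; const char *suffix; } units[] = {
        {1000.0, "ms"},
        {60.0,   "s"},
        {60.0,   "m"},
        {24.0,   "h"},
    };
    size_t n = sizeof units / sizeof units[0];
    double value = ms;

    for (size_t i = 0; i < n; i++) {
        if (value < units[i].factor) {
            /* Value fits this unit: print with three decimals. */
            snprintf(buf, len, "%.3f %s", value, units[i].suffix);
            return buf;
        }
        value /= units[i].factor; /* move on to the next larger unit */
    }
    snprintf(buf, len, "%.3f d", value); /* anything beyond hours: days */
    return buf;
}
```

With this sketch, an input of 1380.0 yields "1.380 s", an input of 50.0 yields "50.000 ms", and a full 1000-second outage (1000000.0) yields "16.667 m", so the reported value and the outage length would differ visibly in the log.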
