Monit doesn't keep state between reloads

Issue #316 resolved
jvrplmlmn created an issue

I'm trying to understand how Monit keeps the state between reloads/restarts, using the statefile.

I'm running Monit with the following configuration:

set daemon 10
    with start delay 1

set logfile /var/log/monit/service.log

set pidfile /var/run/monit.pid

set idfile /var/monit/.monit.id

set statefile /var/monit/.monit.state

check file test_file with path /tmp/test.txt
    if changed timestamp then alert

If /tmp/test.txt is updated (touch /tmp/test.txt), this triggers an alert, as expected:

[UTC Jan 18 14:42:51] debug    : M/Monit: status message sent to http://[mmonit.example.com]:8080/collector
[UTC Jan 18 14:42:52] debug    : 'test_file' file exists
[UTC Jan 18 14:42:52] debug    : 'test_file' is a regular file or socket
[UTC Jan 18 14:42:52] debug    : 'test_file' actual system time obtained
[UTC Jan 18 14:42:52] error    : 'test_file' timestamp was changed for /tmp/test.txt
[UTC Jan 18 14:42:52] debug    : -------------------------------------------------------------------------------
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x41a5a3]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x41ad7f]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x4158ed]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x42b4f9]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x42d49d]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x42c0c5]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x4115d9]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x412331]
[UTC Jan 18 14:42:52] debug    :     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f7b0844a76d]
[UTC Jan 18 14:42:52] debug    :     /opt/monit/bin/monit() [0x404b2a]
[UTC Jan 18 14:42:52] debug    : -------------------------------------------------------------------------------
[UTC Jan 18 14:42:52] debug    : M/Monit: event message sent to http://[mmonit.example.com]:8080/collector

I then waited one cycle without touching the file:

[UTC Jan 18 14:43:01] debug    : M/Monit: status message sent to http://[mmonit.example.com]:8080/collector
[UTC Jan 18 14:43:02] debug    : 'test_file' file exists
[UTC Jan 18 14:43:02] debug    : 'test_file' is a regular file or socket
[UTC Jan 18 14:43:02] debug    : 'test_file' actual system time obtained
[UTC Jan 18 14:43:02] info     : 'test_file' timestamp was not changed for /tmp/test.txt

I then ran touch /tmp/test.txt && sv reload monit, and the state was lost across the reload (no alert was raised for the change):

[UTC Jan 18 14:43:06] info     : Awakened by the SIGHUP signal
Reinitializing Monit - Control file '/opt/monit/conf/monitrc'
[UTC Jan 18 14:43:06] info     : M/Monit heartbeat stopped
[UTC Jan 18 14:43:06] info     : Shutting down Monit HTTP server
[UTC Jan 18 14:43:07] info     : Monit HTTP server stopped
[UTC Jan 18 14:43:07] debug    : Adding host allow 'localhost'
[UTC Jan 18 14:43:07] debug    : Adding host allow '<EDITED>'
[UTC Jan 18 14:43:07] debug    : Adding host allow 'mmonit.example.com'
[UTC Jan 18 14:43:07] debug    : Adding credentials for user 'monit'
[UTC Jan 18 14:43:07] info     : Starting Monit HTTP server at [*]:3737
[UTC Jan 18 14:43:07] info     : Monit HTTP server started
[UTC Jan 18 14:43:07] info     : '<EDITED HOSTNAME>' Monit reloaded
[UTC Jan 18 14:43:07] debug    : M/Monit: event message sent to http://[mmonit.example.com]:8080/collector
[UTC Jan 18 14:43:07] info     : M/Monit heartbeat started
[UTC Jan 18 14:43:07] debug    : M/Monit: status message sent to http://[mmonit.example.com]:8080/collector
[UTC Jan 18 14:43:07] debug    : 'test_file' file exists
[UTC Jan 18 14:43:07] debug    : 'test_file' is a regular file or socket
[UTC Jan 18 14:43:07] debug    : 'test_file' actual system time obtained
[UTC Jan 18 14:43:07] debug    : 'test_file' timestamp was not changed for /tmp/test.txt

Is this expected behavior? Shouldn't I get an alert?

I reproduced the same scenario with Monit 5.14 and Monit 5.15, with identical output.
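
To make the question concrete, here is a minimal Python sketch of the behavior I think I'm seeing. It is a toy model, not Monit's actual code: the FileCheck class, the JSON state file, and the simulate helper are all hypothetical names made up for illustration. It assumes the check keeps the last-seen timestamp as a baseline and that a reload starts with an empty baseline unless it is restored from saved state.

```python
import json
import os
import tempfile


class FileCheck:
    """Toy model of a 'changed timestamp' check (hypothetical, not Monit code)."""

    def __init__(self, path, statefile=None):
        self.path = path
        self.statefile = statefile
        self.baseline = None  # last timestamp seen by the check

    def load_state(self):
        # Restore the baseline saved before a reload, if a state file exists.
        if self.statefile and os.path.exists(self.statefile):
            with open(self.statefile) as fh:
                self.baseline = json.load(fh).get(self.path)

    def save_state(self):
        # Persist the baseline so that it can survive a reload.
        if self.statefile:
            with open(self.statefile, "w") as fh:
                json.dump({self.path: self.baseline}, fh)

    def cycle(self):
        """Run one check cycle; return True when a change is detected."""
        current = os.stat(self.path).st_mtime_ns
        changed = self.baseline is not None and current != self.baseline
        self.baseline = current
        return changed


def simulate(persist):
    """Touch the file, 'reload' the checker, and report whether it alerts."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "test.txt")
        state = os.path.join(tmp, "state.json") if persist else None
        open(target, "w").close()
        os.utime(target, ns=(1_000_000_000, 1_000_000_000))

        check = FileCheck(target, state)
        check.cycle()        # first cycle only records the baseline
        check.save_state()   # no-op when persist is False

        # Equivalent of: touch /tmp/test.txt && sv reload monit
        os.utime(target, ns=(2_000_000_000, 2_000_000_000))
        check = FileCheck(target, state)  # reload: in-memory baseline is gone
        check.load_state()
        return check.cycle()
```

In this model, simulate(persist=False) reports no change, which matches the log above, while simulate(persist=True) does report the change, which is what I expected the statefile to provide.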

Comments (6)

  1. Duke Bartholomew

    @jvrplmlmn I also added restoring of the timestamp in my PR.
    A quick verification indicates that it solves your problem, but I'm still waiting for comments from @tildeslash.
    I might be going about this completely the wrong way, or breaking certain use cases I'm not aware of.
