"onreboot nostart" services started anyway when rebooting without statefile

Issue #1089 open
Richard Roth created an issue

I am not specifying a custom statefile in monitrc, nor am I starting monit with -s <statefile>.

Services defined in files containing the line "onreboot nostart" are started anyway when the system boots.

Reverting https://bitbucket.org/tildeslash/monit/commits/1a414506d931 fixes the issue

Comments (23)

  1. Lutz Mader

    Hello Richard,
    I tried to find out what is going wrong, but I could not.

    To get more information and to check your configuration, please start monit manually with “-vv” and attach the output for the failed check to the ticket.

    No idea what is going wrong, sorry,
    Lutz

  2. Richard Roth reporter

    Hi,

    no worries, I know how difficult debugging with little information can be

    Below is the output with -vv. The real service/process name is redacted as <AFFECTED_SERVICE>, and ellipses (....) have been inserted where irrelevant parts of the output were removed.

    >> Starting service monit.
    [   16.783180] random: monit: uninitialized urandom read (256 bytes read)
     New Monit id: 5eeaa7b89320cba5076eae0e4289ceee
     Stored in '/root/.monit.id'
    Runtime constants:
     Control file       = /etc/monitrc
     Log file           = syslog
     Pid file           = /var/run/monit.pid
     Id file            = /root/.monit.id
     State file         = /root/.monit.state
     Debug              = True
     Log                = True
     Use syslog         = True
     Is Daemon          = True
     Use process engine = True
     Limits             = {
                        =   programOutput:     512 B
                        =   sendExpectBuffer:  256 B
                        =   fileContentBuffer: 512 B
                        =   httpContentBuffer: 1 MB
                        =   networkTimeout:    5 s
                        =   programTimeout:    5 m
                        =   stopTimeout:       30 s
                        =   startTimeout:      30 s
                        =   restartTimeout:    30 s
                        = }
     On reboot          = start
     Poll time          = 60 seconds with start delay 0 seconds
     Start monit httpd  = True
     httpd bind address = localhost
     httpd portnumber   = 2813
     httpd net readonly = Disabled
     httpd signature    = Enabled
     httpd auth. style  = Host/Net allow list
    
    The service list contains the following entries:
    
    Process Name          = klogd
     Pid file             = /var/run/klogd.pid
     Monitoring mode      = active
     On reboot            = start
     Start program        = '/etc/init.d/klogd start' timeout 30 s
     Stop program         = '/etc/init.d/klogd stop' timeout 30 s
     Existence            = if does not exist then restart
    
    Process Name          = <AFFECTED_SERVICE>
     Pid file             = /var/run/<AFFECTED_SERVICE>.pid
     Monitoring mode      = active
     On reboot            = nostart
     Start program        = '/etc/init.d/<AFFECTED_SERVICE> start' timeout 30 s
     Stop program         = '/etc/init.d/<AFFECTED_SERVICE> stop' timeout 30 s
     Existence            = if does not exist then restart
    
    
     ....
    
     pidfile '/var/run/monit.pid' does not exist
    Starting Monit 5.33.0 daemon with http interface at [localhost]:2813
    +++ Successfully started monit.
    
    
    ....
    
    Starting <AFFECTED_SERVICE>
    

  3. Richard Roth reporter

    Note:

    this only occurs when rebooting the system and monit is started during the boot process

    If the system is allowed to fully boot without monit, and monit is then started at a later time, this issue does not occur

    Our system consists of various partitions that are mounted during bootup. Perhaps this is somehow related to some required directory for monit state files (or similar) not being available at the required time?

  4. Lutz Mader

    Hello Richard,
    thanks for the additional information. I will do some additional checks/tests in my environment.

    Lutz

    p.s.

    Just a suggestion: I use the start delay on production systems (AIX, Linux) to work around some interface/filesystem trouble at startup.

    set daemon  60              # check services at 60-second intervals
        with start delay 240    # optional: delay the first check by 4 minutes (by
    #                           # default Monit checks immediately after Monit start)
    

  5. Lutz Mader

    Hello Richard,
    are you sure the message "Starting <AFFECTED_SERVICE>" is a Monit start message?
    From my point of view, if Monit starts the application, the messages should look like "<AFFECTED_SERVICE> start: <THE COMMAND>" and "<AFFECTED_SERVICE> process is running with pid <THE PID>".

    Please also check your systemd/init.d configuration.

    Lutz

  6. Richard Roth reporter

    Hi,

    Yes.

     Start program        = '/etc/init.d/<AFFECTED_SERVICE> start' timeout 30 s
     Stop program         = '/etc/init.d/<AFFECTED_SERVICE> stop' timeout 30 s
    

    executes bash scripts which then start/stop the process

    The “Starting …” line is from the bash script

  7. Richard Roth reporter

    Update

    set daemon  60              # check services at 60-second intervals
        with start delay 20    # optional: delay the first check by 20 seconds (by
    #                           # default Monit checks immediately after Monit start)
    

    also resolves my issue.

    I will test for a lower limit. However, I also need to check internally whether this solution is OK, as we have boot-time requirements that we must meet.

    edit: I have just noticed some other unintended side effects… services not started/being monitored

  8. Lutz Mader

    Hello Richard,
    nice to know. I also started without a start delay, but everything worked well.

    What I do not understand is this: when Monit is involved, I find messages like "<AFFECTED_SERVICE> start: <THE COMMAND>" and "<AFFECTED_SERVICE> process is running with pid <THE PID>" in the log file. You should find these messages in the syslog as well, I think. Could you configure a Monit log file instead of syslog, so that only Monit messages are shown?

    Lutz

  9. Richard Roth reporter

    AFFECTED_SERVICE from my previous comment is affected_service_2 in the attached log file

    note: my previous comment about the delay solving the issue does not seem to be accurate. my apologies

  10. Lutz Mader

    Snippet from the attached monit.redac.log file:

    [1970-01-01T00:00:21+0000] info     :  New Monit id: 61dcc48797223efb825f156fb348f72a
     Stored in '/root/.monit.id'
    [1970-01-01T00:00:21+0000] debug    : pidfile '/var/run/monit.pid' does not exist
    [1970-01-01T00:00:21+0000] info     : Starting Monit 5.33.0 daemon with http interface at [localhost]:2813
    [1970-01-01T00:00:22+0000] debug    : Starting Monit HTTP server at [localhost]:2813
    [1970-01-01T00:00:22+0000] debug    : Monit HTTP server started
    [1970-01-01T00:00:22+0000] info     : '<hostname>' Monit 5.33.0 started
    

    Note the timestamps in your log file: they start at 1970-01-01.

    Lutz

  11. Lutz Mader

    Nice to know.
    You should start Monit only when your system knows the correct time.
    The "startup delay" should not be considered, because Monit does not recognize the reboot correctly.

    Lutz

    p.s.
    Sorry, my system uses the hardware clock after a reboot and synchronises with NTP as soon as possible.
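
The advice to start Monit only once the system knows the correct time could be approximated with a small pre-start check. The following is a minimal sketch, not part of Monit: the function name `clock_looks_unset` and the threshold `EARLIEST_PLAUSIBLE` are hypothetical, and the threshold value is an arbitrary assumption that any deployment would need to choose for itself.

```python
import time

# Hypothetical threshold: any clock reading before this (September 2020)
# is treated as "clock not set yet". The value is an assumption.
EARLIEST_PLAUSIBLE = 1_600_000_000


def clock_looks_unset(now: float, earliest_plausible: float = EARLIEST_PLAUSIBLE) -> bool:
    """Return True if the given Unix time is implausibly early."""
    return now < earliest_plausible


if __name__ == "__main__":
    if clock_looks_unset(time.time()):
        print("system clock looks unset; consider delaying the Monit start")
    else:
        print("system clock looks plausible")
```

A boot script could run such a check before launching Monit. On a system that never receives the correct time, the check would never pass, so it does not address the offline case.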

  12. Richard Roth reporter

    So can you confirm that the epoch timestamp is the root cause of the issue described in this ticket?

    What about the case where a system is not connected to the internet and cannot update the time from epoch via NTP or a similar method? It seems odd that this would result in the service configuration files not being honored.

  13. Lutz Mader

    Sorry, no, no,
    the "startup delay" should not be considered correctly.
    The epoch time after a reboot should not be a problem for handling "onreboot nostart", which seems to work as expected, because the application is not started.
    But "onreboot start" will start the application only if the application was not stopped via Monit before.

    Lutz

    p.s.
    The boottime is stored in the Monit state file.

    hexdump -C -n 16 monit.state
    00000000  00 00 00 00 04 00 00 00  18 17 0b 65 00 00 00 00  |...........e....|
    00000010
    

    "65 0b 17 18" (the bytes at offset 8 of the dump, read in little-endian order) is the stored boot timestamp, 2023-09-20 16:00:24 UTC.
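
The decoding of the hexdump above can be reproduced with a short sketch. It assumes only what the dump shows: the timestamp bytes sit at offset 8 and are stored little-endian. Reading them as a 4-byte unsigned integer is an assumption made for illustration, not a statement about Monit's actual state file format, and `decode_boot_time` is a hypothetical helper name.

```python
import struct
from datetime import datetime, timezone

# The 16 bytes shown by "hexdump -C -n 16 monit.state" in the comment above.
HEADER_HEX = "00 00 00 00 04 00 00 00 18 17 0b 65 00 00 00 00"


def decode_boot_time(header: bytes, offset: int = 8) -> datetime:
    """Read a little-endian 32-bit Unix timestamp at the given offset.

    The offset and field width are assumptions taken from the dump,
    not from Monit's source code.
    """
    (seconds,) = struct.unpack_from("<I", header, offset)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


header = bytes.fromhex(HEADER_HEX)
print(decode_boot_time(header).isoformat())  # 2023-09-20T16:00:24+00:00
```

The same function can be pointed at the first 16 bytes of a real state file to see which boot time was recorded, provided the layout matches the dump.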

  14. Richard Roth reporter

    Is there anything else I can provide to help debug the issue? If not, I will start debugging the code myself.

    As mentioned in the description, reverting 1a414506d931 resolves the issue.

  15. Lutz Mader

    Sorry Richard,
    I cannot find a suitable test system (I think).

    My suggestion: define a state file (see “set statefile”) in a RAM/temporary file system (based on your Monit output, a “.monit.state” file should currently exist in root's home folder). The state file will then be created anew after a reboot, and Monit will detect the reboot correctly (because the information in the state file is not available).

    You pointed to the right code, but this problem originates from some changes in Monit 5.26.0 to handle the “start delay”.

    I found a similar problem on “time travel” systems, and I fixed the start glitches by removing the state file after a reboot and before Monit starts.

    I do not know a good way to detect a “reboot” properly. Monit compares the boot time with a timestamp stored in the state file to test whether this is a Monit restart or a system reboot. Based on the result, the “onreboot” definitions are handled. Unfortunately, your boot time is always something like “1970-01-01T00:00:10”, and the “reboot” test does not recognize this as a reboot because the time has not changed.

    Sorry,
    Lutz
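
The reboot test described in the comment above can be modelled in a few lines. This is an illustrative sketch of the described comparison only, not Monit's actual code: the function name `is_system_reboot` and the plain equality check are assumptions. It shows why a clock that restarts from the epoch on every boot hides the reboot, and why a missing state file makes the reboot visible again.

```python
from typing import Optional


def is_system_reboot(stored_boot_time: Optional[int], current_boot_time: int) -> bool:
    """Model of the described test: compare stored and current boot time.

    stored_boot_time is None when no state file exists (for example,
    because it lives on a RAM file system or was deleted before start).
    """
    if stored_boot_time is None:
        return True  # no previous state, so treat this as a fresh boot
    return stored_boot_time != current_boot_time


# Normal system: the boot time differs after a reboot.
print(is_system_reboot(1695225624, 1695312024))  # True

# Clock restarts from the epoch: the boot time is the same value every boot.
print(is_system_reboot(10, 10))  # False

# State file removed before Monit starts: detected as a reboot again.
print(is_system_reboot(None, 10))  # True
```

Under this model, both workarounds mentioned in the thread (a state file on a temporary file system, or deleting the state file before Monit starts) work for the same reason: they take the `None` branch.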

  16. Lutz Mader

    Thanks for your response,
    deleting the Monit state file solved my problems caused by the clock/time being in the past.

    Nice to know that Monit 5.31.0 works well and handles “onreboot” properly.

    Lutz

    p.s.

    I am still stuck on Monit 5.29.0 on some systems, :-(.
