"onreboot nostart" services started anyway when rebooting without statefile
I am not specifying a custom statefile in monitrc (nor starting Monit with -s <statefile>).
Service files containing the line "onreboot nostart" are started anyway when booting the system.
Reverting https://bitbucket.org/tildeslash/monit/commits/1a414506d931 fixes the issue.
-
repo owner - changed status to open
-
Hello Richard,
I tried to check what is going wrong, but I cannot. To get more information and to check your configuration, please start Monit manually with "-vv" and append the output for the failing check to the ticket.
No idea what is going wrong, sorry,
Lutz
-
reporter Hi,
No worries, I know how difficult debugging with little information can be.
Below is the output with -vv. The real service/process name is redacted as <AFFECTED_SERVICE>, and ellipses (....) have been inserted to remove irrelevant parts of the output.
Starting service monit.
[ 16.783180] random: monit: uninitialized urandom read (256 bytes read)
New Monit id: 5eeaa7b89320cba5076eae0e4289ceee
 Stored in '/root/.monit.id'
Runtime constants:
 Control file = /etc/monitrc
 Log file = syslog
 Pid file = /var/run/monit.pid
 Id file = /root/.monit.id
 State file = /root/.monit.state
 Debug = True
 Log = True
 Use syslog = True
 Is Daemon = True
 Use process engine = True
 Limits = {
  = programOutput: 512 B
  = sendExpectBuffer: 256 B
  = fileContentBuffer: 512 B
  = httpContentBuffer: 1 MB
  = networkTimeout: 5 s
  = programTimeout: 5 m
  = stopTimeout: 30 s
  = startTimeout: 30 s
  = restartTimeout: 30 s
  = }
 On reboot = start
 Poll time = 60 seconds with start delay 0 seconds
 Start monit httpd = True
 httpd bind address = localhost
 httpd portnumber = 2813
 httpd net readonly = Disabled
 httpd signature = Enabled
 httpd auth. style = Host/Net allow list
The service list contains the following entries:
Process Name = klogd
 Pid file = /var/run/klogd.pid
 Monitoring mode = active
 On reboot = start
 Start program = '/etc/init.d/klogd start' timeout 30 s
 Stop program = '/etc/init.d/klogd stop' timeout 30 s
 Existence = if does not exist then restart
Process Name = <AFFECTED_SERVICE>
 Pid file = /var/run/<AFFECTED_SERVICE>.pid
 Monitoring mode = active
 On reboot = nostart
 Start program = '/etc/init.d/<AFFECTED_SERVICE> start' timeout 30 s
 Stop program = '/etc/init.d/<AFFECTED_SERVICE> stop' timeout 30 s
 Existence = if does not exist then restart
....
pidfile '/var/run/monit.pid' does not exist
Starting Monit 5.33.0 daemon with http interface at [localhost]:2813
+++ Successfully started monit.
....
Starting <AFFECTED_SERVICE>
-
reporter Note:
This only occurs when the system is rebooted and Monit is started during the boot process.
If the system is allowed to boot fully without Monit, and Monit is started later, the issue does not occur.
Our system consists of various partitions that are mounted during bootup. Perhaps this is related to a directory Monit needs (for its state file or similar) not being available at the required time?
-
Hello Richard,
Thanks for the additional information; I will do some additional checks/tests in my environment.
Lutz
p.s.
Just a suggestion: I use the start delay on production systems (AIX, Linux) to work around some interface/filesystem trouble at startup:
set daemon 60             # check services at 60-second intervals
    with start delay 240  # optional: delay the first check by 4 minutes (by
                          # default, Monit checks immediately after it starts)
-
Hello Richard,
Are you sure the message "Starting <AFFECTED_SERVICE>" is a Monit start message?
From my point of view, if Monit starts the application, the messages should look like "<AFFECTED_SERVICE> start: <THE COMMAND>" and "<AFFECTED_SERVICE> process is running with pid <THE PID>". Please also check your systemd/init.d configuration.
Lutz
-
reporter Hi,
Yes. The configured programs
 Start program = '/etc/init.d/<AFFECTED_SERVICE> start' timeout 30 s
 Stop program = '/etc/init.d/<AFFECTED_SERVICE> stop' timeout 30 s
execute bash scripts, which then start/stop the process.
The "Starting ..." line comes from that bash script.
-
reporter Update
set daemon 60             # check services at 60-second intervals
    with start delay 20   # optional: delay the first check by 20 seconds (by
                          # default, Monit checks immediately after it starts)
also resolves my issue.
I will test for a lower limit, but I also need to check internally whether this solution is acceptable, as we have boot-time requirements that we must maintain.
Edit: I have just noticed some other unintended side effects: services are not being started/monitored.
-
Hello Richard,
Nice to know. I also started without a start delay, and everything worked well. What I do not understand: when Monit is involved, I find messages like "<AFFECTED_SERVICE> start: <THE COMMAND>" and "<AFFECTED_SERVICE> process is running with pid <THE PID>" in the log file. I think you should find these messages in the syslog as well. Could you configure a Monit log file instead of syslog, so that you see only Monit messages?
Lutz
-
reporter - attached monit.redac.log
<AFFECTED_SERVICE> from my previous comment is affected_service_2 in the attached log file.
Note: my previous comment about the delay solving the issue does not seem to be accurate. My apologies.
-
Hello Richard,
Is the timestamp used in the file obfuscated?
Lutz
-
reporter I am not sure which timestamp you mean.
-
Snippet from the attached monit.redac.log file:
[1970-01-01T00:00:21+0000] info : New Monit id: 61dcc48797223efb825f156fb348f72a
 Stored in '/root/.monit.id'
[1970-01-01T00:00:21+0000] debug : pidfile '/var/run/monit.pid' does not exist
[1970-01-01T00:00:21+0000] info : Starting Monit 5.33.0 daemon with http interface at [localhost]:2813
[1970-01-01T00:00:22+0000] debug : Starting Monit HTTP server at [localhost]:2813
[1970-01-01T00:00:22+0000] debug : Monit HTTP server started
[1970-01-01T00:00:22+0000] info : '<hostname>' Monit 5.33.0 started
The timestamp from your log file.
Lutz
-
reporter No. This is the Unix epoch time, before NTP has started and updated the clock.
-
Nice to know.
You should start Monit only when your system knows the correct time.
The "startup delay" should not be considered, because Monit does not recognize the reboot correctly.
Lutz
p.s.
Sorry, my systems use the hardware clock after a reboot and synchronise with NTP as soon as possible.
-
reporter So can you confirm that the epoch timestamp is the root cause of the issue described in this ticket?
What about a system that is not connected to the internet, or is otherwise unable to update the time from the epoch via NTP or a similar method? It seems odd that this would result in the service configuration files not being honored.
-
Sorry, no, no:
the "startup delay" is probably not handled correctly.
The epoch time after a reboot should not be a problem for handling "onreboot nostart", and that seems to work as expected, because the application is not started.
But "onreboot start" will start the application only if the application was not stopped via Monit before.
Lutz
p.s.
The boot time is stored in the Monit state file:
hexdump -C -n 16 monit.state
00000000  00 00 00 00 04 00 00 00  18 17 0b 65 00 00 00 00  |...........e....|
00000010
"65 0b 17 18" is the stored boot timestamp, 2023-09-20 16:00:24.
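As a check of the arithmetic above, here is a minimal Python sketch that decodes the 16 bytes from the hexdump. It assumes, based only on the dump shown, that the boot time sits at offset 8 as a 64-bit little-endian count of seconds since the Unix epoch (so the on-disk bytes 18 17 0b 65 read as 0x650b1718). The real state-file layout may differ between Monit versions, and `read_boot_time` / `to_utc` are hypothetical helpers, not part of Monit.

```python
import datetime
import struct

# First 16 bytes of monit.state, copied from the hexdump above.
HEADER = bytes.fromhex("00 00 00 00 04 00 00 00 18 17 0b 65 00 00 00 00")

# ASSUMPTION (inferred from the dump, not from Monit's source): the boot time
# is a signed 64-bit little-endian number of seconds since the epoch at offset 8.
BOOT_TIME_OFFSET = 8


def read_boot_time(header: bytes) -> int:
    """Hypothetical helper: return the stored boot time as a Unix timestamp."""
    (boot_time,) = struct.unpack_from("<q", header, BOOT_TIME_OFFSET)
    return boot_time


def to_utc(timestamp: int) -> datetime.datetime:
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)


if __name__ == "__main__":
    boot_time = read_boot_time(HEADER)
    # Prints: 0x650b1718 2023-09-20T16:00:24+00:00
    print(hex(boot_time), to_utc(boot_time).isoformat())
```

For comparison, on a system whose clock starts at the epoch, a boot time of a few seconds such as `to_utc(10)` gives `1970-01-01T00:00:10+00:00`.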
-
reporter Is there anything else I can provide to help debug the issue? If not, I will start debugging the code myself.
As mentioned in the description, reverting 1a414506d931 resolves the issue.
-
Sorry Richard,
I cannot find a proper test system (I think).
My suggestion: define a state file (see "set statefile") on a RAM/temporary file system (based on your Monit output, the state file is currently ".monit.state" in root's home folder). The state file will then be created fresh after a reboot, and Monit will detect the reboot correctly (because the information in the state file is not available).
You are pointing at the right code, but this problem started with some changes in Monit 5.26.0 to handle the "start delay".
I found a similar problem on "time travel" systems, and I fixed the start glitches by removing the state file after a reboot and before Monit starts.
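The workaround of deleting the state file after a reboot and before Monit starts could be scripted as a boot-time step. This is a minimal Python sketch under stated assumptions: the default path `/root/.monit.state` is taken from the `-vv` output in this thread, and `remove_stale_statefile` is a hypothetical helper that would have to be run by the init system before Monit is launched; it is not a Monit feature. The reporter later notes that removing the state file did not fix the problem on their system.

```python
from pathlib import Path

# Path reported by "monit -vv" in this thread (State file = /root/.monit.state).
# Adjust if "set statefile" or "-s <statefile>" is used.
DEFAULT_STATEFILE = Path("/root/.monit.state")


def remove_stale_statefile(path: Path = DEFAULT_STATEFILE) -> bool:
    """Delete the Monit state file if it exists.

    Hypothetical pre-start hook: intended to run once per boot, after the
    filesystem holding the file is mounted and before Monit starts.
    Returns True if a file was removed, False if there was nothing to remove.
    """
    try:
        path.unlink()
    except FileNotFoundError:
        return False
    return True


if __name__ == "__main__":
    removed = remove_stale_statefile()
    print("removed" if removed else "no state file found")
```

Returning a boolean instead of using `unlink(missing_ok=True)` lets the calling boot script log whether a stale file was actually present.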
I do not know a good way to detect a "reboot" properly. Monit compares the boot time with a timestamp stored in the state file to test whether this is a Monit restart or a system reboot. Based on the result, the "onreboot" definitions are handled. Unfortunately, your boot time is always something like "1970-01-01T00:00:10", and the reboot test does not recognize a reboot because the time has not changed.
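To illustrate why a clock that restarts at the epoch can hide a reboot, here is a simplified, hypothetical model of the check described above: compare the boot time computed now with the boot time stored in the state file. This is not Monit's source code. The function names, the 1-second tolerance, and the assumption that the stored value was recorded when Monit started during the previous boot (before NTP corrected the clock) are all made up for the illustration.

```python
# ASSUMPTION: two boot-time readings within this many seconds are treated as
# the same boot (boot time derived from "now - uptime" jitters slightly).
TOLERANCE_SECONDS = 1.0


def boot_time(wall_clock_now: float, uptime: float) -> float:
    """Boot time as a process sees it: current wall clock minus uptime."""
    return wall_clock_now - uptime


def looks_like_reboot(current_boot_time: float, stored_boot_time: float) -> bool:
    """Hypothetical check: assume a reboot only if the boot time changed."""
    return abs(current_boot_time - stored_boot_time) > TOLERANCE_SECONDS


if __name__ == "__main__":
    # Clock restarts at the epoch: previous and current boot look identical.
    stored = boot_time(wall_clock_now=21.0, uptime=17.0)
    current = boot_time(wall_clock_now=21.5, uptime=17.2)
    print(looks_like_reboot(current, stored))  # False: reboot goes unnoticed

    # Real-time clock: the boot times are a day apart, so the reboot is seen.
    stored = boot_time(wall_clock_now=1695225645.0, uptime=21.0)
    current = boot_time(wall_clock_now=1695312045.0, uptime=21.0)
    print(looks_like_reboot(current, stored))  # True
```

The point of the model is only the comparison: with a hardware clock or an early NTP sync, the two boot times differ and the check fires; when every boot starts near 1970-01-01T00:00:00, they are nearly identical, so a reboot looks like a plain Monit restart.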
Sorry,
Lutz
-
reporter Hi.
Creating/using a state file does not resolve the issue, nor does removing it and then rebooting.
Unfortunately, this is a showstopper for us, and we will be forced to revert to the Monit release prior to the https://bitbucket.org/tildeslash/monit/commits/1a414506d931 commit.
-
Thanks for your response.
Deleting the Monit state file solved my problems caused by a clock/time in the past. Nice to know that Monit 5.31.0 works well and handles "onreboot" properly.
Lutz
p.s.
I am still stuck on Monit 5.29.0 on some systems :-(.