monit exits after starting with exit code 1
I am occasionally seeing the main monit process exit with exit code 1. I am attaching the monit log and monitrc files, along with the output of systemctl status monit.
[root@localhost ~]# systemctl status monit
● monit.service - SYSV: Monit is a utility for managing and monitoring processes,
Loaded: loaded (/etc/rc.d/init.d/monit)
Active: failed (Result: exit-code) since Tue 2016-08-23 17:14:52 UTC; 1 day 5h ago
Docs: man:systemd-sysv-generator(8)
Process: 3173 ExecStop=/etc/rc.d/init.d/monit stop (code=exited, status=0/SUCCESS)
Process: 868 ExecStart=/etc/rc.d/init.d/monit start (code=exited, status=0/SUCCESS)
Main PID: 1097 (code=exited, status=1/FAILURE)
Aug 23 16:28:51 localhost.localdomain systemd[1]: Starting SYSV: Monit is a utility for managing and monitoring processes,...
Aug 23 16:28:51 localhost.localdomain monit[868]: Starting monit: Starting Monit 5.17.1 daemon with http interface at [localhost]:2812
Aug 23 16:28:51 localhost.localdomain monit[868]: [ OK ]
Aug 23 16:28:51 localhost.localdomain systemd[1]: PID file /var/run/monit.pid not readable (yet?) after start.
Aug 23 16:28:51 localhost.localdomain systemd[1]: Started SYSV: Monit is a utility for managing and monitoring processes,.
Aug 23 17:14:52 host-97-77.openstacklocal systemd[1]: monit.service: main process exited, code=exited, status=1/FAILURE
Aug 23 17:14:52 host-97-77.openstacklocal monit[3173]: Stopping monit: [FAILED]
Aug 23 17:14:52 host-97-77.openstacklocal systemd[1]: Unit monit.service entered failed state.
Aug 23 17:14:52 host-97-77.openstacklocal systemd[1]: monit.service failed.
[root@localhost ~]#
This is in a VM environment. The VM boots with localhost.localdomain as the FQDN, and monit is started as a service at boot. After that, I change the hostname from the Linux CLI:
hostname host-97-77.openstacklocal
I then run monit reload. This leads to the crash seen in monit.log. Is this a known issue? Is there any way to turn on more debugging to find out what is causing this crash?
Thanks, Aniket.
Comments (11)
-
repo owner - changed status to closed
There is a configuration error, so monit stopped on reload:
[UTC Aug 23 17:14:52] error : Depend service 'etcd-status' is not defined in the control file
Please remove the "depend on etcd-status" from your monit configuration.
-
reporter I am not sure I understand. The "depends on etcd-status" statement is a valid dependency specified in one of the monit files I use to monitor a process. Why is that a configuration error? Is there a specific order in which monit reads the .monit files from the include directory?
I have a monit file for etcd called etcd.monit which defines the etcd-status check.
check process etcd matching "etcd --name"
  start program "/usr/local/bin/docker-compose -p nuage -f /opt/vsd/docker/etcd-ha.yml up -d"
  stop program "/usr/bin/docker stop nuage_etcd_1"
  group common
check program etcd-status with path /opt/vsd/docker/test-scripts/etcd-status.sh
  if status != 0 then alert
  group common
  group check
and I have another monit file in the same include directory called zookeeper.monit which has the dependency set:
check process zookeeper matching "nuageZookeeper"
  start program "/usr/local/bin/docker-compose -p nuage -f /opt/vsd/docker/zookeeper.yml up -d"
  stop program "/usr/bin/docker stop nuage_zookeeper_1"
  depends on etcd-status
  group check
Is it possible that zookeeper.monit is read before etcd.monit? Are they read in alphabetical order?
Thanks, Aniket.
-
reporter - changed status to open
Re-opening for clarification from the repo-owner.
-
reporter According to the monit documentation, the monit files are read in a non-sorted manner. I am thinking of including etcd.monit explicitly first and then including all other monit control files with a wildcard glob string. Would this be the recommended way?
-
repo owner The dependency reference is checked at the end of configuration file parsing, after all files have been included, so the include order is not significant.
I tried to reproduce the problem, but it works fine in our lab with the latest monit release. Can you reproduce the issue?
-
reporter The issue is intermittent. The documentation here: https://mmonit.com/monit/documentation/monit.html#INCLUDE-FILES seems to suggest that the control files are loaded in a non-sorted manner. I am not sure how that should be interpreted as far as the dependency reference check is concerned.
-
repo owner As mentioned in the previous update, the order is not significant. All "depends on <service>" statements are evaluated in the postparse() function, which is executed after all files have been included.
For example, if the files are included in the following order:
/etc/monit.d/include/00-A:
check process A matching "A" depends on B
/etc/monit.d/include/01-B:
check process B matching "B"
Then it'll work normally, even though B is included after A.
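To make the reasoning concrete, here is a rough two-pass sketch in Python. It is an illustration only, not monit's source code: the function names parse_files and undefined_dependencies and the list-of-tuples layout are hypothetical. It just shows why checking references after every file is loaded makes the include order irrelevant.

```python
# Illustrative two-pass check; NOT monit's actual implementation.
# parse_files / undefined_dependencies are hypothetical names.

def parse_files(files):
    """Pass 1: collect every service definition from all included files."""
    services = {}
    for _filename, definitions in files:
        for name, depends_on in definitions:
            services[name] = list(depends_on)
    return services


def undefined_dependencies(services):
    """Pass 2: run only after all files are loaded; list dangling references."""
    return [
        (name, dep)
        for name, deps in services.items()
        for dep in deps
        if dep not in services
    ]


# Same layout as the example above: A depends on B, and B is included later.
included = [
    ("/etc/monit.d/include/00-A", [("A", ["B"])]),
    ("/etc/monit.d/include/01-B", [("B", [])]),
]
assert undefined_dependencies(parse_files(included)) == []

# Only when the file defining B is absent does a dangling reference appear.
print(undefined_dependencies(parse_files(included[:1])))
```

Reversing the order of the two files gives the same result; only removing the file that defines B produces an undefined reference.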
-
reporter Got it. If there is any additional debugging I can turn on to get the monit logs from when the crash happens, I can do that. Not sure what the root cause is of the exit-status 1. A backtrace or something that indicates the root cause would have been helpful.
-
repo owner The root cause is, as mentioned, that the service "etcd-status", which is required by "zookeeper", was not found during configuration parsing (triggered by "monit reload"). An invalid "depends on" reference is a configuration error, so monit exits.
Is it possible that the file "etcd.monit", which contains the definition of "etcd-status", was temporarily removed? (After all, "monit reload" was probably called as part of a configuration change.)
You can start monit with the "-v" option to get debug output.
You can also run "monit -t" periodically, for example from cron, and log the output. The "-t" option only validates the configuration; if "etcd-status" is missing, it will display an error.
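A minimal sketch of such a periodic check, assuming Python 3.7+ is available on the host and that "monit -t" exits with a non-zero status when validation fails. The function name validate_config and the logger name "monit-check" are hypothetical; adapt them as needed.

```python
import logging
import subprocess


def validate_config(command=("monit", "-t"), logger=None):
    """Run the syntax check, log its output, and return True if it passed."""
    logger = logger or logging.getLogger("monit-check")
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except OSError as exc:
        # For example, the monit binary is not on PATH.
        logger.error("could not run %s: %s", command[0], exc)
        return False
    output = (result.stdout + result.stderr).strip()
    if result.returncode == 0:
        logger.info("configuration OK: %s", output)
        return True
    # Assumption: a failed validation is signalled by a non-zero exit status.
    logger.error(
        "configuration check failed (exit %d): %s", result.returncode, output
    )
    return False
```

A small script that configures logging to a file of your choice and calls validate_config() could then be scheduled from cron every few minutes, so that a missing "etcd-status" definition is recorded with a timestamp close to when it happened.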
-
repo owner - changed status to closed