Check process failed when monit starts
Hi,
Some processes (elasticsearch) monitored by monit are considered dead when monit starts.
With this configuration:
check process elasticsearch_v4_hot with pidfile /var/run/elasticsearch/v4-hot/elasticsearch.pid
start program = "/sbin/service elasticsearch-v4-hot start"
stop program = "/sbin/service elasticsearch-v4-hot stop"
Some verification
[root@opelastnas2 v4-hot]# ps -auxwww|grep monit
root 16636 0.0 0.0 133552 4244 ? Ssl 09:40 0:00 /usr/bin/monit -I
[root@opelastnas2 v4-hot]# cat /var/run/elasticsearch/v4-hot/elasticsearch.pid
12111
ps -ax|grep 12111
12111 ? Sl 9:42 /bin/java -Xms16g -Xmx16g ........
Monit restart
[root@opelastnas2 v4-hot]# service monit restart
Redirecting to /bin/systemctl restart monit.service
[root@opelastnas2 v4-hot]# tail /var/log/monit.log
[AST Jul 15 09:40:47] info : Monit daemon with pid [11968] stopped
[AST Jul 15 09:40:47] info : 'opelastnas2' Monit 5.26.0 stopped
[AST Jul 15 09:40:53] info : Starting Monit 5.26.0 daemon with http interface at [*]:2812
[AST Jul 15 09:40:53] info : 'opelastnas2' Monit 5.26.0 started
[AST Jul 15 09:40:53] error : 'elasticsearch_v4_hot' process is not running
[AST Jul 15 09:40:53] info : 'elasticsearch_v4_hot' trying to restart
[AST Jul 15 09:40:53] info : 'elasticsearch_v4_hot' start: '/sbin/service elasticsearch-v4-hot start
[AST Jul 15 09:41:43] info : 'elasticsearch_v4_hot' process is running with pid 16779
At this point, the monitoring is OK. If I kill the “elasticsearch_v4_hot”
process, it is restarted by monit.
On the same box I have configurations for other kinds of processes that don’t suffer from this problem.
Franck
Comments (12)
-
repo owner -
reporter For sure.
Complete log here : https://pastebin.com/BsU8426X
- The existence of the pid files was checked before monit started (see the beginning of the pastebin)
- We can see :
pidfile '/var/run/elasticsearch/v4-hot/elasticsearch.pid' does not exist
Sending Does not exist notification to
Trying to send mail via
'elasticsearch_v4_hot' trying to restart
pidfile '/var/run/elasticsearch/v4-hot/elasticsearch.pid' does not exist
'elasticsearch_v4_hot' start: '/sbin/service elasticsearch-v4-hot start'
then later
'elasticsearch_v4_hot' process is running with pid 7964
'elasticsearch_v4_hot' zombie check succeeded -
reporter Please note that the problem is not fully reproducible.
After this test, when I restarted monit as a service, the anomaly didn’t happen.
Last week, before my post, I ran several successive tests that all showed the anomaly.
-
Hello Franck,
I am also trying to track down some inconsistencies in the monit status. To make sure I understand: what do you mean by "considered dead"?
Some processes (elasticsearch) monitored by monit are considered dead when monit starts
Is the displayed status wrong, or is the service not being handled by monit?
In the past, some of my problems went away after I deleted the ".monit.state" file and restarted monit.
With regards,
Lutz -
reporter Hi Lutz
A question of understanding, what do you mean by "considered dead".
In my case, the anomaly is about an elasticSearch process which
- is up and alive
- has a pid file present and coherent with the process pid
But when monit starts, this process is restarted, and in monit debug mode we can see a message saying that the pid file (with the correct path) can’t be found.
monit runs as root, so I think it can’t be a permission problem. Moreover, the restarted process was initially started by monit.
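The "pid file present and coherent with the process pid" check above can be repeated with a small shell helper. This is only a sketch: the function name check_pidfile is made up for illustration, and it tests nothing more than that the file is readable and names a live pid.

```shell
#!/bin/sh
# Hypothetical helper (not part of monit): report whether a pidfile
# points at a live process.
# Example: check_pidfile /var/run/elasticsearch/v4-hot/elasticsearch.pid
check_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "pidfile missing or unreadable: $pidfile"
        return 2
    fi
    # Strip whitespace and newlines so only the number remains.
    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        ''|*[!0-9]*)
            echo "pidfile does not contain a pid: $pidfile"
            return 3
            ;;
    esac
    # kill -0 sends no signal; it only tests that the pid exists
    # and that the caller is allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid is alive"
        return 0
    fi
    echo "pid $pid is not running (stale pidfile)"
    return 1
}
```

Running it as root right before and right after a monit restart should show whether the pid file disappears in between, which is what the debug log suggests.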
-
Hello Franck,
this does not match my problem. My monit runs in a user context and gets the right process pid (from the pid file), but sometimes, after a restart, it does not show the right status. And I do not use systemd to start the java process.
Thanks for your answer,
Lutz
p.s.
Do you know this problem? See
https://discuss.elastic.co/t/jna-temporary-directory-tmp-is-not-writable/143413
https://github.com/elastic/elasticsearch/issues/11594
But you use /var/run and not /tmp.
-
repo owner Hello Franck, I have reviewed the data, and it is really strange. There is nothing in the log that can explain why monit cannot read the file.
Please can you check the system log for any errors in the same timeframe when monit wasn’t able to read the pidfile?
Is there some LSM module such as AppArmor or SELinux on the system which can limit Monit file access?
It could be worth trying the latest Monit version, compiled with AddressSanitizer. See the compilation instructions in the README.md (https://bitbucket.org/tildeslash/monit/src/master/, below the source tree), but enable AddressSanitizer:
./bootstrap
./configure --with-asan
make
then:
1.) stop monit
2.) start it in debug mode:
./monit -vI
-
reporter Hi,
I’m coming back to this issue.
It’s the same as https://bitbucket.org/tildeslash/monit/issues/106/stop-monit-command-kills-my-processes and the @baraabasata workaround works.
This should be somewhere in the docs, because it seems to be common in systemd/Red Hat environments.
Franck
-
reporter I’ve just seen your last comment @Tildeslash in ticket 106.
My monit.service didn’t have KillMode=process, even though I installed the latest monit version.
You can see that in the EPEL package https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/m/monit-5.26.0-1.el7.x86_64.rpm
-
reporter I’ve opened this one: https://bugzilla.redhat.com/show_bug.cgi?id=1874909
-
repo owner Thanks for the update. It’s a problem in the 3rd-party package then … the monit systemd template (system/startup/monit.service.in) has contained KillMode=process since 2016
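Until a packaged unit carries the setting, one possible local fix is a systemd drop-in. The sketch below assumes the unit is named monit.service and that drop-ins live under /etc/systemd/system/monit.service.d; the file name killmode.conf and the function name write_killmode_dropin are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that sets KillMode=process for monit.
# The target directory can be overridden (first argument) for a dry run.
write_killmode_dropin() {
    dropin_dir=${1:-/etc/systemd/system/monit.service.d}
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nKillMode=process\n' > "$dropin_dir/killmode.conf"
}

# On the real host, as root, then make systemd pick up the override:
#   write_killmode_dropin
#   systemctl daemon-reload
#   systemctl show monit.service -p KillMode
```

A drop-in leaves the packaged unit file untouched, so a package update should not overwrite the override.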
-
repo owner - changed status to resolved
Please can you run monit in debug mode?
1.) stop monit:
service monit stop
2.) run it in debug mode / foreground:
monit -vI