Check process failed when monit starts

Issue #919 resolved
Franck Lefebure created an issue

Hi,

Some processes (elasticsearch) monitored by monit are considered dead when monit starts.

With this configuration:

check process elasticsearch_v4_hot  with pidfile /var/run/elasticsearch/v4-hot/elasticsearch.pid
    start program = "/sbin/service elasticsearch-v4-hot start"
    stop program = "/sbin/service elasticsearch-v4-hot stop"

Some verification

[root@opelastnas2 v4-hot]# ps -auxwww|grep monit
root     16636  0.0  0.0 133552  4244 ?        Ssl  09:40   0:00 /usr/bin/monit -I
[root@opelastnas2 v4-hot]# cat /var/run/elasticsearch/v4-hot/elasticsearch.pid
12111
ps -ax|grep 12111
12111 ?        Sl     9:42 /bin/java -Xms16g -Xmx16g ........
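
The manual check above (read the pidfile, then look for that pid) can be sketched as a small shell helper. This is only an illustration: the function name pidfile_alive is hypothetical and not part of monit, and it should be run as root so that kill -0 is not refused for processes owned by other users.

```shell
#!/bin/sh
# Hypothetical helper (not part of monit): report whether the pid stored
# in a pidfile belongs to a live process.
pidfile_alive() {
    pidfile="$1"
    # A missing or unreadable pidfile corresponds to "pidfile ... does not exist".
    [ -r "$pidfile" ] || { echo "missing"; return 1; }
    # Strip whitespace and newlines around the stored pid.
    pid=$(tr -d '[:space:]' < "$pidfile")
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    # kill -0 sends no signal; it only tests that the process exists
    # and can be signalled by the caller.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
        return 1
    fi
}
```

For example, pidfile_alive /var/run/elasticsearch/v4-hot/elasticsearch.pid can be run just before and just after restarting monit to see whether the pidfile disappears in between.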

Monit restart

[root@opelastnas2 v4-hot]# service monit restart
Redirecting to /bin/systemctl restart monit.service
[root@opelastnas2 v4-hot]# tail /var/log/monit.log
[AST Jul 15 09:40:47] info     : Monit daemon with pid [11968] stopped
[AST Jul 15 09:40:47] info     : 'opelastnas2' Monit 5.26.0 stopped
[AST Jul 15 09:40:53] info     : Starting Monit 5.26.0 daemon with http interface at [*]:2812
[AST Jul 15 09:40:53] info     : 'opelastnas2' Monit 5.26.0 started
[AST Jul 15 09:40:53] error    : 'elasticsearch_v4_hot' process is not running
[AST Jul 15 09:40:53] info     : 'elasticsearch_v4_hot' trying to restart
[AST Jul 15 09:40:53] info     : 'elasticsearch_v4_hot' start: '/sbin/service elasticsearch-v4-hot start
[AST Jul 15 09:41:43] info     : 'elasticsearch_v4_hot' process is running with pid 16779

At this point, the monitoring is OK. If I kill the “elasticsearch_v4_hot” process, it is restarted by monit.

On the same box I have configurations for other kinds of processes that don’t suffer from this problem.

Franck

Comments (12)

  1. Tildeslash repo owner

    Please can you run monit in debug mode?

    1.) stop monit: service monit stop

    2.) run it in debug mode / foreground: monit -vI

  2. Franck Lefebure reporter

    For sure.

    Complete log here: https://pastebin.com/BsU8426X

    • The existence of the pids was checked before monit started (see the beginning of the pastebin)
    • We can see:

    pidfile '/var/run/elasticsearch/v4-hot/elasticsearch.pid' does not exist
    Sending Does not exist notification to
    Trying to send mail via
    'elasticsearch_v4_hot' trying to restart
    pidfile '/var/run/elasticsearch/v4-hot/elasticsearch.pid' does not exist
    'elasticsearch_v4_hot' start: '/sbin/service elasticsearch-v4-hot start'

    then later

    'elasticsearch_v4_hot' process is running with pid 7964
    'elasticsearch_v4_hot' zombie check succeeded

  3. Franck Lefebure reporter

    Please note that the problem is not fully reproducible.

    After the test, when I restarted monit as a service, the anomaly didn’t happen.

    Last week, before my post, several successive tests showed the anomaly.

  4. Lutz Mader

    Hello Franck,
    I am also trying to track down some inconsistencies in the monit status.

    A question for clarification: what do you mean by "considered dead"?

    Some processes (elasticsearch) monitored by monit are considered dead when monit starts

    Is the displayed status wrong, or is the service not being handled by monit?

    In the past, some of my problems went away after I deleted the ".monit.state" file and restarted monit.

    With regards,
    Lutz

  5. Franck Lefebure reporter

    Hi Lutz

    A question for clarification: what do you mean by "considered dead"?

    In my case, the anomaly concerns an Elasticsearch process which

    • is up and alive
    • has a pid file that is present and consistent with the process pid

    But when monit starts, this process is restarted, and in monit debug mode we can see a message saying that the pid file (with the correct path) can’t be found.

    monit runs as root, so I don’t think it can be a permission problem. Moreover, the restarted process was originally started by monit.

  6. Lutz Mader

    Hello Franck,
    this does not match my problem.

    My monit runs in a user context and gets the right process pid (from the pid file), but sometimes, after a restart, not the right status. Also, I do not use systemd to start the java process.

    Thanks for your answer,
    Lutz

    p.s.
    Are you aware of this problem? See
    https://discuss.elastic.co/t/jna-temporary-directory-tmp-is-not-writable/143413

    https://github.com/elastic/elasticsearch/issues/11594

    But you use /var/run and not /tmp.

  7. Tildeslash repo owner

    Hello Franck, I have reviewed the data, and it is really strange. There is nothing in the log that explains why monit cannot read the file.

    Please can you check the system log for any errors in the same timeframe when monit wasn’t able to read the pidfile?

    Is there an LSM such as AppArmor or SELinux on the system that could limit Monit's file access?
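
    A rough way to answer the LSM question from a shell is sketched below. The function name lsm_status is hypothetical; it assumes getenforce is installed on SELinux systems and that AppArmor, when loaded, exposes /sys/module/apparmor/parameters/enabled.

```shell
#!/bin/sh
# Hypothetical helper: print what can be detected about SELinux and AppArmor.
lsm_status() {
    found=0
    # getenforce prints Enforcing, Permissive or Disabled on SELinux systems.
    if command -v getenforce >/dev/null 2>&1; then
        echo "selinux: $(getenforce)"
        found=1
    fi
    # The AppArmor kernel module reports Y or N in this file when it is loaded.
    if [ -r /sys/module/apparmor/parameters/enabled ]; then
        echo "apparmor: $(cat /sys/module/apparmor/parameters/enabled)"
        found=1
    fi
    [ "$found" -eq 1 ] || echo "no SELinux or AppArmor tooling detected"
}

lsm_status
```

    If SELinux turns out to be enforcing, denials are normally recorded in /var/log/audit/audit.log, so that file is worth checking for the same timeframe.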

    It could be worth trying the latest Monit version, compiled with AddressSanitizer. See the compilation instructions in the README.md (https://bitbucket.org/tildeslash/monit/src/master/, below the source tree), but enable AddressSanitizer:

    ./bootstrap
    ./configure --with-asan
    make
    

    then:

    1.) stop monit

    2.) start it in debug mode:

    ./monit -vI

  8. Tildeslash repo owner

    Thanks for the update. It’s a problem in the 3rd-party package then … the monit systemd template (system/startup/monit.service.in) has contained KillMode=Process since 2016.
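
    For anyone hitting the same symptom with a packaged unit file, here is a minimal sketch of one possible workaround. It assumes the packaged monit.service does not set KillMode; systemd's default is then control-group, which stops every process in the unit's control group when monit is stopped or restarted. The function name write_killmode_dropin and the file name killmode.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that sets KillMode=process,
# so that only the main monit process is killed when the unit stops.
# Note that systemd spells the value in lowercase.
write_killmode_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/killmode.conf" <<'EOF'
[Service]
KillMode=process
EOF
}

# Usage, as root (the unit name monit.service is taken from the log above):
#   systemctl show -p KillMode monit.service    # inspect the current value
#   write_killmode_dropin /etc/systemd/system/monit.service.d
#   systemctl daemon-reload
```

    A drop-in is used here instead of editing the packaged unit file so that the change survives package upgrades.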
