monit stop lumberjack is not successful: stop fails even with a timeout value of 180 seconds

Issue #433 closed
Former user created an issue

Hi Experts, I need your help with an issue where monit fails to stop a service. We have a lumberjack service on one of our nodes, and whenever we call "monit stop lumberjack" the stop fails and the lumberjack service keeps running. When we call the stop action, the monit state changes to unmonitored and there is no pid file in /var/run; however, ps -ef | grep lumberjack shows that the process is still running.

I tried some of the fixes suggested in the community, such as raising the timeout value to 180 seconds and using restart instead of stop/start, as mentioned in the links below. http://serverfault.com/questions/393031/make-monit-wait-longer-before-thinking-something-is-dead https://bitbucket.org/tildeslash/monit/issues/347/monit-fails-to-stop-apache-processes

However, none of these fixed the issue. Please help me with a solution.

ps -ef |grep lumberjack
root       873     1  0 Jul18 ?        00:58:23 bin/lumberjack -config /etc//lumberjack.conf -spool-size 100 -log-to-syslog
root      1565     1  0 10:27 ?        00:00:02 bin/lumberjack -config /etc//lumberjack.conf -spool-size 100 -log-to-syslog
root      3863     1  0 10:06 ?        00:00:08 bin/lumberjack -config /etc//lumberjack.conf -spool-size 100 -log-to-syslog
root     13904     1  0 10:13 ?        00:00:06 bin/lumberjack -config /etc//lumberjack.conf -spool-size 100 -log-to-syslog
root     15218  3368  0 10:36 pts/0    00:00:00 grep lumberjack
root     27508     1  0 10:22 ?

From /var/log/messages:

Aug  2 10:18:43 updm-01-internal monit[1254]: 'lumberjack' stop on user request
Aug  2 10:18:43 updm-01-internal monit[1254]: Monit daemon with PID 1254 awakened
Aug  2 10:18:43 updm-01-internal monit[1254]: Awakened by User defined signal 1
Aug  2 10:18:43 updm-01-internal monit[1254]: 'lumberjack' stop: /etc/init.d/lumberjack
Aug  2 10:21:43 updm-01-internal monit[1254]: 'lumberjack' failed to stop (exit status 0) -- /etc/init.d/lumberjack: Stopping lumberjack                               Ok#012
Aug  2 10:21:43 updm-01-internal monit[1254]: 'lumberjack' stop action done

We are using monit 5.13.

Regards, Chandrashekhar

Comments (5)

  1. Tildeslash repo owner

    The timeout should be based on how long the service typically takes to stop. Setting it to 180 seconds makes no sense unless the stop really takes that long. I recommend reverting to the default (30 seconds), so monit doesn't block on stop.

    Replacing stop with restart makes no sense if you want to stop the service - please revert to stop.

    Make sure the stop method works if you execute it manually.

  2. crhiremat

    Hi, thanks for the quick help.

    I initially tried the default timeout (30 sec), but without success.

    Jul 8 15:06:45 updm-01-internal monit[7551]: 'lumberjack' stop: /etc/init.d/lumberjack
    Jul 8 15:07:15 updm-01-internal monit[7551]: 'lumberjack' failed to stop (exit status 0) -- /etc/init.d/lumberjack: Stopping lumberjack Ok#012
    Jul 8 15:07:15 updm-01-internal monit[7551]: 'lumberjack' stop action done

    Afterwards I checked, and lumberjack was still there, hanging:

    [root@updm-01-internal rafaelga]# ps -efH | grep lumberjack
    root 28566 26983 0 15:08 pts/1 00:00:00 grep lumberjack
    root 11152 1 0 Jul07 ? 00:00:09 bin/lumberjack -config /etc//lumberjack.conf -spool-size 100 -log-to-syslog

    Because of this, I wanted monit to wait until the process stops, so I increased the timeout value.

    Can you please help?

  3. Tildeslash repo owner

    As mentioned, please try to run the "stop program" as configured in monit manually ... does it stop lumberjack?

    1.) If lumberjack is not stopped, the problem is in the stop script itself, which you need to fix - it is not related to monit at all

    2.) If lumberjack was stopped, how long did it take? If just a few seconds, it makes no sense to raise the monit timeout to 180 seconds - the default 30 seconds is sufficient in this case

    Last but not least ... we use BitBucket issues to track bugs - it is not intended as a general forum for configuration questions and problems - please post your question to the monit-general mailing list. We will not update this issue anymore unless it turns out that it was caused by a monit bug (unlikely).
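
The manual check suggested in this comment (run the stop program by hand, then see whether lumberjack is really gone and how long that took) could be sketched roughly as below. This is only an illustration: the helper names `still_running` and `wait_for_exit` are hypothetical and not part of monit, and the usage lines assume the stop program is `/etc/init.d/lumberjack stop`, which the thread does not state explicitly. It should be run as root, since `kill -0` only succeeds for processes the caller is allowed to signal.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of monit.

# still_running PID...
# Prints every PID from the argument list that can still be signalled.
# `kill -0` sends no signal; it only tests that the process exists and
# that the caller has permission to signal it.
still_running() {
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid"
        fi
    done
}

# wait_for_exit TIMEOUT PID...
# Polls once per second until none of the PIDs are left or TIMEOUT
# seconds have elapsed. Returns 0 if all exited, 1 otherwise.
wait_for_exit() {
    timeout=$1
    shift
    elapsed=0
    while [ -n "$(still_running "$@")" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example usage (assumed stop command; adjust to the one in your monit config):
#   pids=$(pgrep -f 'bin/lumberjack')
#   /etc/init.d/lumberjack stop
#   wait_for_exit 30 $pids || echo "still running: $(still_running $pids)"
```

If `wait_for_exit` reports survivors after the script printed "Ok", the stop script itself is not terminating the processes, which is case 1.) above. Recording the PIDs before the stop matters here because the pid file in /var/run is already gone while several lumberjack processes are still alive.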
