Version 5.14. The setup is very simple, so I don't know where to look for the problem.
Monit works for some time and then just gets stuck.
CentOS release 6.7 (Final)
Can you please send your monit log?
Logging can be enabled with a "set logfile" statement, either to a specific file:
set logfile <path>
or to syslog:
set logfile syslog
I noticed that it was stuck on Feb 2. Its last lines were:
[MSK Jan 28 05:39:30] info : Reinitializing monit daemon
[MSK Jan 28 06:06:15] info : Starting Monit 5.14 daemon with http interface at [127.0.0.1]:2812
[MSK Jan 28 06:06:15] info : Starting Monit HTTP server at [127.0.0.1]:2812
[MSK Jan 28 06:06:15] info : Monit HTTP server started
[MSK Jan 28 06:06:15] info : 'mariadb-02.local' Monit 5.14 started
[MSK Jan 28 06:06:16] error : 'memcached' process is not running
[MSK Jan 28 06:06:16] info : 'memcached' trying to restart
[MSK Jan 28 06:06:16] info : 'memcached' start: /etc/init.d/memcached
[MSK Jan 28 06:07:17] info : 'memcached' process is running with pid 1029
ps aux shows:
root 987 0.0 0.0 120768 1948 ? D Jan28 0:06 monit
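For reference, the D in the STAT column means uninterruptible sleep, which usually indicates a process blocked inside the kernel on I/O. A minimal sketch for listing such processes together with the kernel function they are waiting in; filter_d_state and list_d_state are names made up for this example, and the wchan:32 column width assumes a procps-style ps:

```shell
# Keep the header line plus every row whose STAT column starts with D.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List PID, state, wait channel and command name for D-state processes.
# Assumption: ps accepts the "wchan:32" width syntax (procps does).
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

Calling list_d_state while monit is stuck should show its PID with a WCHAN value naming the kernel function it is blocked in, if the kernel exposes that information inside the container.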
Can you please take a backtrace?
gdb <path to monit binary> <monit's PID>
(gdb) thread apply all backtrace full
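gdb only shows the user-space frames. For a process in state D, the kernel side can often be seen through /proc as well. A rough sketch, assuming the kernel exposes /proc/<pid>/wchan and /proc/<pid>/stack (the latter normally needs root and may be unavailable inside a container); kernel_snapshot and its proc_root argument are hypothetical names used only for this example:

```shell
# Print state, wait channel and (if exposed) kernel stack for one PID.
# proc_root defaults to /proc; it is a parameter only to ease testing.
kernel_snapshot() {
    pid=$1
    proc_root=${2:-/proc}
    dir="$proc_root/$pid"
    [ -d "$dir" ] || { echo "no such process: $pid" >&2; return 1; }
    # The State: line of status shows R, S, D, and so on.
    grep '^State:' "$dir/status"
    if [ -r "$dir/wchan" ]; then
        printf 'wchan: %s\n' "$(cat "$dir/wchan")"
    fi
    if [ -r "$dir/stack" ]; then
        echo 'kernel stack:'
        cat "$dir/stack"
    fi
}
```

Running kernel_snapshot with monit's PID as root prints whatever of the three pieces the kernel makes available.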
I restarted the LXC container it was running in. I will let you know once I see this problem again.
If you use monit in LXC, we recommend upgrading to the upcoming monit 5.16 release (should be available today). It comes with a fix related to LXC: if you had a process check with a connection test, the connection test was skipped, because part of the data could not be collected inside the LXC container.
Thanks for the data. It seems that the problem is in the FUSE driver (not a monit bug): monit just performs a read, and it seems to get stuck in the driver.
Debug mode will help to trace which read triggers the issue.
I think it should be possible to trigger the problem without involving monit. The following script collects data from the /proc filesystem every 5 seconds, similarly to monit. Can you try running it and see whether it gets stuck as well?
while true; do cat /proc/meminfo /proc/stat /proc/[1-9]*/stat /proc/[1-9]*/status > /dev/null; sleep 5; done
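If the loop does get stuck, it may help to know which file the read was blocked on. A possible variation, offered only as a sketch: read the files one at a time and write each path to a log before reading it, so that after a hang the last line of the log names the file being read. probe_files and the log path are made-up names for this example:

```shell
# Read each given file once, logging its path first so that a hang can
# be attributed to the last path written to the log.
probe_files() {
    log=$1
    shift
    for f in "$@"; do
        # Processes may exit between glob expansion and the read.
        [ -r "$f" ] || continue
        printf '%s\n' "$f" >> "$log"
        cat "$f" > /dev/null 2>&1
    done
}
```

It could be run in the same 5-second loop, truncating the log at the start of each pass, for example: `while true; do : > /tmp/proc-probe.log; probe_files /tmp/proc-probe.log /proc/meminfo /proc/stat /proc/[1-9]*/stat /proc/[1-9]*/status; sleep 5; done`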