Monit spawns a lot of processes while trying to start a monitored service
Hi all, thank you for the nice tool.
I have an issue. I tried googling and searching here, but found nothing.
I am trying to keep Sidekiq under Monit. After Monit starts, it launches a lot of processes at once, and they consume all the CPU.
Processes look like this:
/bin/su - deploy -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d
There are a lot of them:
root@sidekiq-1:~# ps aux | grep sidekiq | wc -l
252
It critically slows down the OS.
After several minutes I end up with a couple of Sidekiq instances running, one under Monit and another "illegal" one.
Processes
root@sidekiq-1:~# ps aux | grep sidekiq
deploy 12136 10.4 23.9 1598280 243244 ? Sl 19:52 0:15 sidekiq 3.4.2 carsharing [0 of 5 busy]
deploy 19962 28.4 20.4 1594876 207784 ? Sl 19:54 0:10 sidekiq 3.4.2 carsharing [0 of 5 busy]
root 19995 0.0 0.1 11960 1956 pts/0 S+ 19:54 0:00 grep --color=auto sidekiq
Monit
root@sidekiq-1:~# monit status
The Monit daemon 5.6 uptime: 13m
Process 'sidekiq_production0'
status Running
monitoring status Monitored
pid 19962
parent pid 1
uptime 7m
children 0
memory kilobytes 325876
memory kilobytes total 325876
memory percent 32.0%
memory percent total 32.0%
cpu percent 0.0%
cpu percent total 0.0%
data collected Sun, 20 Nov 2016 20:01:36
System 'sidekiq-1'
status Running
monitoring status Monitored
load average [0.06] [0.36] [0.34]
cpu 8.2%us 0.9%sy 0.0%wa
memory usage 715148 kB [70.3%]
swap usage 0 kB [0.0%]
data collected Sun, 20 Nov 2016 20:00:52
monitrc file
set daemon 30 # check services at 30-second intervals
set httpd port 2812 and
    use address localhost # only accept connection from localhost
    allow localhost # allow localhost to connect to the server and
include /etc/monit/conf.d/*
/etc/monit/conf.d/sidekiq_carsharing_production.conf
check process sidekiq_carsharing_production0
  with pidfile "/data/carsharing/shared/tmp/sidekiq-0.pid"
  start program = "/bin/su - deploy -c 'cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d'" with timeout 30 seconds
  stop program = "/bin/su - deploy -c 'cd /data/carsharing/current && bundle exec sidekiqctl stop /data/carsharing/shared/tmp/sidekiq-0.pid'" with timeout 30 seconds
  group carsharing-sidekiq
root@sidekiq-1:~# uname -a
Linux sidekiq-1 4.4.0-47-generic #68~14.04.1-Ubuntu SMP Wed Oct 26 19:42:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Comments (5)
-
reporter -
repo owner Could you please send the Monit log?
It doesn't look like a Monit bug. Monit only executes the "start program" and then waits for the process to start (in your case, for "/data/carsharing/shared/tmp/sidekiq-0.pid" to be created and for a running process with a matching PID to be found). The default start timeout is 30 seconds; if the process starts slowly (takes more than 30s), Monit will try to restart it. If the process is not running, the stop method is skipped, which may lead to several slow-starting instances running in parallel.
Recommendation:
1.) check the Monit log to see whether the service start times out
2.) if the start timed out: raise the start program timeout using the "timeout" option (https://mmonit.com/monit/documentation/monit.html#SERVICE-METHODS).
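As a rough sketch of recommendation 2, assuming the log does show the start timing out: the check below repeats the start line from your configuration with only the timeout changed. The value of 120 seconds is an arbitrary illustration, not a measured figure, so pick one that matches how long Sidekiq actually takes to write its pidfile on your host.

```
check process sidekiq_carsharing_production0
  with pidfile "/data/carsharing/shared/tmp/sidekiq-0.pid"
  # Same start command as in the report; only the timeout is raised.
  # 120 seconds is an assumed, illustrative value.
  start program = "/bin/su - deploy -c 'cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d'" with timeout 120 seconds
```

After editing the file, "monit -t" checks the syntax and "monit reload" makes the daemon reread its configuration.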
-
reporter It does not wait for the timeout; a lot of processes start immediately.
cat /var/log/monit.log
[UTC Nov 28 22:26:43] info : Starting monit daemon with http interface at [localhost:2812]
[UTC Nov 28 22:26:43] info : Starting monit HTTP server at [localhost:2812]
[UTC Nov 28 22:26:43] info : monit HTTP server started
[UTC Nov 28 22:26:43] info : 'sidekiq-1' Monit started
[UTC Nov 28 22:26:43] error : 'sidekiq_carsharing_production0' process is not running
[UTC Nov 28 22:26:43] info : 'sidekiq_carsharing_production0' trying to restart
[UTC Nov 28 22:26:43] info : 'sidekiq_carsharing_production0' start: /bin/su
ps aux | grep sidekiq | wc -l
197
ps aux | grep sidekiq
...
deploy 4636 18.0 1.6 35280 16468 ? S 22:27 0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d
deploy 4655 22.0 1.6 35344 16532 ? S 22:27 0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d
deploy 4680 0.0 1.6 35408 16596 ? S 22:27 0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d
deploy 4699 0.0 1.6 35468 16656 ? S 22:27 0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d
root 4712 0.0 0.2 11964 2040 pts/0 S+ 22:27 0:00 grep --color=auto sidekiq
...
-
repo owner Then it is most probably caused by Sidekiq itself (executed via 'start program'). We don't know how Sidekiq is implemented, but it seems to fork a large number of processes. I googled a bit for Sidekiq parallelism and it seems to be a Sidekiq feature: https://github.com/mperham/sidekiq/wiki/Best-Practices#3-embrace-concurrency Maybe Sidekiq lets you tune the parallelism somehow. Monit cannot throttle the fork frequency of a monitored program, so please check Sidekiq's manual to see whether you can limit the parallelism.
-
repo owner - changed status to closed
sidekiq's issue
I have upgraded to v5.20.0 and it looks the same.