Monit spawns a lot of processes while trying to start a monitored service

Issue #504 closed
Alex Emelyanov created an issue

Hi all, thank you for the nice tool.

I have an issue. I have tried Googling and searching here, but found nothing.

I am trying to keep Sidekiq under Monit. After Monit starts, it launches a lot of processes at once, and they consume all the CPU.

Processes look like this: /bin/su - deploy -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production --logfile /data/carsharing/shared/log/sidekiq.log -d

There are a lot of them:

root@sidekiq-1:~# ps aux | grep sidekiq | wc -l
252

It slows the OS down critically.

After several minutes, I have a couple of Sidekiq instances running: one under Monit and another "illegal" one.

Processes

root@sidekiq-1:~# ps aux | grep sidekiq
deploy   12136 10.4 23.9 1598280 243244 ?      Sl   19:52   0:15 sidekiq 3.4.2 carsharing [0 of 5 busy]
deploy   19962 28.4 20.4 1594876 207784 ?      Sl   19:54   0:10 sidekiq 3.4.2 carsharing [0 of 5 busy]
root     19995  0.0  0.1  11960  1956 pts/0    S+   19:54   0:00 grep --color=auto sidekiq

Monit

root@sidekiq-1:~# monit status
The Monit daemon 5.6 uptime: 13m

Process 'sidekiq_production0'
  status                            Running
  monitoring status                 Monitored
  pid                               19962
  parent pid                        1
  uptime                            7m
  children                          0
  memory kilobytes                  325876
  memory kilobytes total            325876
  memory percent                    32.0%
  memory percent total              32.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sun, 20 Nov 2016 20:01:36

System 'sidekiq-1'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.06] [0.36] [0.34]
  cpu                               8.2%us 0.9%sy 0.0%wa
  memory usage                      715148 kB [70.3%]
  swap usage                        0 kB [0.0%]
  data collected                    Sun, 20 Nov 2016 20:00:52

monitrc file

set daemon 30            # check services at 30-second intervals

set httpd port 2812 and
   use address localhost  # only accept connection from localhost
   allow localhost        # allow localhost to connect to the server

include /etc/monit/conf.d/*

/etc/monit/conf.d/sidekiq_carsharing_production.conf

check process sidekiq_carsharing_production0
  with pidfile "/data/carsharing/shared/tmp/sidekiq-0.pid"
  start program = "/bin/su - deploy -c 'cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production  --logfile /data/carsharing/shared/log/sidekiq.log   -d'" with timeout 30 seconds

  stop program = "/bin/su - deploy -c 'cd /data/carsharing/current && bundle exec sidekiqctl stop /data/carsharing/shared/tmp/sidekiq-0.pid'" with timeout 30 seconds
  group carsharing-sidekiq

root@sidekiq-1:~# uname -a

Linux sidekiq-1 4.4.0-47-generic #68~14.04.1-Ubuntu SMP Wed Oct 26 19:42:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Comments (5)

  1. Tildeslash repo owner

    Could you please send the Monit log?

    It doesn't seem like a Monit bug. Monit executes only the "start program" and then waits for the process to start (in your case, for "/data/carsharing/shared/tmp/sidekiq-0.pid" to be created and for the PID it contains to match a running process). The default start timeout is 30 seconds: if the process starts slowly (more than 30s), Monit will try to restart it. If the process is not running, the stop method is skipped, which may lead to several slow-starting instances running in parallel.
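
    As a rough illustration of the start check described above, the following shell sketch approximates it: the pidfile must exist and name a live process, and the check is repeated until a timeout expires. This is not Monit's actual implementation; the function names `pidfile_process_running` and `wait_for_pidfile` are hypothetical, the once-per-second polling is an assumption, and the `/proc` lookup is Linux-specific.

    ```shell
    #!/bin/sh
    # Approximation (NOT Monit's real code) of how a `check process ... with pidfile`
    # service is judged to be running: the pidfile must exist, hold a PID, and that
    # PID must belong to a live process.
    pidfile_process_running() {
      pidfile=$1
      # No readable pidfile yet: treat the service as not running.
      [ -r "$pidfile" ] || return 1
      # Take the first line and keep only its digits.
      pid=$(head -n 1 "$pidfile" | tr -cd '0-9')
      [ -n "$pid" ] || return 1
      # On Linux, every live process has a directory under /proc.
      [ -d "/proc/$pid" ]
    }

    # Poll once per second for up to TIMEOUT seconds (default 30, matching the
    # "with timeout 30 seconds" used in the issue's config). Returns 0 as soon as
    # the process is seen running, 1 if the timeout expires first.
    wait_for_pidfile() {
      pidfile=$1
      timeout=${2:-30}
      elapsed=0
      while [ "$elapsed" -lt "$timeout" ]; do
        pidfile_process_running "$pidfile" && return 0
        sleep 1
        elapsed=$((elapsed + 1))
      done
      return 1
    }
    ```

    For example, `wait_for_pidfile /data/carsharing/shared/tmp/sidekiq-0.pid 30 && echo started` returns as soon as the Sidekiq PID in the pidfile is alive, or fails after 30 seconds.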

    Recommendation:

    1.) check the Monit log to see whether the service start times out

    2.) if the start timed out: raise the start program timeout using the "timeout" option (https://mmonit.com/monit/documentation/monit.html#SERVICE-METHODS).
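
    Step 2 amounts to editing the `with timeout ... seconds` clause on the start program line of the service file. A minimal shell sketch, assuming the start line ends in `with timeout <N> seconds` as in the sidekiq_carsharing_production.conf shown in the issue and that GNU sed (as on Ubuntu) is available for `-i`; the helper name `raise_start_timeout` and the value 90 are illustrative only, not values taken from the Monit documentation:

    ```shell
    #!/bin/sh
    # Hypothetical helper: raise only the *start program* timeout in a Monit
    # service file, leaving the stop program timeout untouched.
    # Assumes GNU sed (for in-place editing with -i).
    raise_start_timeout() {
      conf=$1
      seconds=$2
      # The /start program/ address restricts the substitution to lines that
      # contain "start program"; the stop program line is left as-is.
      sed -i "/start program/ s/with timeout [0-9][0-9]* seconds/with timeout $seconds seconds/" "$conf"
    }
    ```

    For example, run `raise_start_timeout /etc/monit/conf.d/sidekiq_carsharing_production.conf 90`, then check the control file syntax with `monit -t` and apply the change with `monit reload`.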

  2. Alex Emelyanov reporter

    There is no waiting for the timeout; a lot of processes start immediately.

    cat /var/log/monit.log

    [UTC Nov 28 22:26:43] info     : Starting monit daemon with http interface at [localhost:2812]
    [UTC Nov 28 22:26:43] info     : Starting monit HTTP server at [localhost:2812]
    [UTC Nov 28 22:26:43] info     : monit HTTP server started
    [UTC Nov 28 22:26:43] info     : 'sidekiq-1' Monit started
    [UTC Nov 28 22:26:43] error    : 'sidekiq_carsharing_production0' process is not running
    [UTC Nov 28 22:26:43] info     : 'sidekiq_carsharing_production0' trying to restart
    [UTC Nov 28 22:26:43] info     : 'sidekiq_carsharing_production0' start: /bin/su
    

    ps aux | grep sidekiq | wc -l

    197
    

    ps aux | grep sidekiq

    ...
    deploy    4636 18.0  1.6  35280 16468 ?        S    22:27   0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production  --logfile /data/carsharing/shared/log/sidekiq.log   -d
    deploy    4655 22.0  1.6  35344 16532 ?        S    22:27   0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production  --logfile /data/carsharing/shared/log/sidekiq.log   -d
    deploy    4680  0.0  1.6  35408 16596 ?        S    22:27   0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production  --logfile /data/carsharing/shared/log/sidekiq.log   -d
    deploy    4699  0.0  1.6  35468 16656 ?        S    22:27   0:00 -su -c cd /data/carsharing/current && bundle exec sidekiq --config /data/carsharing/current/config/sidekiq.yml --index 0 --pidfile /data/carsharing/shared/tmp/sidekiq-0.pid --environment production  --logfile /data/carsharing/shared/log/sidekiq.log   -d
    root      4712  0.0  0.2  11964  2040 pts/0    S+   22:27   0:00 grep --color=auto sidekiq
    ...
    
  3. Tildeslash repo owner

    Then it is most probably caused by Sidekiq itself (executed via 'start program'). We don't know how Sidekiq is implemented, but it seems to fork a high number of processes. I googled a little for Sidekiq parallelism, and it seems to be a Sidekiq feature: https://github.com/mperham/sidekiq/wiki/Best-Practices#3-embrace-concurrency Maybe Sidekiq allows tuning the parallelism somehow. Monit cannot throttle how often a monitored program forks, so please check Sidekiq's manual to see whether you can limit the parallelism.
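
    One way to check where the extra processes come from is to group the matching processes by parent PID and compare the result with the Monit daemon's PID. The shell sketch below is a hypothetical diagnostic helper (`parents_of` is not part of Monit or Sidekiq); it only uses standard `ps`, `grep`, and `awk`, and treats the pattern as a fixed string.

    ```shell
    #!/bin/sh
    # Hypothetical diagnostic helper: for every process whose command line
    # contains PATTERN (a fixed string), count how many such processes each
    # parent PID has. Prints one "PPID COUNT" line per distinct parent.
    parents_of() {
      pattern=$1
      # ppid= and args= print the columns without a header line.
      # The second grep drops the pipeline's own grep process (and, as a side
      # effect, any other command line that contains the word "grep").
      ps -eo ppid=,args= \
        | grep -F -e "$pattern" \
        | grep -v -F grep \
        | awk '{ count[$1]++ } END { for (p in count) print p, count[p] }'
    }
    ```

    For example, `parents_of '/bin/su - deploy'` lists the parents of the `su` wrappers started by the start program; if they all report the PID printed by `pgrep -x monit`, Monit itself is launching them repeatedly.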
