High CPU load

Issue #868 resolved
Former user created an issue

I'm using RhodeCode 1.7.1 on Ubuntu Server 12.04.

It turns out that the "paster serve" process is the bottleneck for the performance of the system. Unfortunately, the high CPU usage slows everything down. Is there a way to speed up the system?

Comments (24)

  1. hipp

    Thanks for your answer. I have read the page and I'm using a MySQL database on another server. None of the repositories is very large. Loading pages in the browser is fast enough; only access via Mercurial is very slow, resulting in 100% CPU utilization. Restarting the service did not yield any improvement.

  2. Marcin Kuzminski repo owner

    That's odd, since that's not really a CPU-heavy operation. What kind of resources does that machine have? Also make sure there aren't a lot of concurrent pull operations at the same time; if so, you need to scale RhodeCode.
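    A quick, rough way to check for concurrent pulls is to count established connections to the paster port; 5000 below is the default from the RhodeCode .ini and only an assumption here:

    # count established client connections to the paster serve port
    # (port 5000 is an assumption; use whatever your .ini binds to)
    netstat -tn | grep ':5000 ' | grep -c ESTABLISHED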

  3. hipp

    It's a virtual server with 2x2.4GHz and 8GB RAM. Even a single pull operation causes such a high CPU load.

    Don't hesitate to ask for further information! I would really like to use RhodeCode!

  4. Marcin Kuzminski repo owner

    Did you try using pure hgweb and checking whether the CPU load shows up there as well? Also check the RhodeCode logs for any information that could give a clue about the high CPU usage.
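    To make that comparison concrete, a minimal hgweb setup can serve the same repositories directly, with RhodeCode out of the picture; a sketch (the repository root below is a placeholder):

    # hgweb.config -- serve the repositories with Mercurial's built-in web server
    # /srv/repos is a placeholder for the actual repository root
    [paths]
    / = /srv/repos/*

    # then, for example:
    #   hg serve --web-conf hgweb.config -p 8080
    #   hg clone http://localhost:8080/somerepo

    If a clone against that also pegs the CPU, the problem is on the Mercurial/repository side rather than in RhodeCode itself.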

  5. Alexey Larikov

    I've had a similar issue a couple of times since upgrading to 1.7.1. The Python process just eats 100% CPU and the whole thing becomes unresponsive (hardware server: 6x3GHz Xeon, 96GB RAM). A restart solves the issue. Nothing unexpected shows up in the logs. The issue is intermittent and has happened only two or three times in a month.

  6. Marcin Kuzminski repo owner

    @zinzila Please check whether there are any hanging git processes when that happens for you. I recommend using uwsgi with a process pool; long-running workers will then be recycled after a specified amount of time.
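    As an illustration only, something along these lines could work, assuming uwsgi is built with Python and paste support; the socket, path, and numbers are placeholders to tune:

    [uwsgi]
    # load RhodeCode through its paste .ini (path is a placeholder)
    paste = config:/home/rhodecode/production.ini
    socket = 127.0.0.1:5000
    master = true
    # fixed pool of workers instead of a single paster process
    processes = 4
    # recycle a worker after it has served this many requests,
    # so a leaking or stuck worker does not live forever
    max-requests = 1000
    # abort any single request that runs longer than an hour;
    # very large clones may need a higher value
    harakiri = 3600

    A frontend such as nginx (or Apache) would then proxy to that socket instead of to paster serve.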

  7. Alexey Larikov

    @marcinkuzminski I'll check for git processes the next time it happens. As a note, we don't use git at all, only hg. As for wsgi, as I understand it that would bring in an Apache dependency, which not only complicates the installation but is also not a straightforward thing in my environment, as it is heavily regulated.

  8. Yury Vidineev

    Looks like I have the same issue after upgrading from 1.3.4 to 1.7.1. I'm running RhodeCode on Debian 6 (x86-64, X3450 @ 2.67GHz, 16GB RAM) with nginx as the frontend. After the upgrade from RhodeCode 1.3.4, all operations (including the web interface) became very slow. I've migrated from SQLite to MySQL and enabled celery, which helps a bit, but my 1.7.1 is still much slower than 1.3.4. The main reason looks to be the paster process, which eats ~120% CPU. On nginx I count about 4 qps (mostly from our Jenkins). hgweb is as fast as usual. P.S. Sorry for my English

  9. Zheng Zhu

    Same issue here as well, currently running RhodeCode 1.7.1 on Ubuntu 12.04.02 LTS x64 (dedicated VMware VM: 4 vCPU x86-64 Intel i7-3770 3.4GHz, 2GB RAM, MySQL 5.5.32 backend, Mercurial 2.6.2). After the upgrade to 1.7.1, when a client does an hg push to the server, the CPU usage of the paster serve process spikes to 150-200% for anywhere from 5 to 25 seconds. Our repos are very small, usually only around 10-20 MB, and most of them have fewer than 1000 commits. I don't believe I saw this behavior prior to version 1.7.1, but I'm not entirely sure. It certainly appears to be much slower since the 1.7.1 upgrade. (We've been using RhodeCode since about version 1.3.x.)

  10. Marcin Kuzminski repo owner

    We're investigating all those issues; as of now we cannot find any reason for this slowdown (in fact we believe it should be faster, since there were a lot of optimizations).

    @zzhundg did you upgrade directly from 1.3 to 1.7, or did the slowdown appear going from 1.6 to 1.7?

  11. Zheng Zhu

    I upgraded from 1.6.0alpha, I believe. (Before that it was probably 1.5.1, 1.4.x, etc.) Sorry, I can't remember that far back :)

  12. Zheng Zhu

    One more thing. To be clear, the behavior is not very consistent, i.e. it doesn't happen on every push or pull, but it will sometimes spike the CPU on either a push or a pull. We do use Jenkins for CI, but I checked and it's not on any crazy polling cycle (5 minutes minimum). Most of the time the RhodeCode server appears to be idle.

  13. Marcin Kuzminski repo owner

    OK, this is very valuable information. I have been talking with @yvidineev on IRC; they had the same issue and also have a CI server pulling. They switched to a uwsgi setup that autospawns additional workers when the CI servers hit RhodeCode, so it handles the extra load on demand. I'm not sure why this issue pops up suddenly now; did you switch from paster/waitress or modify your .ini settings?
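    For reference, the "autospawn additional workers" part presumably maps to uwsgi's cheaper subsystem; a rough sketch of the relevant options (numbers are placeholders):

    [uwsgi]
    master = true
    # hard upper bound on workers uwsgi may spawn
    processes = 8
    # adaptive worker spawning ("cheaper" subsystem):
    # keep 2 workers around when idle, start with 2,
    # and add 2 more at a time when load from CI pulls comes in
    cheaper = 2
    cheaper-initial = 2
    cheaper-step = 2

    Combined with the recycling options sketched earlier, idle periods cost little while CI bursts get extra workers.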

  14. Zheng Zhu

    During the upgrade I didn't switch any settings in the ini or the deployment configuration. We've used the same settings for close to a year now.

  15. sai ganguri

    We had the same issue a week ago. We were using Jenkins as the CI environment. In Jenkins there was a field named "CONCURRENT POLLING", which was set to 5. Reducing the number to 2 decreased the server load and improved the performance during cloning.

  16. Rohit Sood

    We moved our RhodeCode infrastructure from Ubuntu to SuSE Linux and have noticed a marked degradation in performance. The RhodeCode version is 1.7.2.

    The migration also entailed the following changes:

    1) Replacement of the database, from SQLite to Postgres.

    2) Use of the RabbitMQ message broker, with celeryd enabled.

    The CPU load fluctuates between 60% and 130%, consistently staying at the top of that range during work hours. A snapshot of "top -H" taken early in the morning is provided below.

    top - 09:40:32 up 18 days, 17:05, 1 user, load average: 0.05, 0.04, 0.03
    Tasks: 260 total, 5 running, 254 sleeping, 0 stopped, 1 zombie
    Cpu(s): 45.3%us, 10.6%sy, 0.0%ni, 43.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
    Mem: 3928200k total, 3314724k used, 613476k free, 349128k buffers
    Swap: 4193272k total, 33948k used, 4159324k free, 2059520k cached
    Tasks: 254 total, 5 running, 248 sleeping, 0 stopped, 1 zombie

      PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
    13756 rhodecod 20  0  986m 665m 4144 S   51 17.4  1276:28 paster
    13759 rhodecod 20  0  986m 665m 4144 R   42 17.4  1273:38 paster
    13757 rhodecod 20  0  986m 665m 4144 R    7 17.4  1265:14 paster
    13758 rhodecod 20  0  986m 665m 4144 R    5 17.4  1260:23 paster
    13760 rhodecod 20  0  986m 665m 4144 R    4 17.4  1265:56 paster
    25686 postgres 20  0 98776 8240 5500 S    2  0.2  0:24.90 postgres
    26354 postgres 20  0 98776 8240 5504 S    1  0.2  0:20.51 postgres
    26391 postgres 20  0 98776 8200 5492 S    1  0.2  0:20.08 postgres

  17. Marcin Kuzminski repo owner

    Please check the RhodeCode logs for the number of clone/pull operations. A speed degradation after moving from SQLite to Postgres doesn't really make sense. I understand this is the same machine, just with the OS changed, right?
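    A rough way to get those numbers, assuming the default file logging and that the log lines mention the action name (adjust the path and patterns to your setup):

    # approximate counts of pull/clone and push operations in the RhodeCode log
    # (log path and patterns are assumptions; match them to your logging config)
    grep -c 'pull' /var/log/rhodecode/rhodecode.log
    grep -c 'push' /var/log/rhodecode/rhodecode.log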

  18. Rohit Sood

    Correct. Just a new virtual machine with the same memory/disk space configuration.

    The volume of clone/push operations should be in the expected range, with the number of users staying identical and no additional CI jobs.

    The rhodecode log shows ~150 push operations during a typical day last week. I can gather more metrics to get an empirical figure.

    There are 85 Mercurial repos currently hosted.

  19. Rohit Sood

    I was looking at the celery debug logs for a possible lead. Does the snippet below (the attachment isn't working; it hangs) confirm that celery workers are executing tasks from the RabbitMQ queue? What is the nature of the task? There are multiple occurrences of the same events.

    [Tasks]
      . celery.backend_cleanup
      . celery.ping
      . rhodecode.lib.celerylib.tasks.create_repo_fork
      . rhodecode.lib.celerylib.tasks.get_commits_stats
      . rhodecode.lib.celerylib.tasks.send_email
      . rhodecode.lib.celerylib.tasks.whoosh_index

    [2013-10-02 16:31:53,059: DEBUG/MainProcess] Instantiating thread components...

    [2013-10-02 16:31:53,063: DEBUG/MainProcess] Starting thread TaskPool...

    [2013-10-02 16:31:53,065: DEBUG/MainProcess] created semlock with handle 140555969622016

    [2013-10-02 16:31:53,065: DEBUG/MainProcess] created semlock with handle 140555969617920

    [2013-10-02 16:31:53,065: DEBUG/MainProcess] created semlock with handle 140555969613824

    [2013-10-02 16:31:53,066: DEBUG/MainProcess] created semlock with handle 140555969609728

    [2013-10-02 16:31:53,073: DEBUG/MainProcess] worker handler starting

    [2013-10-02 16:31:53,075: INFO/PoolWorker-1] child process calling self.run()

    [2013-10-02 16:31:53,077: DEBUG/MainProcess] Starting thread Mediator...

    [2013-10-02 16:31:53,077: DEBUG/MainProcess] result handler starting

    [2013-10-02 16:31:53,078: DEBUG/MainProcess] Starting thread Timer...

    [2013-10-02 16:31:53,078: DEBUG/MainProcess] Starting thread Consumer...

    [2013-10-02 16:31:53,079: WARNING/MainProcess] celery@lxhost07 has started.

    [2013-10-02 16:31:53,079: DEBUG/MainProcess] Consumer: Re-establishing connection to the broker...

    [2013-10-02 16:31:53,084: DEBUG/MainProcess] Start from server, version: 8.0, properties: {u'information': u'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2013 VMware, Inc.', u'capabilities': {}, u'platform': u'Erlang/OTP', u'version': u'3.1.3'}, mechanisms: [u'AMQPLAIN', u'PLAIN'], locales: [u'en_US']

    [2013-10-02 16:31:53,085: DEBUG/MainProcess] Open OK! known_hosts []

    [2013-10-02 16:31:53,086: DEBUG/MainProcess] Consumer: Connection Established.

    [2013-10-02 16:31:53,086: DEBUG/MainProcess] using channel_id: 1

    [2013-10-02 16:31:53,087: DEBUG/MainProcess] Channel open

    [2013-10-02 16:31:53,090: DEBUG/MainProcess] basic.qos: prefetch_count->8

    [2013-10-02 16:31:53,090: DEBUG/MainProcess] using channel_id: 2

    [2013-10-02 16:31:53,091: DEBUG/MainProcess] Channel open

    [2013-10-02 16:31:53,084: INFO/PoolWorker-2] child process calling self.run()

    [2013-10-02 16:31:53,097: DEBUG/MainProcess] Consumer: Starting message consumer...

    [2013-10-02 16:31:53,098: DEBUG/MainProcess] Consumer: Ready to accept tasks!

    [2013-10-03 08:52:42,913: WARNING/MainProcess] celeryd: Warm shutdown (MainProcess)

    [2013-10-03 08:52:42,923: DEBUG/MainProcess] Stopping thread Consumer...

    [2013-10-03 08:52:42,923: DEBUG/MainProcess] Consumer: Stopping consumers...

    [2013-10-03 08:52:42,923: DEBUG/MainProcess] Stopping thread Timer...

    [2013-10-03 08:52:43,776: DEBUG/MainProcess] Stopping thread Mediator...

    [2013-10-03 08:52:44,705: DEBUG/MainProcess] Stopping thread TaskPool...

    [2013-10-03 08:52:44,706: DEBUG/MainProcess] closing pool

    [2013-10-03 08:52:44,914: DEBUG/MainProcess] worker handler exiting

    [2013-10-03 08:52:44,914: DEBUG/MainProcess] joining worker handler

    [2013-10-03 08:52:44,915: DEBUG/MainProcess] task handler got sentinel

    [2013-10-03 08:52:44,915: DEBUG/MainProcess] joining task handler

    [2013-10-03 08:52:44,915: DEBUG/MainProcess] task handler sending sentinel to result handler

    [2013-10-03 08:52:44,916: DEBUG/MainProcess] task handler sending sentinel to workers

    [2013-10-03 08:52:44,916: DEBUG/MainProcess] task handler exiting

    [2013-10-03 08:52:44,916: DEBUG/MainProcess] result handler got sentinel

    [2013-10-03 08:52:44,916: DEBUG/MainProcess] ensuring that outqueue is not full

    [2013-10-03 08:52:44,917: DEBUG/MainProcess] result handler exiting: len(cache)=0, thread._state=0

    [2013-10-03 08:52:44,917: DEBUG/PoolWorker-2] worker got sentinel -- exiting

    [2013-10-03 08:52:44,919: DEBUG/PoolWorker-2] worker exiting after 0 tasks

    [2013-10-03 08:52:44,919: INFO/PoolWorker-2] process shutting down

    [2013-10-03 08:52:44,919: DEBUG/PoolWorker-2] running all "atexit" finalizers with priority >= 0

    [2013-10-03 08:52:44,919: DEBUG/PoolWorker-2] running the remaining "atexit" finalizers

    [2013-10-03 08:52:44,920: INFO/PoolWorker-2] process exiting with exitcode 0

    [2013-10-03 08:52:44,924: DEBUG/PoolWorker-1] worker got sentinel -- exiting

    [2013-10-03 08:52:44,925: DEBUG/PoolWorker-1] worker exiting after 0 tasks

    [2013-10-03 08:52:44,925: INFO/PoolWorker-1] process shutting down

    [2013-10-03 08:52:44,926: DEBUG/PoolWorker-1] running all "atexit" finalizers with priority >= 0

    [2013-10-03 08:52:44,926: DEBUG/PoolWorker-1] running the remaining "atexit" finalizers

    [2013-10-03 08:52:44,926: INFO/PoolWorker-1] process exiting with exitcode 0

    [2013-10-03 08:52:44,929: DEBUG/MainProcess] joining result handler

    [2013-10-03 08:52:44,929: DEBUG/MainProcess] result handler joined

    [2013-10-03 08:52:44,929: DEBUG/MainProcess] joining worker 0/2 (<Process(PoolWorker-1, stopped daemon)>)

    [2013-10-03 08:52:44,930: DEBUG/MainProcess] joining worker 1/2 (<Process(PoolWorker-2, stopped daemon)>)

    [2013-10-03 08:52:44,930: DEBUG/MainProcess] Consumer: Closing consumer channel...

    [2013-10-03 08:52:44,932: DEBUG/MainProcess] Closed channel #1

    [2013-10-03 08:52:44,932: DEBUG/MainProcess] CarrotListener: Closing broadcast channel...

    [2013-10-03 08:52:44,948: DEBUG/MainProcess] Closed channel #2

    [2013-10-03 08:52:44,948: DEBUG/MainProcess] Consumer: Closing connection to broker...

    [2013-10-03 08:52:44,949: INFO/MainProcess] process shutting down

    [2013-10-03 08:52:44,950: DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0

    [2013-10-03 08:52:44,950: DEBUG/MainProcess] finalizing pool

    [2013-10-03 08:52:44,950: DEBUG/MainProcess] helping task handler/workers to finish

    [2013-10-03 08:52:44,950: DEBUG/MainProcess] removing tasks from inqueue until task handler finished

    [2013-10-03 08:52:44,950: DEBUG/MainProcess] terminating workers

    [2013-10-03 08:52:44,951: DEBUG/MainProcess] joining task handler

    [2013-10-03 08:52:44,951: DEBUG/MainProcess] joining result handler

    [2013-10-03 08:52:44,951: DEBUG/MainProcess] joining pool workers

    [2013-10-03 08:52:44,951: DEBUG/MainProcess] pool workers joined

    [2013-10-03 08:52:44,951: DEBUG/MainProcess] running the remaining "atexit" finalizers

    [2013-10-03 08:52:55,953: WARNING/MainProcess] -------------- celery@lxhost07 v2.2.10 ---- **** -----

    [Configuration]
      . broker:      amqp://rhodeuser@localhost:5672/rhodevhost
      . loader:      rhodecode.lib.celerypylons.loader.PylonsLoader
      . logfile:     /var/log/rhodecode/celery.log@DEBUG
      . concurrency: 2
      . events:      OFF
      . beat:        OFF

    [Queues]
      . celery:      exchange:celery (direct) binding:celery

  20. Marcin Kuzminski repo owner

    It doesn't seem to be a celery issue. But just to be sure, clean up the RabbitMQ queue; there is one heavy task RhodeCode can perform via celery, i.e. parsing commit stats when statistics are enabled on a repo.
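    Before purging, it is worth checking whether a backlog exists at all; one way, using the vhost and queue names from the config and banner above (run as root or the rabbitmq user):

    # show how many messages are waiting in each queue of the rhodevhost vhost
    rabbitmqctl list_queues -p rhodevhost name messages
    # a large backlog in the 'celery' queue, with statistics enabled on repos,
    # would point at the get_commits_stats task mentioned above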

  21. Rohit Sood

    I tried a "paster make-index .." command in the hope of seeing RhodeCode messages flow through on the RabbitMQ management console. The queue status is idle, with no messages.

    The celery log does not show any activity either.

    The production.ini settings:

    use_celery = true

    broker.host = localhost

    broker.vhost = rhodevhost

    broker.port = 5672

    broker.user = rhodeuser

    broker.password = xxxxx

    celery.imports = rhodecode.lib.celerylib.tasks

    celery.result.backend = amqp

    celery.result.dburi = amqp://

    celery.result.serialier = json

    #celery.send.task.error.emails = true

    #celery.amqp.task.result.expires = 18000

    celeryd.concurrency = 2

    #celeryd.log.file = celeryd.log

    celeryd.log.level = debug

    celeryd.max.tasks.per.child = 1

    celery.always.eager = false
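    For what it's worth, running the worker in the foreground against the same ini makes task receipt immediately visible; the RhodeCode setup docs start it via paster (the ini filename below is a placeholder):

    # run the celery worker in the foreground so incoming whoosh_index /
    # get_commits_stats tasks show up on the console as they arrive
    paster celeryd production.ini

    Re-running paster make-index in another terminal should then either show a task being received here or confirm that nothing is ever published to the broker.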
