Provide a fairly constant CPU and network resource

Issue #13079 open
Barnabas Gema
created an issue

I'm trying to run my integration tests using pipelines. On my local machine they take a fairly constant 2-3 seconds per test case, but on pipelines the variance is much higher: they take 1-10 seconds to complete.

It wouldn't be a problem if the same test cases were slow on every run, but the time a test takes doesn't really depend on the test case itself. Sometimes a test case needs 1.5 seconds, sometimes 9, which makes finding a proper timeout for the test cases really challenging. It would be nice if the container got a fairly constant CPU resource during a pipelines run, or if it were configurable whether I need a constant or a best-effort CPU.

Comments (17)

  1. Gabriel Kaputa

    Same here, it is very slow. I am running phpunit tests and every time, even if I use different tests, after the first two nothing happens for a couple of minutes, and then the testing continues (still slow).

  2. Joshua Tjhin staff
    • changed status to open

    Thanks for the feedback. As you probably know, we run builds on shared infrastructure and therefore CPU is best effort. In the future, we would like to make CPU more constant for steadier and more reliable builds. This, however, is unlikely to be implemented in the next 6 months.

  3. Nick Houghton

    This is a big deal for a CI tool. We are trying to move more of our CI workload into pipelines but are seeing strange processing behaviour where pipelines appear to hang for long periods (minutes), and tests fail non-deterministically because of it (tests fail after timeouts).

    Logging shows clock jumps measured in seconds to minutes, like this:

    2017-06-19 09:49:45:164 WARN  kari.pool.HikariPool - readwrite - Thread starvation or clock leap detected (housekeeper delta=45s970ms942µs457ns).
    2017-06-19 09:50:16:831 WARN  kari.pool.HikariPool - readwrite - Thread starvation or clock leap detected (housekeeper delta=45s163ms543µs617ns).
    2017-06-19 09:51:21:838 WARN  kari.pool.HikariPool - readwrite - Thread starvation or clock leap detected (housekeeper delta=1m5s7ms608µs650ns).
    

    This generally means the process didn't get CPU time for the delta period, which lines up with the very slow processing of steps we are seeing in the pipeline. Currently the test step in one of our pipelines takes 11 minutes to fail, while running the same test in the "atlassian/default-image:latest" container locally takes around 2 minutes with no failures.

    Shared CPU infrastructure where scheduling is non-deterministic just means that sometimes builds are going to fail and require 1 or more reruns. Especially for a product where you are billing in minutes, it feels very disingenuous to make builds take longer (cost more minutes) and fail more often and require reruns (cost more minutes).

    Is there any more info about how the CPU is scheduled, and how to stay underneath the limits? Is it per container, or at a lower infrastructure level?
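
The "thread starvation or clock leap" warnings quoted in comment 3 come from a periodic housekeeping check that notices when it wakes up much later than scheduled. The sketch below shows the same idea in Python so it can be run inside a build step; it is an illustration, not HikariCP's implementation, and the function names, interval, and tolerance are assumptions chosen for the example. It uses a monotonic clock, so it reports delayed wake-ups only, not wall-clock jumps.

```python
import time


def measure_overshoot(interval_s, sleep=time.sleep, clock=time.monotonic):
    """Sleep for interval_s and return how many extra seconds passed.

    A large overshoot suggests the process was not scheduled on a CPU
    at the moment it was supposed to wake up.
    """
    start = clock()
    sleep(interval_s)
    elapsed = clock() - start
    return elapsed - interval_s


def find_stalls(rounds, interval_s, tolerance_s,
                sleep=time.sleep, clock=time.monotonic):
    """Repeat the check `rounds` times and collect overshoots above tolerance_s."""
    stalls = []
    for _ in range(rounds):
        overshoot = measure_overshoot(interval_s, sleep, clock)
        if overshoot > tolerance_s:
            stalls.append(overshoot)
    return stalls


if __name__ == "__main__":
    # Hypothetical settings: wake every 0.05 s, report delays above 0.1 s.
    for stall in find_stalls(rounds=3, interval_s=0.05, tolerance_s=0.1):
        print(f"possible CPU starvation: woke up {stall:.3f}s late")
```

Running this alongside the tests, locally and in the pipeline, gives a rough comparison of scheduling delays between the two environments. The `sleep` and `clock` parameters exist only so the check can be exercised with fake timers.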

  4. Nick Houghton

    I gave up on bitbucket pipelines and moved to Google Container Builder. 120 mins free per day, and no lame shared compute resources. My build and tests ran first time.

  5. James Dengel

    I'd also like to note that pulling the image can make a huge difference to build time for a very simple project. Also, there is no monitoring of the time taken to pull the image.

    For instance logs of the build are as follows:

    build setup: 5 seconds
    tox -e coverage: 10 seconds

    but the billing period was 57 seconds, not sure where my missing 42 seconds are.

    another build

    build setup: 4 seconds
    tox -e coverage: 15 seconds
    total billing period: 22 seconds - missing 7 seconds

    Can we be shown exactly where this time is being used? It somewhat puts me off paying without knowing where 1/3 of our small build time is going.

    The above times are for the same repo with only minor changes between commits.

  6. Marcus Schumann

    We have the same issues as James Dengel. The build time is not correct, and it also varies widely between runs, even when running the exact same pipeline for the exact same commit (re-run).

  7. James Dengel

    bitbucket.PNG

    This is a prime example of the issue, time taken to pull the image and the cache.

    build: 2 mins 25 seconds
    - build setup: 1 minute 12s
    - tox: 9s
    - push: 17s

    total 1:38. Missing time: 47 seconds.
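
The "missing time" figures in comments 5 and 7 are the billed duration minus the sum of the step durations shown in the log. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the numbers are taken from those comments.

```python
from typing import Mapping


def unaccounted_seconds(billed_s: int, steps_s: Mapping[str, int]) -> int:
    """Return billed time minus the sum of the step durations shown in the log."""
    return billed_s - sum(steps_s.values())


# Figures from comment 7: billed 2 min 25 s; steps of 1 min 12 s, 9 s and 17 s.
comment_7 = unaccounted_seconds(2 * 60 + 25, {"build setup": 72, "tox": 9, "push": 17})

# Figures from the first build in comment 5: billed 57 s; steps of 5 s and 10 s.
comment_5 = unaccounted_seconds(57, {"build setup": 5, "tox -e coverage": 10})

print(comment_7, comment_5)  # prints: 47 42
```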

  8. Joshua Tjhin staff

    Hi James,

    The build setup includes the cloning of the repo and downloading the cache. However, it does not include the time to pull images and start containers. This additional time also varies and might be shorter if some image layers have already been downloaded. I've created a new issue, #14484, for this improvement, to provide a better breakdown of the pipeline duration.

    We're always trying to improve build speeds, and a few days ago we rolled out an experiment to cache the public Docker images.

    Regards,
    Joshua

  9. Joshua Tjhin staff

    @Marcus Schumann yes, by caching build and service images, you might notice logs start streaming a little earlier. It has been rolled out to all customers and you should benefit automatically. We will monitor the impact and make additional improvements.

  10. Hudson Mendes

    Very, VERY! slow. Such a slow build infrastructure only contributes to miserable hours of wasted time. Already looking into alternatives - Bitbucket pipelines has burnt a lot of the very scarce time I had for a startup I'm working on. Nothing fancy necessary: just get more processing power and memory on these machines already...

  11. Nick Boultbee

    I'm seeing custom Docker builds take 17 - 27 mins on pipelines that I can run in under a minute locally (on a reasonable i7 with SSD).

    So to me, two problems: (a) very slow (big problem), and (b) quite variable (not so much a problem).
