Provide fairly constant CPU and network resources

Issue #13079 open
Barnabas Gema
created an issue

I'm trying to run my integration tests using Pipelines. On my local machine they take a fairly constant 2-3 seconds per test case, but when running on Pipelines the variance is much higher: they take anywhere from 1 to 10 seconds to complete.

It wouldn't be a problem if the same test cases were slow on every run, but how long a test takes doesn't really depend on the exact test case: sometimes a test case needs 1.5 seconds, sometimes it needs 9, which makes finding a proper timeout for the test cases really challenging. It would be nice if the container got a fairly constant CPU allocation during a Pipelines run, or if it were configurable whether I need constant or best-effort CPU.
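In the meantime, the workaround I've settled on is to scale every timeout by a factor taken from an environment variable, so CI runs get extra headroom. A minimal pytest-style sketch (the TEST_TIMEOUT_FACTOR name is just my own convention, and the timeout mark needs the pytest-timeout plugin):

    import os
    import pytest

    # TEST_TIMEOUT_FACTOR is my own convention: 1 locally, e.g. 5 on Pipelines.
    TIMEOUT_FACTOR = float(os.environ.get("TEST_TIMEOUT_FACTOR", "1"))

    def scaled(seconds):
        """Return a timeout with CI headroom applied."""
        return seconds * TIMEOUT_FACTOR

    # Enforced by the pytest-timeout plugin; without it the mark is ignored.
    @pytest.mark.timeout(scaled(3))
    def test_integration_case():
        ...

This keeps the timeouts tight locally while tolerating the slower, noisier CI machines.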

Comments (25)

  1. Gabriel Kaputa

    Same here, it is very slow. I am running PHPUnit tests, and every time, even if I use different tests, nothing happens for a couple of minutes after the first two, and then the testing continues (still slow).

  2. Joshua Tjhin staff
    • changed status to open

    Thanks for the feedback. As you probably know, we run builds on shared infrastructure, and therefore CPU is best effort. In the future, we would like to make CPU allocation more constant for steadier and more reliable builds. This, however, is unlikely to be implemented in the next 6 months.

  3. Nick Houghton

    This is a big deal for a CI tool. We are trying to move more of our CI workload into Pipelines but are seeing strange processing behaviour where pipelines appear to hang for long periods (minutes), and tests fail non-deterministically because of it (they hit their timeouts).

    Logging shows clock jumps measured in seconds to minutes, like this:

    2017-06-19 09:49:45:164 WARN  kari.pool.HikariPool - readwrite - Thread starvation or clock leap detected (housekeeper delta=45s970ms942µs457ns).
    2017-06-19 09:50:16:831 WARN  kari.pool.HikariPool - readwrite - Thread starvation or clock leap detected (housekeeper delta=45s163ms543µs617ns).
    2017-06-19 09:51:21:838 WARN  kari.pool.HikariPool - readwrite - Thread starvation or clock leap detected (housekeeper delta=1m5s7ms608µs650ns).
    

    That generally means the process didn't get CPU time for the delta period, which lines up with the very slow processing of steps we are seeing in the pipeline. Currently our test step in one of our pipelines takes 11 mins to fail; running the same tests in the "atlassian/default-image:latest" container locally takes around 2 minutes with no failures.
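    For anyone who wants to measure this themselves, the detection HikariPool does is easy to reproduce with a watchdog thread. A rough Python sketch (the 30-second interval and 1.5x threshold are arbitrary choices, not HikariPool's exact values):

        import threading
        import time

        INTERVAL = 30.0  # housekeeping period in seconds; arbitrary choice

        def housekeeper():
            while True:
                start = time.monotonic()
                time.sleep(INTERVAL)
                delta = time.monotonic() - start
                if delta > INTERVAL * 1.5:
                    # Overslept: the thread got no CPU for roughly (delta - INTERVAL)s.
                    print(f"WARN starvation or clock leap detected (delta={delta:.3f}s)")

        threading.Thread(target=housekeeper, daemon=True).start()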

    Shared CPU infrastructure where scheduling is non-deterministic just means that sometimes builds are going to fail and require 1 or more reruns. Especially for a product where you are billing in minutes, it feels very disingenuous to make builds take longer (cost more minutes) and fail more often and require reruns (cost more minutes).

    Is there any more info about how the CPU is scheduled, and how to stay underneath the limits? Is it per container, or at a lower infrastructure level?
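    Lacking an official answer, one way to check from inside the build container whether a CPU quota is enforced is to read the cgroup counters. A sketch that assumes cgroup v1 paths (whatever Pipelines actually runs may differ; under cgroup v2 the equivalents are cpu.max and cpu.stat):

        # Assumes cgroup v1 mounted at /sys/fs/cgroup; Pipelines' setup may differ.
        def read(path):
            with open(path) as f:
                return f.read().strip()

        quota = read("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")   # -1 means no quota
        period = read("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
        print(f"cfs quota/period: {quota}/{period}")

        # nr_throttled and throttled_time climb whenever the quota is hit.
        print(read("/sys/fs/cgroup/cpu/cpu.stat"))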

  4. Nick Houghton

    I gave up on bitbucket pipelines and moved to Google Container Builder. 120 mins free per day, and no lame shared compute resources. My build and tests ran first time.

  5. James Dengel

    I'd also like to note that pulling the image can make a huge difference to build time for a very simple project, and there is no monitoring of the time taken to pull the image.

    For instance, the logs of one build are as follows:

    build setup: 5 seconds
    tox -e coverage: 10 seconds

    but the billing period was 57 seconds; not sure where my missing 42 seconds are.

    Another build:

    build setup: 4 seconds
    tox -e coverage: 15 seconds
    total billing period: 22 seconds - missing 3 seconds

    Can we be shown exactly where this time is being used? It somewhat puts me off paying without knowing where a third of our small build time is going.

    The above times are for the same repo with only minor changes between commits.
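    Until there's a proper breakdown, the best I can do is bracket each phase with my own wall-clock stamps and compare the sum against the billed minutes. A small sketch of the kind of helper I mean (the phase names are mine):

        import subprocess
        import time
        from contextlib import contextmanager

        @contextmanager
        def phase(name):
            # Print wall-clock stamps so the log itself accounts for the time.
            start = time.time()
            print(f"[{name}] start {time.strftime('%H:%M:%S', time.gmtime(start))}", flush=True)
            try:
                yield
            finally:
                print(f"[{name}] took {time.time() - start:.1f}s", flush=True)

        with phase("tox -e coverage"):
            subprocess.run(["tox", "-e", "coverage"], check=True)

    Anything the billed period contains beyond the sum of the phases is then clearly overhead on the Pipelines side.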

  6. Marcus Schumann

    We have the same issues as James Dengel. The build time is not correct, and it also differs significantly between runs, even when re-running the exact same pipeline for the exact same commit.

  7. James Dengel

    (attached screenshot: bitbucket.PNG)

    This is a prime example of the issue: the time taken to pull the image and the cache.

    build: 2 mins 25 seconds
    - build setup: 1 min 12s
    - tox: 9s
    - push: 17s

    total: 1 min 38s - missing time: 47 seconds.

  8. Joshua Tjhin staff

    Hi James,

    The build setup includes cloning the repo and downloading the cache. However, it does not include the time to pull images and start containers. This additional time also varies and might be shorter if some image layers have already been downloaded. I've created a new issue, #14484, to provide a better breakdown of the pipeline duration.

    We're always trying to improve build speeds and a few days ago we rolled out an experiment to cache the public docker images.

    Regards,
    Joshua

  9. Joshua Tjhin staff

    @Marcus Schumann yes, by caching build and service images, you might notice logs start streaming a little earlier. It has been rolled out to all customers and you should benefit automatically. We will monitor the impact and make additional improvements.

  10. Hudson Mendes

    Very, VERY slow. Such a slow build infrastructure only contributes to miserable hours of wasted time. Already looking into alternatives - Bitbucket Pipelines has burnt a lot of the very scarce time I have at the startup I'm working on. Nothing fancy necessary: just get more processing power and memory on these machines already...

  11. Nick Boultbee

    I'm seeing custom Docker builds take 17-27 mins that I can run in under a minute locally (on a reasonable i7 with an SSD).

    So to me, two problems: (a) very slow (big problem), and (b) quite variable (not so much a problem).

  12. Nick Boultbee

    CircleCI (and Travis) are indeed much faster and with similar enough configuration.

    I don't feel Bitbucket is incentivised enough here to help, as they're now charging us extra for being so slow (due to exceeding the included minutes in our plan). Hmmm.

  13. Paul Carter-Brown

    Hi,

    I'm getting really frustrated with the performance of Pipelines, especially considering we are paying based on time. This needs to be dealt with urgently, or billing should be suspended until the issue is fixed. Here is a docker build step in my pipeline. I've added a "RUN date" instruction as every second step so you can see the progress. Even running the date command takes something like 30s:

    Status: Downloaded newer image for jinigurumauritius/ubuntu_jdk_tomee:latest
     ---> 05fdc078884d
    Step 2/19 : RUN date
     ---> Running in abc9e8f1d14a
    Thu Dec 14 13:01:00 UTC 2017
     ---> 942cb7bae99e
    Removing intermediate container abc9e8f1d14a
    Step 3/19 : RUN mkdir -p /opt/tomee/apps/ && rm /opt/tomee/lib/johnzon-
     ---> Running in deaf4d47bdc0
     ---> f1243138756b
    Removing intermediate container deaf4d47bdc0
    Step 4/19 : RUN date
     ---> Running in d0747c048f25
    Thu Dec 14 13:01:52 UTC 2017
     ---> 5c7b8b24192f
    Removing intermediate container d0747c048f25
    Step 5/19 : COPY deployable/target/.ear /opt/tomee/apps/
     ---> ec07c86b4e10
    Removing intermediate container e916471e79b0
    Step 6/19 : RUN date
     ---> Running in c146d8e77e80
    Thu Dec 14 13:02:52 UTC 2017
     ---> 17fcdd2da425
    Removing intermediate container c146d8e77e80
    Step 7/19 : COPY deployable/target/jars/jg-arch-log-formatter.jar /usr/lib/jvm/java-8-oracle/jre/lib/ext/
     ---> 04139c43b618
    Removing intermediate container 78308a5354b2
    Step 8/19 : RUN date
     ---> Running in 8188cdadb9af
    Thu Dec 14 13:04:33 UTC 2017
     ---> ab33ca967b57
    Removing intermediate container 8188cdadb9af
    Step 9/19 : COPY deployable/target/jars/mysql-connector-java.jar docker/all/hacked-libs/* /opt/tomee/lib/
     ---> 10c8b8e8bfec
    Removing intermediate container b6657221ac64
    Step 10/19 : RUN date
     ---> Running in d765595cd0ce
    Thu Dec 14 13:05:33 UTC 2017
     ---> 3cc1d0c90ed1
    Removing intermediate container d765595cd0ce
    Step 11/19 : COPY docker/all/tomee.xml docker/all/logging.properties docker/all/server.xml /opt/tomee/conf/
     ---> 45cc6af66c75
    Removing intermediate container 68ffc104468f
    Step 12/19 : RUN date
     ---> Running in 3b01f8a87b1f
    Thu Dec 14 13:06:24 UTC 2017
     ---> 1edaced52591
    Removing intermediate container 3b01f8a87b1f
    Step 13/19 : COPY docker/all/setenv.sh /opt/tomee/bin/
     ---> 56861adf785b
    Removing intermediate container d01af87cd73a
    Step 14/19 : RUN date
     ---> Running in 163715b5bc4f
    Thu Dec 14 13:07:10 UTC 2017
     ---> 998a1e3d84a3
    Removing intermediate container 163715b5bc4f
    Step 15/19 : COPY docker/all/run.sh /
     ---> e96fe0845d13
    Removing intermediate container a1db54d9b34d
    Step 16/19 : RUN date
     ---> Running in 76b720d624cb
    Thu Dec 14 13:08:05 UTC 2017
     ---> 8a2e88fc79e4
    Removing intermediate container 76b720d624cb
    Step 17/19 : RUN chmod +x /run.sh
     ---> Running in 55956f7e90ce
     ---> 5e77516ce2f8
    Removing intermediate container 55956f7e90ce
    Step 18/19 : RUN date
     ---> Running in 488fc1c84247
    Thu Dec 14 13:08:58 UTC 2017
     ---> e0aa09b5f033
    Removing intermediate container 488fc1c84247
    Step 19/19 : CMD /run.sh
     ---> Running in f40389c0e7b2
     ---> d271f2bc3f3b
    Removing intermediate container f40389c0e7b2
    Successfully built d271f2bc3f3b
    Successfully tagged 831776913662.dkr.ecr.eu-west-1.amazonaws.com/ngage:latest

    That's 9 minutes to run a few copy commands.

    Running the same docker build on my laptop takes about 10s, and that's not due to layer caching:

    Step 6/19 : RUN date
     ---> Running in 305bbd635591
    Thu Dec 14 13:20:00 UTC 2017
     ---> 9c7bf288a205
    Removing intermediate container 305bbd635591
    Step 7/19 : COPY deployable/target/jars/jg-arch-log-formatter.jar /usr/lib/jvm/java-8-oracle/jre/lib/ext/
     ---> 1d24c7d0b18a
    Removing intermediate container 6648d4bd22bc
    Step 8/19 : RUN date
     ---> Running in 6ad534b21909
    Thu Dec 14 13:20:01 UTC 2017
     ---> 75ca7eb2ddb3
    Removing intermediate container 6ad534b21909
    Step 9/19 : COPY deployable/target/jars/mysql-connector-java.jar docker/all/hacked-libs/* /opt/tomee/lib/
     ---> 780c4b7652d5
    Removing intermediate container 8d303e7996bc
    Step 10/19 : RUN date
     ---> Running in 286356627c69
    Thu Dec 14 13:20:02 UTC 2017
     ---> 975c55ca6fa9
    Removing intermediate container 286356627c69
    Step 11/19 : COPY docker/all/tomee.xml docker/all/logging.properties docker/all/server.xml /opt/tomee/conf/
     ---> 880ca7f0ecef
    Removing intermediate container d44acfe4b9a6
    Step 12/19 : RUN date
     ---> Running in b8910ed6bf3b
    Thu Dec 14 13:20:04 UTC 2017
     ---> ba51e68840f4
    Removing intermediate container b8910ed6bf3b
    Step 13/19 : COPY docker/all/setenv.sh /opt/tomee/bin/
     ---> b9239ae873da
    Removing intermediate container 3644b2498bc6
    Step 14/19 : RUN date
     ---> Running in 0c56d060b2f8
    Thu Dec 14 13:20:05 UTC 2017
     ---> 5720533332f9
    Removing intermediate container 0c56d060b2f8
    Step 15/19 : COPY docker/all/run.sh /
     ---> f3b8bd2d1032
    Removing intermediate container d858557ad957
    Step 16/19 : RUN date
     ---> Running in 90dbd0ea94d1
    Thu Dec 14 13:20:06 UTC 2017
     ---> fc2c8bcb0643
    Removing intermediate container 90dbd0ea94d1
    Step 17/19 : RUN chmod +x /run.sh
     ---> Running in 6f2534932e4b
     ---> 28450b4cf99f
    Removing intermediate container 6f2534932e4b
    Step 18/19 : RUN date
     ---> Running in e2dee7143445
    Thu Dec 14 13:20:07 UTC 2017
     ---> ff1b4844f2ba
    Removing intermediate container e2dee7143445
    Step 19/19 : CMD /run.sh
     ---> Running in d0922815e090
     ---> a53821bb5d32
    Removing intermediate container d0922815e090
    Successfully built a53821bb5d32

    So basically the pipeline is 50 times slower than a laptop!
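    Since the "RUN date" trick leaves timestamps in the build log, a small script can turn them into per-step durations. A rough sketch that assumes the exact date format shown above:

        import re
        import sys
        from datetime import datetime

        # e.g. "Thu Dec 14 13:01:00 UTC 2017", as printed by the RUN date steps
        FMT = "%a %b %d %H:%M:%S %Z %Y"
        STAMP = re.compile(r"\w{3} \w{3} +\d+ \d\d:\d\d:\d\d UTC \d{4}")

        stamps = [datetime.strptime(m.group(), FMT)
                  for m in STAMP.finditer(sys.stdin.read())]
        for earlier, later in zip(stamps, stamps[1:]):
            print(f"{earlier.time()} -> {later.time()}: {(later - earlier).seconds}s")

    Feed it the saved build output, e.g. docker build . 2>&1 | python3 step_deltas.py, and the gaps between the date stamps show exactly which steps are eating the time.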

  14. Barnabas Gema reporter

    I actually posted this ticket 1 year and 5 months ago, so it is even more disturbing. With this quality of service there is really no point in using the product. After some trial and error we have moved over to our self-hosted Jenkins, which has an almost identical and well-functioning Pipeline feature.

  15. Aneita Yang staff

    Hi everyone,

    Thanks for your interest in this issue and for your patience. When we look at our open tickets, the number of votes a ticket has plays a much bigger role than the priority it is assigned. Over the past couple of months, the team has been working on issues with a much higher number of votes.

    In the past few weeks, we've looked at a range of solutions for the variance in build time. Unfortunately, the solution isn't as simple as restricting the CPU available to each build. While that would make build times more consistent, because every pipeline would have the same amount of resources allocated to it regardless of when it runs, it also means that builds could be slower overall: a build that previously had no limit on the CPU it could use would now be capped. We will continue to investigate this issue and experiment with different solutions in the new year. I'll keep you updated on our progress via this issue.

    Thanks again for your patience.

    Aneita
