Allow optional concurrent/parallel script execution in Pipelines

Issue #14354 resolved
Patrick Fiaux created an issue

For a test pipeline which runs multiple tests that are independent of each other it would shorten the feedback loop to run them in parallel.

pipelines:
  default:
    - step:
        parallel: true # for example
        script:
          - test/client.sh
          - test/server.sh
          - test/other.sh

This could currently be approximated by writing a master script that starts the other scripts, but that would make a mess of the outputs, which Pipelines otherwise keeps cleanly separated.

Ideally, as soon as one script fails, it would fail the build early.
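
A rough sketch of such a wrapper that fails early (the per-script log files are just one way to keep the outputs separate; `wait -n` needs bash 4.3 or later):

#!/bin/bash
# run-all.sh - start the independent test scripts in the background,
# then fail the build as soon as any of them exits non-zero
test/client.sh > client.log 2>&1 &
test/server.sh > server.log 2>&1 &
test/other.sh  > other.log  2>&1 &

for _ in 1 2 3; do
  wait -n || { cat ./*.log; exit 1; }  # first failure: dump all output and stop
done
cat ./*.log  # everything passed; show the collected output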

Official response

  • Matt Ryall

    Hi everyone,

    Thanks for your votes and feedback on this ticket.

    I'm happy to share that parallel steps are now available to all users in Bitbucket Pipelines. To enable this in your build, just indent your steps into a parallel group as shown below:

    pipelines:
      default:
        - step:
            name: Build
            artifacts:
              - dist/**  # artifacts are copied to all parallel steps
            script:
              - ...
        - parallel:     # run the steps below in parallel
            - step:
                name: Unit tests
                script:
                  - ...
            - step:
                name: Integration tests
                script:
                  - ...
    

    This is primarily designed for splitting up test suites to run in parallel on separate containers and complete faster. We've had reports of this shaving off two-thirds of the build time in some cases, so this can be a big time-saver for teams with many tests. (If you just want to run things in parallel in a single container, there are some options for that described in a comment above.)

    Splitting your tests into batches can be done in a variety of ways: manually, by configuring multiple test suites, or automatically, since some test runners support batching test files themselves. Pipelines provides the $BITBUCKET_PARALLEL_STEP and $BITBUCKET_PARALLEL_STEP_COUNT environment variables to facilitate the latter.
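
    As a rough sketch of the automatic batching approach (the tests/*_test.sh layout and run_tests.sh runner below are hypothetical), each step in the parallel group can pick its own slice of the test files:

    - parallel:
        - step:       # repeat this step once per desired batch
            name: Test batch
            script:
              # BITBUCKET_PARALLEL_STEP is the zero-based index of this step in the
              # group; BITBUCKET_PARALLEL_STEP_COUNT is the number of steps in the group
              - FILES=$(ls tests/*_test.sh | awk -v i="$BITBUCKET_PARALLEL_STEP" -v n="$BITBUCKET_PARALLEL_STEP_COUNT" 'NR % n == i')
              - ./run_tests.sh $FILES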

    The main limitations of the feature today are that you can't declare artifacts in parallel steps (yet - more on that below), and that you are still limited to a maximum of 10 steps in total for your pipeline, just as before.

    We're continuing to work on this feature after release and will be adding support for artifact generation in parallel steps and tidying up a few rough edges around the UI. Since there's a specific ticket open for artifact support (#15843), and only a subset of people will need it, I'll close this one off in favour of that.

    Please watch for #15843 if you're interested in updates regarding artifacts, and raise any other enhancement requests as new tickets.

    Thanks,
    Matt

    Edit: oops, I had the wrong env vars above. Corrected and linked to the docs.

Comments (23)

  1. Adam Long

    Keen on seeing this feature to both improve successful build time and to shorten the feedback loops of failed builds.

  2. Ben Bates Account Deactivated

    We have a regular requirement to deploy to multiple environments (Salesforce) from a commit to a single branch - parallel steps would really help here. Running these steps serially multiplies our build times, so it would be great to run them in parallel. Another excellent use case is retrieving Salesforce metadata - this can be broken up into multiple parallel steps, massively reducing the retrieval time (for our biggest org, this activity drops from ~40 minutes to ~8 minutes when running in parallel using Jenkins pipelines). Hope to see this feature soon!

  3. Viet Yen Nguyen

    We have a workaround using bash scripting:

    - composer run cs:check &> /tmp/cs.out &  # run command in the background
    - export CHECKPID=$! # grab the PID of the background process
    ... (do stuff) ...
    - wait $CHECKPID || (cat /tmp/cs.out && exit 1) # now join with the background process
    - cat /tmp/cs.out # show its output
    
  4. Andrew howden

    Google Cloud implements this in a nice way that would also be possible here:

    https://cloud.google.com/container-builder/docs/api/reference/rest/v1/projects.builds#Build (look for waitFor)

    Basically, each build step defines its own dependencies. If a step declares no dependencies, steps are executed in sequence; otherwise, each step runs in parallel as soon as the steps it waits for have completed. So an example declaration would be:

    ---
    pipelines:
      custom:
        "Deploy to Production":
          - step:
              name: "Build the application"
              script:
                - build/ci/ci.sh build
          - step:
              name: "Provision Testing Infrastruture"
              script:
                - ENV="stg" DEPLOY_TYPE="infrastructure" build/ci/ci.sh deploy
          - step:
              name: "Provision Testing Server"
              script:
                - ENV="stg" DEPLOY_TYPE="server" build/ci/ci.sh deploy
              waitFor: 
                - "Provision Testing Infrastructure"
          - step:
              name: "Provision Testing Application"
              script:
                - ENV="stg" DEPLOY_TYPE="application" build/ci/ci.sh deploy
              deployment: "staging"
              waitFor: 
                - "Provision Testing Server"
                - "Build the application"
          - step:
              name: "Provision Production Server"
              trigger: "manual"
              script:
                - ENV="prd" DEPLOY_TYPE="server" build/ci/ci.sh deploy
          - step:
              name: "Provision Production Application"
              script:
                - ENV="prd" DEPLOY_TYPE="application" build/ci/ci.sh deploy
              deployment: "production"
              waitFor:
                - "Provision the Production server"
    

    Builds would look something like:

                 Provision the testing application*
                          |
                          |
    Build the application |
       |                  |       Provision production server
       |                  |              |
       O------------------|              |
                          O--------------O---------------O
    O-------O-------------|                              |
    |       |                                            |
    |       |                                 Provision production application
    Provision the testing Infrastructure
            |
            |
    Provision the testing server
    
    * Last step before the manual trigger; dependencies are all "reset" here.
    
  5. Jan Esser

    When we receive parallel steps, I would recommend that the summary info (Screen Shot 2018-01-26 at 11.00.36.png) also show all steps rather than just the number of pipelines that were run, so that if you have 3 parallel test steps in a pipeline and one of them fails, it can show 2 of 3 passed; alternatively, separate the step passes from the pipeline passes. Either way, show this info in all views (source branch, pull request, etc.).

  6. Raul Gomis staff

    Hi everyone,

    We're excited to let you know that parallel steps are now available in Bitbucket Pipelines Alpha. For information on how to configure parallel steps, check out our documentation.

    If you're not part of the early access program, but would like to try this feature, you can sign up to be a Pipelines early access customer.

    We do have a known issue that parallel steps aren't always displayed in the order that they are configured. This is something that we're looking at fixing. If you have any feedback on the feature, please email the team at pipelines-feedback@atlassian.com.

    Thanks!

  7. Michael Delle

    Thanks for the work on parallel steps but unfortunately it still won't work for us :(

    I want to build my environment in parallel (create database table, install dependencies, and lint in parallel), and then test at the end. But each "step" appears to clone a new repository.

    pipelines:
      default:
        - parallel:
          - step:
              name: Install Dependencies
              caches:
                - composer
              script:
                - composer install
          - step:
              name: Create Database
              script:
                - make database
              services:
                - database
          - step:
              name: Lint
              script:
                - make lint
        - step:
            name: Test
            deployment: test
            script:
              - make test
    
    definitions:
      services:
        database:
          image: mysql:5.5.43
    
  8. Matt Ryall

    @michaeldelle - thanks for the reply. You're correct that parallel steps can't set up your build environment in parallel, because each step in Pipelines starts up in a separate container with its own copy of your source code.

    Just to explain briefly what we're designing for: it's primarily parallel testing, which is something that can't easily be done with Pipelines today. We're aiming to fit the model we see used by the vast majority of CI/CD builds in the wild, which is:

    • a single initial step to build the software (maybe with some very quick unit tests), followed by
    • multiple parallel steps that run the full unit tests, linting, integration tests, browser tests, etc. against the software (each with its own environment setup as needed)
    • final serial steps to deploy to a set of environments in order, usually automatically to the test environment, then manually to staging and production.

    Pipelines supports artifacts, which allow you to pass folders or binaries from that first step to all subsequent steps. The environment for the initial build step should be defined in Docker containers as much as possible, and external build dependencies should be cached, so your build spends most of its time building your software rather than setting up the build environment.
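
    As a rough illustration of that pattern (the image tag, cache and paths are just placeholders):

    image: php:7.2-cli        # build tooling baked into the image, not installed in the script

    pipelines:
      default:
        - step:
            name: Build
            caches:
              - composer      # external dependencies cached between builds
            script:
              - composer install
            artifacts:
              - vendor/**     # passed on to all subsequent steps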

    But if you really want to parallelise your environment setup scripts, you can do this by spinning off background tasks in your script. I think something like this should work:

    pipelines:
      default:
        - parallel:
          - step:
              name: Build and test
              caches:
                - composer
              services:
                - database
              script:
                - function prefix_with () { sed -e "s/^/$1 /"; }
                - composer install | prefix_with "[composer]" &
                - make database | prefix_with "[database]" &
                - wait
                - make test
          - step:
              name: Lint
              script:
                - make lint
    
    definitions:
      services:
        database:
          image: mysql:5.5.43
    

    I am curious how much time this shaves off your build though. Composer dependencies should be mostly cached. Setting up the database should be pretty fast, but if not, you could bake it into your own override of the MySQL image. (Linting seems like the best candidate for splitting into a parallel step, so I kept it split out. Might not work if it needs your composer dependencies, and it's easy to drop back into the main step.)

    There are two caveats with running stuff this way:

    1. This simple example won't fail your build if any of the background jobs fail, because wait with no arguments always returns an exit status of zero. This might be okay, but if not, you need to run wait looping over the PIDs of your running processes (see the sketch after this list). There are also a couple of examples in the second response to this Stack Overflow thread.
    2. Output gets mixed together, which I've tried to fix by prefixing each line of output. If your tools aren't line-buffered, there could still be mixed-up lines. If that's a problem, you could dump the logs from each command to a file, then print them out after the wait command (or just the ones that fail).
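
    Here's a minimal sketch that addresses both caveats by sending each job's output to its own file and waiting on each PID individually (the file and variable names are just illustrative):

    - composer install > /tmp/composer.log 2>&1 &
    - COMPOSER_PID=$!
    - make database > /tmp/database.log 2>&1 &
    - DATABASE_PID=$!
    - wait "$COMPOSER_PID" || (cat /tmp/composer.log && exit 1)   # a failed background job now fails the step
    - wait "$DATABASE_PID" || (cat /tmp/database.log && exit 1)
    - cat /tmp/composer.log /tmp/database.log                     # show the output once both are done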

    Does that seem like it will work for you? I'm sure it's a common need, so if it does, we'll add something to our documentation to cover this off as well.

  9. Bryant

    @mryall_atlassian is there a plan to allow use of artifacts in a parallel step that will be used in a serial step after all of the parallel steps are completed?

    My script matches the build/test/deploy model.

    The goal is for the build step to run in parallel as well. In my case, I run yarn install, then build my front end, then test it. If the build and test are separate steps, the test step has to pull down the cache of node modules from the build, which isn't a significant amount of time in one build, but it adds up over time and will use more build minutes than are currently being used (without parallel steps).

    The balance I am trying to strike is using parallelism so that my pipelines complete in a shorter amount of time without spending more build minutes.

    As an example, Gitlab pipelines have the concept of stages and steps. Each stage can have multiple steps. By default all steps in a stage are run in parallel, and if a stage has steps that have artifacts then those artifacts are carried over to the subsequent stages.
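
    A minimal .gitlab-ci.yml sketch of that model (GitLab calls the steps within a stage "jobs"; the names and commands below are illustrative):

    stages:
      - build
      - test

    build:
      stage: build
      script:
        - yarn install
        - yarn build
      artifacts:
        paths:
          - dist/

    unit-tests:        # jobs in the same stage run in parallel
      stage: test
      script:
        - yarn test

    lint:
      stage: test      # both test jobs receive the build artifacts automatically
      script:
        - yarn lint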

    It feels like we are really close to having something like that.

  10. Matt Ryall

    Thanks for the feedback, @bsell. We'll look into supporting artifacts generated by parallel steps and see if it's simple enough to include in our first iteration of this feature. There are some edge cases we're worried about and need to investigate, like when parallel steps upload artifacts in the same location.

    We'll keep you posted over the next few weeks.

  11. Matt Ryall

    Hi everyone,

    Thanks for your votes and feedback on this ticket.

    I'm happy to share that parallel steps are now available to all users in Bitbucket Pipelines. To enable this in your build, just indent your steps into a parallel group as shown below:

    pipelines:
      default:
        - step:
            name: Build
            artifacts:
              - dist/**  # artifacts are copied to all parallel steps
            script:
              - ...
        - parallel:     # run the steps below in parallel
            - step:
                name: Unit tests
                script:
                  - ...
            - step:
                name: Integration tests
                script:
                  - ...
    

    This is primarily designed for splitting up test suites to run in parallel on separate containers and complete faster. We've had reports of this shaving off two-thirds of the build time in some cases, so this can be a big time-saver for teams with many tests. (If you just want to run things in parallel in a single container, there are some options for that described in a comment above.)

    Splitting your tests into batches can be done in a variety of ways: manually, by configuring multiple test suites, or automatically, since some test runners support batching test files themselves. Pipelines provides the $BITBUCKET_PARALLEL_STEP and $BITBUCKET_PARALLEL_STEP_COUNT environment variables to facilitate the latter.

    The main limitations of the feature today are that you can't declare artifacts in parallel steps (yet - more on that below), and that you are still limited to a maximum of 10 steps in total for your pipeline, just as before.

    We're continuing to work on this feature after release and will be adding support for artifact generation in parallel steps and tidying up a few rough edges around the UI. Since there's a specific ticket open for artifact support (#15843), and only a subset of people will need it, I'll close this one off in favour of that.

    Please watch for #15843 if you're interested in updates regarding artifacts, and raise any other enhancement requests as new tickets.

    Thanks,
    Matt

    Edit: oops, I had the wrong env vars above. Corrected and linked to the docs.

  12. Nico Bijl

    Awesome, using it now. Just one more question: on the pipeline detail page, the run time is shown. Is this the time of each step added up, even when running steps in parallel? (It looks like it.)

    Screen Shot 2018-03-28 at 09.23.38.png

    Maybe it would be better if this showed the time of the longest-running step?

  13. Raul Gomis staff

    Hi Nico,

    Thanks for the feedback. We have introduced a couple of changes to display the correct time and clarify the difference between:

    • build duration: how long it takes for the build to run.
    • build minutes used: how many minutes or seconds were consumed from your total billing amount.

    changes.png

    Regards,

    Raul

  14. Former user Account Deleted

    We have two "chains" of steps that we would like to run in parallel, but the steps within these chains should be run sequentially. Unfortunately, the steps within one of those chains use different images.

    The current implementation does not support this use case. Any plans to support this in the near future?

  15. Matt Ryall

    @whyarno - have you considered using docker run to run your containers in order in one of the parallel steps? That seems like it should work for this situation.
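
    For example, a rough sketch along those lines (the images and scripts shown are placeholders):

    - parallel:
        - step:
            name: Chain A
            services:
              - docker
            script:
              # run each link of this chain in its own image, in order
              - docker run --rm -v "$BITBUCKET_CLONE_DIR:/app" -w /app image-a ./stage-one.sh
              - docker run --rm -v "$BITBUCKET_CLONE_DIR:/app" -w /app image-b ./stage-two.sh
        - step:
            name: Chain B
            script:
              - ./chain-b.sh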

    If you'd like to see something more specific designed into Pipelines for this case, please raise a new ticket with some details about how and why your build process works this way and we'll consider it for the future.
