Refresh caches when dependencies are updated

Issue #16314 open
Linette Voller created an issue

Caches currently live for 7 days, and are only invalidated at the end of the 7 days. Updates to dependencies do not cause the cache to be refreshed which means that builds can sometimes use old versions of dependencies, leading to downstream errors.

Ideally, caches would be refreshed whenever build dependencies are updated.

Comments (30)

  1. Aneita Yang staff
    • changed status to open

    Thanks for reaching out and for the suggestion - I can see why this is a desirable piece of functionality. Unfortunately, we don't have any plans to work on this right now as we're working on higher priority features, but I'll leave this issue open for future consideration.

  2. Matthew Lindsey

    This should probably be a higher priority. It would make your caching feature far more useful, rather than the time-gated antipattern it is in its current form.

  3. Shmuel Gutman

    Would be very useful for me also. Currently, I need to remember to clear the cache manually whenever I update dependencies (package.json in my case).

  4. Caleb Scholze

    Our team would really like to see this feature implemented. Not just to invalidate, but also view the cache contents. We use Node.js with node_module caching. We recently updated our version of Node.js, and the inability to invalidate the caches has made it extremely difficult to update and rebuild dependencies for the new runtime.

  5. Andrew Simpson

    Rather than a complex feature to work out when the cache needs to be invalidated, it seems to me that all we need is a more sophisticated way of specifying cache keys. If there was a way to include e.g. the hash of a file, or the name of branch, or the value of an environment variable in the cache key, it'd work out fairly simply.

    I'd suggest looking at what CircleCI does with cache keys and liberally crib from that.
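
    Andrew's idea can be sketched in a few lines. The helper names and key layout below are made up for illustration; this is not an existing Pipelines feature, just what a templated cache key (branch name plus lockfile checksum, CircleCI-style) could amount to:

```python
import hashlib

def file_checksum(path):
    """SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def cache_key(prefix, branch, lockfile):
    """Compose a cache key from a static prefix, the branch name, and the
    lockfile checksum, in the spirit of {{ checksum "..." }} templates."""
    return f"{prefix}-{branch}-{file_checksum(lockfile)}"
```

    With keys like this, any change to the lockfile produces a new key, so the stale cache is simply never matched again.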

  6. jaye.westen

    It would be really useful as a built-in function, so that the cache automatically empties and frees up space without us having to do it ourselves, because we sometimes forget.

  7. Julien Falque

    This is indeed a much-needed feature. Never refreshing the cache kind of defeats its purpose: out-of-date Docker layers force images to be rebuilt; an out-of-date /node_modules requires running npm install or npm ci anyway; and so on.

    As Andrew Simpson said, a more sophisticated way to use cache keys might be the best solution, e.g. using the checksum of /package-lock.json to keep different cache versions of the /node_modules directory. Docker layers should perhaps be handled differently, as updating a Dockerfile does not mean all cached layers won't be used anymore.
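
    Until something like this is built in, a checksum-based refresh can be approximated inside the build itself. A minimal sketch (the function name and the stamp-file convention are assumptions for this example, not Pipelines features):

```python
import hashlib
import os
import shutil

def refresh_cache_if_stale(lockfile, cache_dir, stamp_file):
    """Delete cache_dir when the lockfile's checksum no longer matches
    the checksum recorded at the last refresh."""
    with open(lockfile, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    previous = None
    if os.path.exists(stamp_file):
        with open(stamp_file) as f:
            previous = f.read().strip()
    if digest == previous:
        return False  # lockfile unchanged: keep the cached directory
    shutil.rmtree(cache_dir, ignore_errors=True)  # bust the stale cache
    with open(stamp_file, "w") as f:
        f.write(digest)
    return True  # caller should reinstall dependencies
```

    The stamp file would itself need to live inside a cached path so the recorded checksum survives between builds.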

  8. Alexander Bolodurin

    This is a major issue, rather than a minor enhancement. The current behaviour is completely unexpected and what I would consider broken.

  9. Adam Barnwell

    I agree, Alexander. If it helps, I used the artefact feature to cache my node_modules. It gives a speed boost between steps, but obviously not between build jobs.

  10. Volodymyr Pokropyvnyi

    I agree with Alexander! I thought it was obvious that the cache for bundler/yarn would be updated automatically.

  11. Dmitrii Arnautov

    Agree with Alexander! Currently, we have to run npm ci in each step that relies on node_modules. It significantly slows down the build.

  12. Nicolay Hvidsten

    In my opinion, the way caches work today is counter-intuitive and should not be advertised as a feature. It will most likely lead to many confused users who add new packages to their projects only to get build errors saying the packages do not exist.

  13. Ledion Bitincka

    It would be great if there was an option that allowed users to define whether the cache should be updated at the end of each run - e.g. a step that exits with 0/1 to indicate an update is necessary, or the like.

    It is really surprising that this issue is almost a year old, with tons of requests and still sitting at minor priority!!!

  14. Marek Zielonkowski

    I don’t agree that the current state of the cache feature significantly slows down the build or works counter-intuitively. It is a very helpful feature once your project is mature enough that developers rarely need to add new libraries from the npm registry. The disadvantage is that it is missing “stamping” of the current cache state based on the content of the dependency sections of package.json. Every change to those sections (devDependencies, and so on) should be detected by the pipeline “engine”, which should then delete the caches, allowing npm or yarn to download fresh dependencies from the network.
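
    Marek's "stamping" idea can be sketched as follows: checksum only the dependency sections of package.json, so unrelated edits (name, scripts, version) leave the cache intact. The function name and section list are illustrative assumptions:

```python
import hashlib
import json

# Sections whose changes should bust the dependency cache.
DEP_SECTIONS = ("dependencies", "devDependencies",
                "peerDependencies", "optionalDependencies")

def dependency_stamp(package_json_path):
    """Checksum only the dependency sections of package.json, so that
    edits elsewhere in the file do not invalidate the cache."""
    with open(package_json_path) as f:
        pkg = json.load(f)
    relevant = {k: pkg.get(k, {}) for k in DEP_SECTIONS}
    blob = json.dumps(relevant, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

    A pipeline engine could compare this stamp against the one recorded when the cache was saved and drop the cache only when they differ.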

  15. Łukasz Romanowicz

    Marek, it should be detected but it is not.

    image: node:10.15
    
    pipelines:
      default:
        - step:
            name: Build
            caches:
              - node-modules
            script:
              - yarn
              - yarn lint
              - yarn build
        - step:
            name: Unit Tests
            caches:
              - node-modules
            script:
              - yarn test:unit
    
    definitions:
      caches:
        node-modules: ./node_modules
    

    I was not able to find a way to invalidate/update the cache after running yarn and it’s not detecting the changes automatically. The only workaround is to drop the cache manually.

  16. Alexander Bolodurin

    npm et al are not the only type of cache you can have.

    For C++ builds you normally use ccache, which updates your build object cache any time a source file changes. You’d want to persist that cache after each build, otherwise your build speed would degrade over time after the cache is created.

  17. Ledion Bitincka

    Given that reading/updating caches in remote storage should be a relatively inexpensive operation (as compared to generating their content), an option to always update the cache after a successful run might be even simpler. Obviously, someone could implement this inside the pipeline by manually managing the cache … or use some other CI platform :)

  18. Nick Romito

    This seems textbook s/bug/feature/g. Ledion’s suggestion seems pretty straightforward and, I’d imagine, easy to implement on Bitbucket’s side of things. Strictly time-based invalidation of a dependency cache simply doesn’t make sense. This behaviour is also only surfaced to users three-quarters of the way down the docs page. I can’t imagine how much engineering time has been wasted running into this.

  19. Cory Robinson

    This is ridiculous - I expect a production solution like Pipelines to support this feature. I just wasted hours only to discover it’s a simple problem with an easy fix that the Pipelines team is not giving attention to! 😠

  20. Juri Hahn

    As others have pointed out, this is a must-have feature to really compete with services like CircleCI or SemaphoreCI which both provide a way to define cache keys to deal with stale caches/cache invalidation. Ideally, it would be possible to do the following:

    image: ruby:2.6.3
    options:
      max-time: 15
    pipelines:
      default:
        - step:
            name: Build
            caches:
              - bundler
              - node-modules
            services:
              - mysql
              - redis
            script:
              - export RAILS_ENV=test
              - export BUNDLE_PATH=vendor/bundle
              - curl -sSL https://deb.nodesource.com/setup_10.x | bash -
              - curl -sSL https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
              - echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list
              - apt-get update && apt-get install --assume-yes --no-install-recommends default-libmysqlclient-dev nodejs yarn
              - yarn install --frozen-lockfile --no-emoji --prefer-offline --ignore-scripts --no-progress --silent
              - gem install bundler -N -v "`tail -n1 Gemfile.lock`"
              - bin/bundle check || bin/bundle install --jobs=2 --frozen
        - parallel:
            - step:
                name: RSpec MySQL
                caches:
                  - bundler
                  - node-modules
                services:
                  - mysql
                  - redis
                script:
                  - # 
                  - bundle exec rspec
            - step:
                name: RSpec PostgreSQL
                caches:
                  - bundler
                  - node-modules
                services:
                  - psql
                  - redis
                script:
                  - # 
                  - bundle exec rspec
            - step:
                name: RuboCop
                caches:
                  - bundler
                script:
                  - bundle exec rubocop
            - step:
                name: Bundler Audit
                caches: 
                  - bundler
                script: 
                  - bundle exec bundle-audit update
                  - bundle exec bundle-audit check  
    definitions:
      caches:
        bundler: 
          path: vendor/bundle
          keys:
            - myapp-ruby-v{{ checksum ".ruby-version" }}-{{ .Branch }}-bundler-{{ checksum "Gemfile.lock" }}
            - myapp-ruby-v{{ checksum ".ruby-version" }}-{{ .Branch }}
            - myapp-ruby-v{{ checksum ".ruby-version" }}
        node-modules: 
          path: node_modules
          keys:
            - myapp-{{ .Branch }}-yarn-v{{ checksum "yarn.lock" }}
            - myapp-{{ .Branch }}
      services:
        mysql:
          image: mysql:5.7
          environment:
            MYSQL_RANDOM_ROOT_PASSWORD: 'yes'
            MYSQL_USER: 'app'
            MYSQL_PASSWORD: 'password'
            MYSQL_DATABASE: 'app_test'
        psql:
          image: postgres:11
          environment:
            POSTGRES_USER: 'app'
            POSTGRES_PASSWORD: 'password'
            POSTGRES_DB: 'app_test'
        redis:
          image: redis:3.2
    

    This would be an example config for a Ruby (on Rails) app. It would work the following way. On cache restore (build setup phase), the keys are checked in the order they’re defined, top to bottom. Each key is templated, providing access to values like the branch name and functions like checksum on a file. Additionally, a key is not used as an exact identifier but more like a prefix match. The idea is that if there’s no cache with a key matching myapp-ruby-vf860323f95b110e5af69b3754d006f39304390a0-master-bundler-e3f55ebaf21e8ead9db92154df2707dfbf207702, then the next key is used to find the most recent cache with a key matching myapp-ruby-vf860323f95b110e5af69b3754d006f39304390a0-master, and so on.

    Upon build teardown, the cache is uploaded using the first key from `keys` (myapp-ruby-vf860323f95b110e5af69b3754d006f39304390a0-master-bundler-e3f55ebaf21e8ead9db92154df2707dfbf207702).

    As you can see, this allows running multiple steps in parallel for testing, linting, etc., with only one build step. This not only speeds up the entire pipeline but also saves money by reducing build minutes. Frankly, the current caching mechanism is rubbish. Furthermore, I’m not convinced you’ll find a bulletproof way to fix caching automatically. Thus, if consumers can define their own (templated!) cache keys as in the example above, most if not all caching problems could be solved.
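
    The restore logic Juri describes (try keys top to bottom, prefix match, most recent save wins) can be sketched like this; the function and data shapes are hypothetical, not any vendor's actual API:

```python
def restore_key(candidate_keys, stored_caches):
    """Pick which stored cache to restore: try candidate keys top to
    bottom; a candidate matches any stored key it is a prefix of, and
    among matches the most recently saved cache wins.
    `stored_caches` maps cache key -> save timestamp."""
    for candidate in candidate_keys:
        matches = [key for key in stored_caches if key.startswith(candidate)]
        if matches:
            return max(matches, key=lambda k: stored_caches[k])
    return None  # cold cache: nothing to restore
```

    The fallback keys mean a lockfile change still restores the closest older cache (a near-full node_modules) instead of starting from nothing, which keeps installs fast even on a key miss.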

  21. David Whitlark

    If we could just update the cache at the end of a successful build, that would be so, so much better than the current behavior.

    I had to create my own cache in s3 and the build times now consistently stay lower.

    Standard pipeline caches really aren’t useful for a codebase that has a lot of change.

  22. Yonn Trimoreau

    This is not an enhancement, but an issue… And since we’re all paying for build minutes, it’s a pretty costly one.

  23. Rachael Ludwick

    Yep! When we change Gemfiles and the like, I sometimes intentionally delete the Docker cache (for example) to ensure that subsequent builds by other team members don’t have an extra 5 minutes in them (package repository systems can be slow to install packages!). But if I forget to do that, every team member’s builds take that extra time on every build until the end of the week.
