Pipelines cache with node does not refresh dependencies when a module is updated

Issue #15804 resolved
Hellmut Adolphs
created an issue

We recently started using the Pipelines cache with our node builds. It definitely accelerates the builds, but we have found a problem when we update the version of a module that the build depends on while node_modules is being cached.

Basically, if a dependency gets updated to a newer version (for example, an internal module hosted in a private node registry), the cache does not update. The build-and-install step works because the local node_modules gets updated, but the cached copy doesn't. So if the next step doesn't run "npm install" as well, it downloads the stale cache, which causes issues (for example, in a deployment step). Ideally we wouldn't have to run npm install in every step...

So this issue is about seeing if it's possible to make the cache update every time a change in the node_modules directory is detected.

Thank you!

Comments (10)

  1. Matt Ryall staff

    Hi Hellmut,

    The cache is designed to speed up npm install, not to remove the need to run it. You still should run it at the start of your build, to make sure all your dependencies are up-to-date and handle the case where the cache does not exist.

    The Pipelines dependency cache is intentionally technology-agnostic. It takes a copy of a directory at the end of a successful build and uploads it to the cache. In subsequent builds, it downloads the cache into the same location before the build starts. Nothing about it is specific to Node, NPM or any other technology.
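
    The directory-based behaviour is easiest to see with a custom cache definition. This is only a minimal sketch: the cache name nodemodules is made up for illustration, and it points at the same node_modules directory that the predefined node cache covers.

    definitions:
      caches:
        nodemodules: node_modules  # hypothetical cache name mapped to a directory path

    pipelines:
      default:
        - step:
            caches:
              - nodemodules
            script:
              - npm install  # still runs on every build; the cache only makes it faster
              - npm test

    Nothing in this definition knows about package.json or versions, which is why npm install still has to decide what is out of date.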

    Trying to build a technology-specific cache would be very hard and likely wouldn't work properly. For one thing, there are many ways to specify dependencies in node apps, or to require additional ones, and our cache might not detect all of them and could break your build. We would also need to keep it up to date as npm evolved, or risk breaking your builds. For this reason, our cache is designed to be very simple, just copying some files around to speed things up, and we expect builds to run npm install as the first step to get the correct dependencies.

    Hope this makes sense.

    Cheers,
    Matt

  2. Hellmut Adolphs reporter

    Hi Matt,

    Thanks for the explanation; that clarifies the intended functionality. However, it does limit the ability to use multiple steps in each branch (or custom build). Basically, if I have a "build and test" step and run npm install in it, that step works, but the next step doesn't use the updated files. It uses the cached version, so unless you run npm install again the build will probably fail. Perhaps there could be an option to carry updated files over from step to step in the same branch? Or simply keep updating the cache if it's modified by a step? Even with other build tools, such as Maven, that would be useful, no?

  3. Hellmut Adolphs reporter

    I am not sure whether I should re-open the issue to continue the thread. Please let me know; if not, I'll re-open it in 24 hours just in case. Hopefully that's acceptable. Thanks

  4. Matt Ryall staff

    It's okay - we can keep the thread going here. I still get your replies.

    Artifacts are the way we've designed to pass state between steps in Pipelines. We looked at copying all files between steps by default, but because many builds produce a lot of intermediate files (sometimes gigabytes), we decided that doing it all the time would slow things down too much.

    In a Node project, it often makes sense to keep the entire build directory the same across steps. This is how you can do that with artifacts:

    pipelines:
      default:
        - step:
             name: Build and test
             caches:
               - node
             script:
               - npm install
               - npm test
             artifacts:
               - '**'  # copy the entire build directory to subsequent steps
        - step:
             name: Deploy to test
             deployment: test
             script:
               - npm run deploy -- test  # assumes a "deploy" script in package.json; "deploy" is not a built-in npm command
    

    This saves running npm install on the later step, because the modules are all there, as well as whatever you built in the first step.

    If you just want modules passed, you could potentially make this faster by specifically passing the node_modules/ directory (or whatever else you need) through as an artifact.
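
    As a rough sketch of that narrower variant, here is the same two-step pipeline with only the artifacts pattern changed. The deploy command assumes a "deploy" script exists in package.json, which is an illustration rather than something this thread confirms; adjust the glob to whatever your later steps actually need.

    pipelines:
      default:
        - step:
            name: Build and test
            caches:
              - node
            script:
              - npm install
              - npm test
            artifacts:
              - node_modules/**  # pass only the installed modules to later steps
        - step:
            name: Deploy to test
            deployment: test
            script:
              - npm run deploy -- test  # hypothetical "deploy" script in package.json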

    Does something like this address your use-case?

  5. Hellmut Adolphs reporter

    Hi Matt,

    Great, yeah, we didn't know artifacts were intended for that. We assumed artifacts were meant to be generated for delivery / deployment (I guess by analogy with Maven's nomenclature). Pretty cool! Yeah, with this approach we can pass the node_modules directory across steps. This resolves the issue.

    Thank you! If anything comes up related to this, I'll update the thread for future reference.

  6. Hellmut Adolphs reporter

    Quick question: I just noticed your example above does not include the "caches" entry in the follow-up step. Was this intended? We usually set it in every step so the cache is available to the next step. However, I am guessing that if we use artifacts this is no longer necessary? For example:

    development:
          - step:
              name: build and unit tests
              caches:
                - node
              script:
                - npm install
                - npm test        
          - step:
              caches:
                - node
              name: integration tests
              script:
                - npm install
                - ./test/runITests.sh
              services:
                - localstack
                - redis
          - step:
              caches:
                - node
              name: UAT tests
              script:
                - npm install
                - ./test/runUATests.sh
          - step:
              caches:
                - node
              script:
                - npm install
                - if [ "$PUBLISHING_ENABLED" = "true" ]; then cp .npmrc_config .npmrc; fi
                - if [ "$PUBLISHING_ENABLED" = "true" ]; then git config --global user.email "youremail@somewhere.com"; fi
                - if [ "$PUBLISHING_ENABLED" = "true" ]; then npm publish; fi
    

    As you can see, we were referencing the cache from step to step and having to "refresh" it with npm install every time. I will now try modifying this to use artifacts.

  7. Matt Ryall staff

    That's right, there's no need to use caches if the artifacts are being passed. The artifacts should clobber the cached files, although I'm not 100% sure on that.
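
    Applied to the development pipeline above, that could look roughly like the sketch below. It assumes the later steps only need node_modules from the first step, and it keeps the cache on the first step purely to speed up npm install there.

    development:
      - step:
          name: build and unit tests
          caches:
            - node
          script:
            - npm install
            - npm test
          artifacts:
            - node_modules/**  # made available to the steps that follow
      - step:
          name: integration tests
          script:
            - ./test/runITests.sh
          services:
            - localstack
            - redis
      - step:
          name: UAT tests
          script:
            - ./test/runUATests.sh
      - step:
          script:
            - if [ "$PUBLISHING_ENABLED" = "true" ]; then cp .npmrc_config .npmrc; fi
            - if [ "$PUBLISHING_ENABLED" = "true" ]; then git config --global user.email "youremail@somewhere.com"; fi
            - if [ "$PUBLISHING_ENABLED" = "true" ]; then npm publish; fi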

    There's also no need to split your build tasks across multiple steps if you're using the same container and build state. Multiple steps in Pipelines run parts of your build in different containers, which is meant to support different build environments, parallelism or deployment tracking. If you don't need any of those, including all the testing in the same step should be fine. Most of our Node repos at Atlassian only use a single step.
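
    For the development pipeline above, a single-step version might look like this sketch. It assumes every script can run in the same container and that having the localstack and redis services running for the whole step is acceptable.

    development:
      - step:
          name: build and test
          caches:
            - node
          services:
            - localstack
            - redis
          script:
            - npm install
            - npm test
            - ./test/runITests.sh
            - ./test/runUATests.sh
            - if [ "$PUBLISHING_ENABLED" = "true" ]; then cp .npmrc_config .npmrc; fi
            - if [ "$PUBLISHING_ENABLED" = "true" ]; then git config --global user.email "youremail@somewhere.com"; fi
            - if [ "$PUBLISHING_ENABLED" = "true" ]; then npm publish; fi

    With one step there is nothing to pass between containers, so neither artifacts nor repeated npm install calls are needed.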
