As we transition to Pipelines, I've been trying to sort out our Docker caching so that it works reliably (and to fix some actual bugs that had crept in but weren't very noticeable with our previous build system). I got it caching, but then as I kept tweaking the Dockerfile I noticed it stopped caching. First I checked locally with Docker on my Mac (a fairly recent version) that:
- if I built twice in a row, the second run found cache for all layers
- if I built once, then modified a file in our source tree, the cache was invalidated at the correct step (the one that adds most of the source)
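For reference, the Dockerfile I was testing has roughly this shape (the base image, paths, and commands here are made up for illustration — not our actual file):

```dockerfile
# Illustrative sketch only, not our real Dockerfile.
FROM node:18-slim              # hypothetical base image
WORKDIR /app
COPY package*.json ./          # dependency manifests first, so this layer stays cached
RUN npm install                # only re-runs when the manifests above change
COPY . .                       # <-- the step that adds most of the source;
                               #     editing any source file invalidates from here down
CMD ["node", "server.js"]
```

The point is that locally, Docker's layer cache behaves exactly as expected with this structure: only the layers at or below the changed input get rebuilt.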
I then confirmed that rebuilding my branch with that Dockerfile multiple times in Pipelines wasn't using the cache. The log showed the cache being downloaded during the "Build setup" phase, but at the end no new cache was uploaded and all the layers were rebuilt.
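For context, our step enables the predefined Docker cache roughly like this (the step name and script are placeholders, not our real config):

```yaml
# Sketch of the relevant bitbucket-pipelines.yml step
pipelines:
  default:
    - step:
        name: Build image
        services:
          - docker        # gives the step a Docker daemon
        caches:
          - docker        # the predefined cache downloaded during "Build setup"
        script:
          - docker build -t our-app .
```

So the cache is being restored at the start of the step; the problem is what happens (or doesn't) at the end.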
I then deleted the Docker cache in the web UI.
Then I ran the build twice again and it cached properly.
So it appears that the Docker cache is very static -- that is, it doesn't extract the layers built by the current run of docker build and add them to the existing cache. It only uploads a cache for future use if there was no cache to start with.
Is this a correct description of how caching works? If so, is there any plan to fix it? That is not how Docker caches when you run your own build server with its own Docker daemon.
The major issue here is that if a developer modifies the Dockerfile (even adding a comment to a line can invalidate a step!), the cache becomes useless and our build times balloon. Not all developers are going to remember to delete the cache manually when they merge such a change to master.
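In the meantime, the workaround I'm considering is to bypass the built-in cache and seed Docker's own cache from a registry with `--cache-from` (the image name and registry here are placeholders; this assumes the classic builder — under BuildKit the image would also need to be built with `--build-arg BUILDKIT_INLINE_CACHE=1`):

```yaml
# Sketch: registry-backed caching instead of the Pipelines docker cache
script:
  - docker pull our-registry/our-app:latest || true    # best-effort pull of the last pushed image
  - docker build --cache-from our-registry/our-app:latest -t our-registry/our-app:latest .
  - docker push our-registry/our-app:latest            # next build can reuse these layers
```

That way each build refreshes the cache source itself, so a Dockerfile change only costs one slow build rather than requiring someone to remember to delete the cache manually.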