Unblob the repository
Hi Philip,
At my end, pulling the KMA repo takes close to an hour. This is part BitBucket, part Africa, but also … it’s a whopping 200MB!
This is mainly due to this bunch in the history:
ResFinder.fsa | b52ec5ed (2.0 MB), 176114ba (2.0 MB)
all_databases.fsa | 23a43bee (1.9 MB), 5555003d (1.9 MB)
beta-lactamase.fsa | 4804d572 (1.1 MB), 6d61d37d (1.2 MB)
bl2seq | 3f9309bb (12.0 MB)
blast_formatter | 5cb1200a (37.2 MB)
blastall | 9419f510 (12.0 MB)
blastclust | 92ba65ce (10.8 MB)
blastdb_aliastool | 23fc2d58 (23.2 MB)
blastdbcheck | cd4b8a28 (26.3 MB)
blastdbcmd | 1ea89d02 (33.0 MB)
blastn | b10ac9f9 (37.2 MB)
blastp | 56cc7479 (37.2 MB)
blastpgp | b4ba1852 (11.2 MB)
blastx | ba3894e4 (37.2 MB)
convert2blastmask | 0433f0c5 (24.9 MB)
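A listing like the one above can be produced with plain git. This is only a minimal sketch, not necessarily how the list was generated: the function name `list_large_blobs` and the default threshold of 1,000,000 bytes are assumptions for illustration. It prints the size in bytes, the abbreviated blob id, and the path, largest first.

```shell
# list_large_blobs is a hypothetical helper name, not part of git or BFG.
# Usage: list_large_blobs <repo-dir> [min-bytes]
list_large_blobs() {
  repo_dir=$1
  min_bytes=${2:-1000000}   # assumed default threshold: 1,000,000 bytes

  # Walk every object reachable from any ref, ask git for its type and size,
  # then keep only the blobs at or above the threshold.
  git -C "$repo_dir" rev-list --objects --all |
    git -C "$repo_dir" cat-file \
      --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '
      $1 == "blob" && $3 >= min {
        size = $3
        id = substr($2, 1, 8)
        # Drop the first three fields so paths containing spaces survive.
        sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
        print size, id, $0
      }' |
    sort -rn
}
```

From inside a clone, `list_large_blobs . 1000000` would list every blob of roughly 1 MB or more anywhere in the history, including blobs that no longer exist in the current checkout.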
There’s a tool, BFG Repo-Cleaner, which does a good job of the cleaning. It doesn’t kill history, but of course it does need to rewrite every commit from the point where the files entered the history. (It adds a line Former-commit-id: 04355e40af35119f06c5675e7a61e1f7fa00629a to each rewritten commit message, so you could still trace every commit back to old repository copies.)
# Download BFG
wget 'https://repo1.maven.org/maven2/com/madgag/bfg/1.13.0/bfg-1.13.0.jar'
# Mirror clone the repository
git clone --mirror git@bitbucket.org:genomicepidemiology/kma.git
# Clean out all blobs over 1MB (the mirror clone is a bare repository in kma.git)
java -jar bfg-1.13.0.jar --strip-blobs-bigger-than 1M kma.git
# Pack the repository
cd kma.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# Push back to BitBucket (this will update all branches)
git push
After the push, it’s best to tell people to reclone the repository. But that’s a breeze, because it’s down to … 2.7MB.
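To check the packed size of a repository before and after the cleanup, `git count-objects` can report it. A minimal sketch, where the helper name `repo_pack_size` is an assumption for illustration:

```shell
# repo_pack_size is a hypothetical helper name, not part of git.
# Prints the on-disk size of the repository's pack files in human-readable form.
repo_pack_size() {
  git -C "$1" count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

Running `repo_pack_size kma.git` after the `git gc` step should give a figure comparable to what a fresh clone would have to download.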
Comments (2)
reporter - changed status to resolved
Yay, that fixed it!
(BTW, BitBucket is still horridly slow compared to GitHub; maybe they don't have proxies near here.)
Thanks, Marco
Hi Marco
A colleague had mistakenly created an extra branch containing ResFinder and all its dependencies, which are the files you identified as the major contributors to the exploding size.
I have just deleted this branch, and the repo with all its history is down to 4.8 MB again.
Best,
Philip