Repo size seems to not get updated

Issue #7572 resolved
Dariusz Kordonski
created an issue

This repository was initially as big as the page says (96 MB), but then we pruned the big blobs from history and got it down to 2-3 MB. The page still shows 96 MB, though. One reason may be that those files did not get pruned on the BB side. FWIW a fresh clone is definitely only 2-3 MB.

Comments (5)

  1. Erik van Zijst staff

    You've answered your own question there ;)

    Once objects get dereferenced, git will keep them on disk for a while before they get deleted and so the repo's size on our disks won't change during that time.

  2. Dariusz Kordonski reporter

    FWIW it's been a couple of months, definitely more than 90 days. Regardless of what you do with repo objects internally, it seems to me that the size field should rather reflect the effective size of the repository (after a clone).

  3. Gaming Imperatrix

    I've got to say this is bothering me as well. I went back and filter-branched and did a lot of internal cleanup to weed out pdfs, docs, swfs, flas and other monstrously unreasonable files, with all sorts of fun gcing and fscking and heck I don't even really know what else, all while --force --all-ing my pushes. My local repository is now 800 KB and my online repository shows a download of 28.2 MB, which is very unsettling, like being haunted by the collective sum of your entire last year's dirty laundry in ghost form.

    Dariusz, I see your repository now shows 98 MB. Still having the same problem?

  4. Erik van Zijst staff

    @Gaming Imperatrix, as mentioned above, removing objects through filter-branch will not have an immediate result on a (Git) repo's total size, as Git holds on to these files in case they might be needed again.

    @Dariusz Kordonski, I have looked at this repo again and while your old objects are now long gone, this team seems to rebase a lot, which continuously introduces new dangling objects.

    In Git, dangling objects eventually end up as loose files on disk. Many loose object files are much less efficient in terms of space than pack files. This makes the repo balloon in size, even though it still consists of the exact same objects as it did while everything was still neatly packed.
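    To check how a local clone is split between loose files and packs, the output of git count-objects -v can be summarised. The following is a minimal sketch; the function names are illustrative, and it only describes a local clone, not what Bitbucket stores on its side.

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        # Numeric fields become ints; anything else is kept as text.
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def loose_vs_packed(stats: dict) -> dict:
    """Summarise loose and packed storage; git reports both sizes in KiB."""
    return {
        "loose_objects": stats.get("count", 0),
        "loose_kib": stats.get("size", 0),
        "packed_objects": stats.get("in-pack", 0),
        "packed_kib": stats.get("size-pack", 0),
    }


def repo_stats(repo_path: str = ".") -> dict:
    """Run git in repo_path (requires git on PATH) and summarise the result."""
    out = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    return loose_vs_packed(parse_count_objects(out))
```

    Calling repo_stats() inside a clone before and after a repack gives a rough idea of how much space the loose objects were taking.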

    Now depending on the heuristics built into commands like git-gc and the age of the dangling objects, these things will get deleted. However, it's hard to predict exactly when.
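    The heuristics mentioned here are mostly driven by a handful of gc.* settings. As a rough sketch, the snippet below lists the stock defaults from the Git documentation and overlays whatever a given clone has configured. The helper names are made up for illustration, and a hosting service may well use different values on its servers.

```python
import subprocess

# Defaults as described in the git-config documentation; any repository
# or host can override them, so treat these as a starting point only.
GC_DEFAULTS = {
    "gc.auto": "6700",                      # loose objects before auto gc kicks in
    "gc.autoPackLimit": "50",               # packs before auto gc consolidates them
    "gc.pruneExpire": "2.weeks.ago",        # grace period for unreachable loose objects
    "gc.reflogExpire": "90 days",           # reflog entries for reachable commits
    "gc.reflogExpireUnreachable": "30 days",  # reflog entries for unreachable commits
}


def git_config_get(repo_path, key):
    """Return the configured value, or None if the key is unset."""
    result = subprocess.run(
        ["git", "-C", repo_path, "config", "--get", key],
        capture_output=True, text=True,
    )
    # `git config --get` exits non-zero when the key has no value.
    return result.stdout.strip() if result.returncode == 0 else None


def effective_gc_settings(repo_path=".", read_config=git_config_get):
    """Overlay configured values on the documented defaults."""
    settings = {}
    for key, default in GC_DEFAULTS.items():
        value = read_config(repo_path, key)
        settings[key] = value if value is not None else default
    return settings
```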

    What complicates (Git) repo sizes further is that not every pack file achieves the same level of compression. The efficiency of pack compression is based on the success Git has finding objects that are similar. This is a hard problem that requires a lot of resources and so not all packs are created equal. It's even possible that after running git repack your repo size increases.

    There are lots of variables that affect (Git) repo sizes and so it's very hard (if not impossible) to give the "true" disk size of a repo.

    Hope this clarifies things a bit.

    @Dariusz Kordonski I have manually repacked your repo (which funnily enough made it even bigger) and then expired and removed all dangling (rebased) objects older than 5 days. With mostly pack files remaining, this has brought the size back down to the levels you are seeing locally. However, if the team's workflow doesn't change, you can expect the size to slowly grow again, so keep an eye on things.
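    For a local clone, a comparable repack-and-prune pass can be sketched as below. This is an assumption about how such a cleanup might look with stock Git commands, not a record of what was run on Bitbucket's servers; the function names and the default cutoff are illustrative. It permanently deletes unreachable objects, so try it on a copy first.

```python
import subprocess


def cleanup_commands(repo_path, cutoff="5.days.ago"):
    """Build the git commands for an expire, repack and prune pass."""
    git = ["git", "-C", repo_path]
    return [
        # Drop reflog entries that point at commits no branch reaches any more.
        git + ["reflog", "expire", f"--expire-unreachable={cutoff}", "--all"],
        # Rewrite all reachable objects into one pack and delete redundant packs.
        git + ["repack", "-a", "-d"],
        # Delete unreachable loose objects older than the cutoff.
        git + ["prune", f"--expire={cutoff}"],
    ]


def run_cleanup(repo_path, cutoff="5.days.ago"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in cleanup_commands(repo_path, cutoff):
        subprocess.run(cmd, check=True)
```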

  5. michiel cornille

    Being new to bitbucket, I stumbled upon this thread trying to figure out how to free up some disk space in one of our repos.

    I used the filter-branch and force-push strategy.

    We currently have a 900+ MB project because there is some really bad stuff in there (like 400 MB text files inside a .rar/.zip).

    The stale size described above adds to the problem because we can't see if our filter-branch operations worked.

    It's not very easy to go to this file and delete it from the site either, because there seems to be no easy way to search for the file and delete it (the site just downloads it). The reason we started looking for these files is that TeamCity just had a Java heap overflow trying to fetch the repo. I'll try to confirm that it's not because the number doesn't change, that the size doesn't....

    Anyway, it would be really nice to just have a feature in the site that tells you, "hey you bad bad person, your repo is over 100 MB, here are the top 10 huuge files in your repo, they take up too much space, wanna delete them?"
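    Until the site offers something like that, a top-10 list can be produced from a local clone. The sketch below pipes git rev-list --objects --all into git cat-file --batch-check and sorts the blobs by size; the function names are illustrative, and the sizes are uncompressed object sizes, not the space each file takes inside a pack.

```python
import subprocess

# Format understood by `git cat-file --batch-check`; %(rest) carries the path.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output, top=10):
    """Return the largest blobs as (size_in_bytes, object_id, path) tuples."""
    blobs = []
    for line in output.splitlines():
        # Split at most three times so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) > 3 else ""
        blobs.append((int(parts[2]), parts[1], path))
    blobs.sort(key=lambda blob: blob[0], reverse=True)
    return blobs[:top]


def largest_blobs(repo_path=".", top=10):
    """List the biggest blobs reachable from any ref (requires git on PATH)."""
    objects = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "-C", repo_path, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes, top)
```

    Running largest_blobs() before and after a filter-branch pass is also one way to check locally whether the big files are really gone from history, independent of the size shown on the site.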
