
Issues

Issue #3843 open

LargeFiles support (BB-3903)

David Vega
created an issue

It would be nice to see support for the LargeFiles extension with Mercurial 2.1+, even if it is only available for paid plans.

Comments (76)

  1. codesparkle

    Thanks for considering this! It would help a lot with tracking documentation such as pptx, docx, and png files (project logos, etc.). Sure, there is dedicated software for file tracking, but working with project documentation in a familiar Mercurial environment, with precise control over what I commit (and *when* I commit it), would be fabulous. Alternatively, the possibility of attaching such files to Bitbucket wiki entries would be helpful (limiting file size, of course).

  2. ALIENQuake

    Great extension that can be very useful when dealing with binary files. Note that even if people wanted to store, e.g., 10-20 GB of binary files, it doesn't necessarily mean that they would download all of that data five times a day.

  3. Inverness

    Has there been any progress made on this? It's a bit irritating to have to create zip files containing my binaries at different revisions and stick them in the downloads section. That just takes up more storage than it normally would. It would be much easier to manage them in the repository.

    I'm surprised that support hasn't been added after so long. Is there a reason for this?

  4. Andy Savage

    I would be perfectly happy to pay for this as an add-on. It is specifically a reason I would upgrade from the free plan to a paid one.

    Do you guys not want my money?? :(

  5. Garnet Chaney

    I am having problems getting Jenkins to pull from an 800 MB Bitbucket repo that has 100 changes to each of two different 2 MB files. hg 2.2.3 is stalling on the clone from Bitbucket, using just 23 seconds of processing time in a whole hour, and the clone is still not finished! A clone from a local copy of the same repo (that was pushed to Bitbucket) completes in about 30 seconds:

    ... files: 109/109 chunks (100.00%)
    added 181 changesets with 764 changes to 109 files
    updating the branch cache

    real    0m26.255s
    user    0m22.406s
    sys     0m2.297s

    Having CI stalled for hours on the clone each time a build is needed is not acceptable. For now I've removed the two binaries, made a new repository, and I am no longer running into this problem. But it would be very easy to run into this issue again if we had assets like videos or audio to track and merge into our project.

    What workaround does Atlassian suggest for us to use for this situation instead of largefiles extension?

    P.S. I would be happy to provide a copy of my original repo with the binaries to Atlassian for one of your engineers to experiment with to come up with a suggestion that would work better with your service.

  6. Ashwin Nanjappa

    It would be great to have this feature. I was surprised to find out it's not supported. It is useful for all kinds of image assets used in GUI programs, which may actually not change that much over time.

  7. Bogdan Mart

    Hello Garnet Chaney, I don't know why Jenkins is doing a clone every time, but it seems logical that it's a long process due to the high bandwidth involved.

    I used TeamCity for CI before and it didn't make a clone every time; it just pulled into the local repo on the CI agent machine.

    Try configuring your CI to pull instead of cloning every time.

    PS: consider using a Subversion subrepo for such files.
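    In shell terms, the suggested workflow is roughly the following sketch (the repository URL and workspace path are placeholders; most CI tools expose an equivalent "incremental checkout" setting):

    ```shell
    # First build only: clone once into the agent's workspace
    hg clone https://bitbucket.org/yourteam/yourrepo workspace

    # Every later build: fetch only the new changesets and update
    cd workspace
    hg pull
    hg update --clean
    ```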

  8. Garnet Chaney

    If the hg configuration used by Jenkins is set to debug mode, Jenkins gets confused by the output and thinks it needs to do a clone every time. I have since changed my Mercurial back to non-debug. I think having a subrepo for the binaries, if ever pulled by Jenkins, would cause the same problem again.
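    For reference, the setting in question lives in the [ui] section of hgrc; a minimal sketch for the hgrc seen by the Jenkins user:

    ```ini
    ; keep debug output off so the Jenkins Mercurial plugin can parse hg's output
    [ui]
    debug = False
    ```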

  9. Michael McFarland

    We could really use this feature for our game development projects. It would be especially useful if the central server could be located off of Bitbucket: it would avoid having you host the large files, and it would provide a means to serve large binary files more locally when requested. It would be fine for our use cases if it were paid-only and off-Bitbucket hosting of the binary files were required.

  10. Rory Plaire

    We want a "single checkout, single command" build process, and to do this we want to put all dependencies and prerequisites in the repository. Some of these are big binary files. Adding those without largefiles really slows things down.

  11. dnm

    Like the many others before me in the comments above, please add me to the list of users who would very much like to see the largefiles extension supported by Bitbucket. In particular I'd like to be able to store, track, and manage large binary assets (e.g. images, compressed data files, bitfiles [some that take several hours to build], etc.) all within the same Mercurial setup, nicely hosted on Bitbucket, without having to resort to additional one-off hacks. Thanks!

  12. Florian Zinggeler

    Please add support for large files soon. I'd like to have my game project on Bitbucket, but as it does not support large files yet, it feels like it would be rather impolite to upload them anyway.

  13. Kevin Thompson

    Any new status updates on largefiles support? Has this even been brought up in any developer meetings? If not, will it ever be discussed?

    It would be much appreciated even just to hear that it is still on the backlog feature list and that someday someone somewhere might give it consideration.

    In other words, "what's the latest info for largefiles support?" Thanks.

  14. Friedrich Kastner-Masilko

    As already commented, it would be relatively easy to implement a bitbucketstore in the largefiles extension, so that you could use the downloads section of each project as a server-side cache. Nothing on Bitbucket's side would need to change, but I doubt that they would just sit there and watch you upload GBs of "downloads" to the account without questioning the fair use behind it.

    That's basically why I asked whether the Bitbucket team would even allow such an approach at all. It seems like the lack of an answer IS already the answer here...

  15. Friedrich Kastner-Masilko

    BB would not have to do anything more than officially state here that the downloads section can generally be used for largefiles objects. If that were the case, it would be possible to provide a modified largefiles extension that simply uses it as a remote store. If they do not do this, or state that it is not allowed, nobody will invest the time. Unfortunately, it seems like they are not really interested in giving feedback in this discussion.

  16. Peter Val Preda

    What is unfortunate is that if Bitbucket does not take an interest in issues such as large file support, then other services that seem to get the job done will render Bitbucket obsolete.

  17. Kevin Thompson

    You would think that it could at least be made an available option for paying subscribers. It would certainly be worth it to me, and would still be a much better deal than going with that "other" file service. :o)

  18. Tasgall

    I actually started my current project with the assumption that bitbucket supports this... I'm surprised it doesn't. Hopefully this project is small enough that I don't have to move to hosting on my own server.

  19. coyotte508

    I'm considering switching to Bitbucket from GitHub; some features, like the ability to upvote issues, make me want to. But I'd like large file storage to be supported here as well, like the existing git-annex.

  20. Taron Millet

    So close... I don't really want to switch to Git now that I've finally trained my teammates to use Mercurial. Is there any hope for LargeFiles for Mercurial users?

  21. Valentin Kantchev

    Mercurial support has been neglected by Atlassian for quite some time now. Too sad... Atlassian seems to be on its way to becoming the arrogant, greedy corporation that only cares about profit and ignores the very users that helped grow this product in the first place.

  22. Arne Babenhauserheide

    Valentin Kantchev: Note that it's not Atlassian which got big with Mercurial. Bitbucket got big with Mercurial and was then bought by Atlassian. Also, Atlassian is still spreading lies about Mercurial on the Atlassian blog by hosting a guest entry by a Git zealot that is filled with factual errors, some even disproven by the examples in the article itself. Despite being called out on that in public, they did not even see the need to add a note to that guest entry about the author's misunderstandings.

    I asked the Atlassian marketing team personally several times to correct this. I know they read it, because people I used to collaborate with work on Bitbucket's Mercurial support.

    Dear BitBucket, this is where you could be: Virtuos Games uses BitTorrent Sync with Mercurial for game development, with decentralized large asset storage.

    I guess they show that there is room for a Mercurial hosting company. I'm sorry for the great Mercurial developers working at Atlassian to improve Mercurial support. I know you're doing great work, and I hope you will prove me wrong on this. But from the outside it seems like you're being used to hide the parent company's hostility against the core part of their own product. "…we decided to collaborate with GitHub on building a standard for large file support" — seriously? There is already a standard for large file support, which has been part of Mercurial core since 2011 and works almost seamlessly. It just needs support from BitBucket to be easier for BitBucket users.

    This craziness is a new spin on "never trust a company": never ever trust a zealot with a tool which helps "the other side": they are prone to put zeal even over business. For everyone at BitBucket: if this isn't a wake-up call, I don't know what is.

  23. Sean Farley staff

    Arne Babenhauserheide Heh, I was actually working on a response before I got distracted by the deploy :-)

    I've been thinking about how to integrate Mercurial's largefiles support since I started working here, especially now that we've publicly announced LFS support.

    One of the main showstoppers for Mercurial's largefiles extension is that it is designed to push the largefiles to the same server as the changesets. This is no good for us, since we want to avoid that transfer cost entirely. I've already addressed these limitations on the Mercurial mailing list:

    http://markmail.org/message/7j4ohopxspcnpnhw

    I've also gone ahead and volunteered to be a Google Summer of Code mentor:

    https://www.mercurial-scm.org/wiki/SummerOfCode/Ideas2016#Allow_largefiles_to_be_at_a_different_location

    If no one volunteers for the project, then I'll queue it up in my already long backlog. As for the Virtuos Games company, you heard about it the same time I did. I really wish they had contacted the Mercurial community so we could have worked together :-(

  24. Arne Babenhauserheide

    Sean Farley: I know that you're doing great work here — that we can set a repository as non-publishing has been a great step forward towards easily enabling the features Mercurial provides that are missing in Git (though I have not used those for collaboration yet).

    Would the Mercurial large files be stored on BitBucket similar to the git large files? Would people need to update Mercurial for it to work?

    Your post to the mailing list sounds like largefiles at BitBucket would require the most up-to-date Mercurial — which is a blocker for many people I know (we have to be able to run a 3-year-old Mercurial, because that's what's on the cluster). So it would be very useful if BitBucket could provide regular largefiles support as it existed in 2013.

    Or rather, practically put: the instructions for using largefiles should be at least as simple as those for Git LFS. People in this thread are willing to pay for largefiles support.

    Is there a chance that BitBucket would hire more people to improve Mercurial support and reduce your backlog?

    And could you give your marketing team another prod? That deceptive Git vs. Mercurial article on the Atlassian blog is still unchanged, almost 4 years after its significant factual errors were called out.

  25. Ian Cervantez

    Arne Babenhauserheide: The last edit on the Largefiles Extension page on the Mercurial wiki was on 2013-09-01. In fact, the page was originally created on 2011-10-18. The Largefiles extension shipped with Mercurial 2.0, which was released on 2011-11-01.

    It's easy enough to check that your Mercurial version is 2.0 or newer, and if it's only 3 years old you should be okay. Using Mercurial's largefiles is just as easy as using GitHub's large file support (I believe GitHub's LFS was inspired by Mercurial's). You can learn more from https://www.mercurial-scm.org/wiki/LargefilesExtension, but suffice it to say that you either pass a --large flag to your hg add command or set the largefiles.minsize or largefiles.patterns config options for your repo to use largefiles automatically.

    Unlike Git, largefiles support is baked into Mercurial, so it should be easier to adopt and less "hacky".
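    A minimal sketch of that setup, using the option names documented on the Mercurial wiki (the 10 MB threshold and the patterns are illustrative, not recommendations):

    ```ini
    ; .hg/hgrc (or ~/.hgrc): enable the extension bundled with Mercurial 2.0+
    [extensions]
    largefiles =

    [largefiles]
    ; automatically treat any newly added file over 10 MB as a largefile
    minsize = 10
    ; always treat files matching these patterns as largefiles
    patterns =
      *.zip
      re:media/.*\.mp4$
    ```

    Individual files can also be added explicitly with hg add --large, regardless of size.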

  26. Sean Farley staff

    Responding to both Arne Babenhauserheide and Ian Cervantez in this reply:

    Would the Mercurial large files be stored on BitBucket similar to the git large files?

    Neither Git nor Mercurial large files will ever be on our backend. They will be hosted somewhere in The Cloud™. This is by design, by both us and GitHub, so that our servers never have to incur the cost of that huge transfer.

    Would people need to update Mercurial for it to work?

    Yep.

    It's easy enough to see if your Mercurial version is larger than 2.0, but if it's only 3 years old you should be okay. The use of the Mercurial large files options is just as easy as the github large files ... Unlike git, largefile support is baked into Mercurial - so it should be easier to adopt and less "hacky".

    As I mentioned before, the current implementation of Mercurial's largefiles support won't work for Bitbucket because it hardcodes the URL for the largefiles to be the same as the path for the code (in this case, our core Bitbucket servers). This design makes it impossible for us to offload that work (and bandwidth cost) to a dedicated datacenter for hosting large files.

    Your post to the mailing list sounds like largefiles at BitBucket would require the most up to date Mercurial — which is a blocker for many people I know (we have to be able to run 3 years old Mercurial, because that’s what’s on the cluster).

    I, too, have been the admin of computing clusters. And I, too, have been a user of some clusters. Even on the most painful of painful PowerPC and AIX machines, I could always update to the newest Mercurial, since at its core you just need a working libc.

    So it would be very useful, if BitBucket could provide regular large files support as it existed in 2013.

    Sure, no problem. Let me hop in my time machine ;-)

    Or rather, practically put, the instructions to use largefiles should be at least as simple as those for git lfs.

    Don't worry, it will be. For Git LFS support, you need Go (or a precompiled binary), which is very hard to come by on a cluster. So, in that regard, this will be easier: for Mercurial, you can pip install the newest version and you'll be good to go.
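    As a sketch, the user-level install mentioned here needs no root access (assuming pip and a C toolchain are available on the cluster):

    ```shell
    # install the newest Mercurial into your home directory
    pip install --user Mercurial

    # make sure pip's user-level scripts directory is on PATH
    export PATH="$HOME/.local/bin:$PATH"
    hg version
    ```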

  27. Sean Farley staff

    Daniele Benegiamo No problem :-) The largefiles project idea I had was accepted for this year's Summer of Code and, on top of that, it looks like there is a student willing to work on it! Here's to hoping there is some fruit to that labor ^_^

  28. Arne Babenhauserheide

    Sean Farley: If you're an admin of the cluster, it's no problem to update. At my institute, however, we struggle with admins who are not that responsive. Would it be an option to restrict the changes to the largefiles extension and also provide this as a separate extension, making it easy for users to add Bitbucket largefiles support when they can't update Mercurial itself?

    Yay GSoC!

    Re git lfs install: Yikes! Somehow that wasn’t mentioned very prominently on the git lfs page <:)

    zdm: If GSoC works out, that would be 6 months. It feels like this is finally moving. Sean Farley already added support for changeset obsolescence, which allows using Mutable-HG with BitBucket — you can safely and collaboratively rewrite history using BitBucket as the backend. I think that shows that Sean Farley can push this forward.

  29. Sean Farley staff

    When I was using BlueGene/P (or Q), I would just install Mercurial in my home directory. This is mainly because it is difficult to separate out a core extension. We'll see how this Summer of Code goes, though.

    Also, I find it hard to argue that clusters need largefiles, but that's another discussion. For now, I would recommend installing it into your home directory. Anyways, time for more coffee and more coding :-)
