Thanks for considering this! It would help a lot with tracking documentation such as .pptx, .docx, and .png files (project logos, etc.). Sure, there is dedicated software for file tracking, but working in a familiar Mercurial environment with project documentation, and having precise control over what I commit (and *when* I commit it), would be fabulous. Alternatively, the possibility of attaching such files to Bitbucket wiki entries would be helpful (with a file size limit, of course).
Great extension that can be very useful when dealing with binary files. Note that even if people want to store, e.g., 10-20 GB of binary files, it doesn't necessarily mean they would download all of that data 5 times a day.
Has there been any progress on this? It's a bit irritating to have to create zip files containing my binaries at different revisions and stick them in the Downloads section. That takes up more storage than it normally would. It would be much easier to just manage them in the repository.
I'm surprised that support hasn't been added after so long. Is there a reason for this?
I am having problems getting Jenkins to pull from an 800 MB BitBucket repo that has 100 changes to each of two different 2 MB files. hg 2.2.3 is stalling on the clone from BitBucket, using just 23 seconds of processing time in a whole hour, and the clone is still not finished! A clone from a local copy of the same repo (which was pushed to BitBucket) completes in 30 seconds:
... files: 109/109 chunks (100.00%)
added 181 changesets with 764 changes to 109 files
updating the branch cache
Having CI stall for hours on the clone each time a build is needed is not acceptable. For now I've removed the two binaries and created a new repository, and I am no longer running into this problem. But it would be very easy to run into this issue again if we had assets like videos or audio to track and merge into our project.
What workaround does Atlassian suggest for us to use for this situation instead of largefiles extension?
P.S. I would be happy to provide a copy of my original repo with the binaries to Atlassian for one of your engineers to experiment with to come up with a suggestion that would work better with your service.
It would be great to have this feature. I was surprised to find out it's not supported. It is useful for all kinds of image assets used in GUI programs, which may actually not change that much over time.
If the hg configuration used by Jenkins is set to debug mode, Jenkins gets confused by the output and thinks it needs to do a clone every time. I have since changed my Mercurial back to non-debug. I think having a subrepo for the binaries, if ever pulled by Jenkins, would cause the same problem again.
We could really use this feature for our game development projects.
It would be especially useful if the central server could be located off of Bitbucket. That would avoid having you host large files, and would also allow large binary files to be served more locally when requested. For our use cases it would be fine if this were a paid-only feature, and if off-Bitbucket hosting of the binary files were required.
We want a "single checkout, single command" build process, and to do this we want to put all dependencies and prerequisites in the repository. Some of these are big binary files. Adding those without largefiles really slows things down.
Like the many others before me in the comments above, please add me to the list of users who would very much like to see the largefiles extension supported by Bitbucket. In particular I'd like to be able to store, track, and manage large binary assets (e.g. images, compressed data files, bitfiles [some that take several hours to build], etc.) all within the same Mercurial setup, nicely hosted on Bitbucket, without having to resort to additional one-off hacks. Thanks!
As already commented, it would be relatively easy to implement a bitbucketstore in the largefiles extension, so that the Downloads section of each project could be used as a server-side cache. Nothing on Bitbucket's side would need to change, but I doubt they would just sit there and watch you upload GBs of "downloads" to an account without questioning the fair use behind it.
That's basically why I've asked if the Bitbucket team would even allow such an approach at all. Seems like the lack of an answer IS already the answer here...
BB would not have to do anything more than officially state here that the Downloads section can generally be used for largefiles objects. If that were the case, it would be possible to provide a modified largefiles extension that just uses it as a remote store. If they do not do this, or state that it is not allowed, nobody will invest the time. Unfortunately, it seems they are not really interested in giving feedback in this discussion.
What is unfortunate is that if Bitbucket is not interested in looking at issues such as large file support, then other services that seem to "get" the job done will render Bitbucket obsolete.
You would think that it could at least be made an available option for paying subscribers. It would certainly be worth it to me, and would still be a much better deal than going with that "other" file service. :o)
Kiln is free for 1-2 users, see http://www.fogcreek.com/kiln/student-and-startup/ (sorry Atlassian, but this has been on your backlog for > 2-1/2 years and you're not especially short of resources, really). Also, Kiln does Git & Hg with the same repo - seamless translation.
I actually started my current project with the assumption that bitbucket supports this... I'm surprised it doesn't. Hopefully this project is small enough that I don't have to move to hosting on my own server.
+1 on this. Between Mercurial's built-in largefiles extension, GitHub's LFS, and https://github.com/lionheart/git-bigstore, something should be done about Bitbucket's lack of support for large binary assets.
I'm considering switching to Bitbucket from GitHub; some features, like the ability to upvote issues, make me want to. But I'd like large file storage to be supported here as well, like the existing git-annex.
Mercurial support has been neglected by Atlassian for quite some time now. Too sad... Atlassian seems to be on its way to becoming the arrogant, greedy corporation that only cares about profit and ignores the very users that helped grow this product in the first place.
@Valentin W: Note that it’s not Atlassian which got big with Mercurial. Bitbucket got big with Mercurial and was then bought by Atlassian. Also, Atlassian is still spreading lies about Mercurial in the Atlassian blog by hosting a guest entry by a git zealot which is filled with factual errors, some even disproven by the examples in the article itself. Despite being called out on that in public, they did not even see the need to add a note to that guest entry about the author’s misunderstandings.
I asked the Atlassian marketing team personally several times to correct this. I know they read it, because people I used to collaborate with work on BitBucket’s Mercurial support.
I guess they show that there is room for a Mercurial hosting company. I‘m sorry for the great Mercurial developers working at Atlassian to improve Mercurial support. I know you’re doing great work and I hope you will prove me wrong on this. But from the outside it seems like you’re being used to hide hostility by the parent company against the core part of their own product. “…we decided to collaborate with GitHub on building a standard for large file support” — seriously? There is already a standard for large file support which has been part of Mercurial core since 2011, and works almost seamlessly. It just needs support from BitBucket to be easier for BitBucket users.
This craziness is a new spin on “never trust a company”: never ever trust a zealot with a tool which helps “the other side”. They are prone to put zeal over business. For everyone at BitBucket: if this isn’t a wakeup call, I don’t know what is.
I've already been thinking about how to integrate Mercurial's largefiles support since I've started working here, especially now that we've publicly announced LFS support.
One of the main show stoppers for Mercurial's largefile extension is that it is designed to push the largefiles to the same server as the changesets. This is no good for us since we want to avoid that transfer cost entirely. I've addressed these limitations on the Mercurial mailing list already:
If no one volunteers for the project, then I'll queue it up to my already long backlog. As for the Virtuos Games company, you heard about it the same time I did. I really wish they would have contacted the Mercurial community so we could have worked together :-(
@Sean Farley: I know that you’re doing great work here — that we can set a repository as non-publishing has been a great step forward towards enabling the features Mercurial provides easily which are missing in Git (though I did not use those for collaboration yet).
Would the Mercurial large files be stored on BitBucket similar to the git large files? Would people need to update Mercurial for it to work?
Your post to the mailing list sounds like largefiles at BitBucket would require the most up-to-date Mercurial — which is a blocker for many people I know (we have to be able to run 3-year-old Mercurial, because that’s what’s on the cluster). So it would be very useful if BitBucket could provide regular large files support as it existed in 2013.
Or rather, practically put, the instructions to use largefiles should be at least as simple as those for git lfs. People in this thread are willing to pay for largefile support.
Is there a chance that BitBucket would hire more people to improve Mercurial support and reduce your backlog?
Would the Mercurial large files be stored on BitBucket similar to the git large files?

Neither Git nor Mercurial large files will ever be on our backend. They will be hosted somewhere in The Cloud™. This is by design, by both us and GitHub, so that our servers never have to incur the cost of that huge transfer.

Would people need to update Mercurial for it to work?

It's easy enough to check whether your Mercurial version is newer than 2.0, and if it's only 3 years old you should be okay. Using Mercurial's large files is just as easy as using GitHub's large files (I believe GitHub's LFS support was inspired by Mercurial's). You can learn more from https://www.mercurial-scm.org/wiki/LargefilesExtension, but suffice it to say that you either pass the --large flag to your hg add command or set the largefiles.minsize or largefiles.patterns config option for your repo to use the largefiles extension automatically.

Unlike git, largefile support is baked into Mercurial, so it should be easier to adopt and less "hacky".
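To make that usage concrete, here is a minimal hgrc sketch for enabling largefiles; the size threshold and patterns below are illustrative assumptions, not recommendations (see the wiki page linked above for the full set of options):

```ini
; .hg/hgrc (per-repository) or ~/.hgrc -- hypothetical largefiles setup
[extensions]
largefiles =

[largefiles]
; automatically add files over 10 MB as largefiles (value is in MB)
minsize = 10
; always track these patterns as largefiles (example patterns only)
patterns =
  *.iso
  *.zip
```

With something like that in place, `hg add` picks up matching files as largefiles automatically, and any other file can be forced in with `hg add --large bigasset.bin`, assuming a Mercurial new enough (2.0+) to bundle the extension.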
As I mentioned before, the current implementation of Mercurial largefile support won't work for Bitbucket because it hardcodes the URL for the largefiles to be the same as the path for the code (in this case, our core Bitbucket servers). This design makes it impossible for us to offload that work (and bandwidth cost) to a dedicated datacenter for hosting large files.
Your post to the mailing list sounds like largefiles at BitBucket would require the most up to date Mercurial — which is a blocker for many people I know (we have to be able to run 3 years old Mercurial, because that’s what’s on the cluster).
I, too, have been the admin of computing clusters. And I, too, have been a user of some clusters. Even on the most painful of painful powerpc and aix machines, I could always update to the newest Mercurial since at its core, you just need a working libc.
So it would be very useful, if BitBucket could provide regular large files support as it existed in 2013.
Sure, no problem. Let me hop in my time machine ;-)
Or rather, practically put, the instructions to use largefiles should be at least as simple as those for git lfs.
Don't worry, it will. For git lfs support, you need Go (or a precompiled binary), which is very hard on a cluster. So, in that regard, this will be easier. For Mercurial, you can pip install the newest version and you'll be good to go.
@Daniele Benegiamo No problem :-) The largefiles project idea I had was accepted for this year's Summer of Code and, on top of that, it looks like there is a student willing to work on it! Here's to hoping there is some fruit to that labor ^_^
@Sean Farley If you’re an admin of the cluster, it’s no problem to update. At my institute, however, we struggle with admins who are not that responsive. Would it be an option to restrict the changes to the largefiles extension and also provide it as a separate extension, which would make it easy for users to add BitBucket largefiles support when they can’t update Mercurial itself?
Re git lfs install: Yikes! Somehow that wasn’t mentioned very prominently on the git lfs page <:)
@zdm: If GSoC works out, that would be 6 months. It feels like this is finally moving. @Sean Farley already added support for changeset obsolescence, which allows using Mutable-HG with BitBucket — you can safely and collaboratively rewrite history using BitBucket as backend. I think that shows that @Sean Farley can push this forwards.
When I was using BlueGene/P (or Q), I would just install Mercurial in my home directory. This is mainly due to it being difficult to separate out a core extension. We'll see how this summer code goes, though.
Also, I find it hard to argue that clusters need largefiles but that's another discussion. For now, I would recommend installing it into your home directory. Anyways, time for more coffee and more coding :-)
@Piotr Listkiewicz made a lot of progress this summer and I'm very proud of him :-) Unfortunately, we're in a place where I need to review and merge his code, as well as get the Bitbucket side into shape.
Around the same time, @Facebook started their own implementation of LFS, which actually uses the same protocol. Using this might be better for Bitbucket, since it would mean we'd only need one implementation on the server side of things.
I'm hoping to have some time in the coming weeks to look into the lfs support that Matt added, but I could use some help. If anyone wants to test: I suspect that pushing and pulling will work just fine, though the Bitbucket UI might be a different story. What doesn't work exactly? What does work?
If anyone is willing to help test, that'd be great!
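For anyone willing to try, enabling Mercurial's built-in lfs extension is roughly a config change like the following; the `track` fileset here is an assumption for illustration, so check `hg help lfs` on your version for the exact keys it supports:

```ini
; ~/.hgrc -- sketch for trying the experimental lfs extension (Mercurial 4.5+)
[extensions]
lfs =

[lfs]
; route files larger than 10 MB through LFS (fileset syntax; example only)
track = size(">10MB")
```

After that, committing a matching file and pushing to an LFS-capable server should exercise the new code path.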
I'd be happy to give you my two cents if it doesn't affect anything else in my or my team's workflow. We'd use large files only lightly (a few 50-100 MB binaries here and there), but if it works well then we'll possibly move to Bitbucket some more data that is now stored elsewhere.