Files changed (1)
-All data parsed together, except for Mercurial via dispatch() (because it offers no new information) and the aggressive garbage collection in git, because that is much too slow.:
+First we’ll look at all data parsed together, except for Mercurial via dispatch() (because it offers no new information) and the aggressive garbage collection in git (because that is much too slow):
The legend shows two sizes for each repository: the minimum size, which the repository has directly after a manual garbage collection, and a size close to the maximum size. Since the data is not perfectly identical between runs, the minimum size can deviate. I did not calculate the standard deviation (for time reasons), so it can only be estimated. If the sizes differ by 3 MiB or less, treat them as equal.
+To get the bigger picture, here is the cumulative time for all commits for Mercurial, git with gc every 10 steps and git with gc every 100 steps, including fit functions:
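As a rough sketch of how such a cumulative curve and a fit can be computed: the snippet below assumes the per-commit times are available as a plain list of seconds and fits a straight line by ordinary least squares. The function names and the choice of a linear fit are illustrative assumptions; the fit functions used for the plots may well differ.

```python
from itertools import accumulate


def cumulative_times(commit_times):
    """Return the running total of per-commit durations (in seconds)."""
    return list(accumulate(commit_times))


def linear_fit(ys):
    """Least-squares fit y = slope * x + intercept for x = 1..len(ys).

    A straight line is an assumption made for this sketch; a commit time
    that grows with repository size would need a higher-order fit.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points for a fit")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

If the per-commit time is constant, the cumulative curve is a straight line and the fitted slope is the time per commit. A curve that bends upwards means commits get slower as the repository grows.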
-The actions with less then 0.4 seconds runtime (only the range 0 seconds to 0.4 seconds is shown, though the garbage collection in git takes significantly longer. Thus the garbage collection is not visible here, showing only the fast operations:
+The actions with less than 0.4 seconds runtime (only the range from 0 to 0.4 seconds is shown, though the garbage collection in git takes significantly longer). This hides the time for garbage collection, showing only the fast operations:
![Mercurial vs. Git when ignoring garbage collection (just the region below 0.4s)](hg-vs-git-ignore-gc.png)
+Now we check if we can rely on the program to leave no user waiting. For that we compare Mercurial with Git using garbage collection every 10, 100 or 1000 commits:
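A minimal sketch of such a measurement loop, assuming the commit and the garbage collection are passed in as callables. `time_commits`, `commit`, `gc` and `gc_every` are hypothetical names for this illustration, not the code used for the plots. The gc time is counted towards the commit that triggered it, since that is the delay a waiting user would see.

```python
import time


def time_commits(commit, gc, n_commits, gc_every=None):
    """Time n_commits calls to commit(i), running gc() every gc_every commits.

    commit and gc are caller-supplied callables (hypothetical here), for
    example thin wrappers around the Mercurial API or the git command line.
    Returns one duration in seconds per commit.
    """
    durations = []
    for i in range(1, n_commits + 1):
        start = time.perf_counter()
        commit(i)
        # Garbage collection is charged to the commit that triggers it.
        if gc_every and i % gc_every == 0:
            gc()
        durations.append(time.perf_counter() - start)
    return durations
```

Calling it with `gc_every=10`, `100` or `1000` gives the three git variants compared here; `gc_every=None` skips garbage collection entirely.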
+There are strange spikes for git without garbage collection, which should not be there, because there is no specific program which should take time. Let’s investigate: Mercurial vs. git with no garbage collection on the full range:
![Mercurial vs. Git with no garbage collection (full range)](hg-vs-git-no-gc.png)
+Since git offers automatic garbage collection, we look at that, too. Keep in mind that with automatic garbage collection the final repository was bigger than with garbage collection every 1000 commits, growing up to 5× the size of the repository after garbage collection (I did not investigate that further).
Mercurial vs. git with automatic garbage collection (git gc --auto called after each commit, which makes the actions of Mercurial and Git feature-equal):
*From here on, the Mercurial code via commit() uses a stronger locking mechanism, which makes it faster and gets the commit time closer to a constant time.*
+If you look at the first few commits in the speed plots, you’ll notice that at the very beginning Mercurial seems to take only half the time of git. Let’s investigate:
-For getting the bigger picture, the cumulative time for all commits for Mercurial, git with gc every 10 steps and git with gc every 100 steps - including fit-functions.
Finally, a comparison of Mercurial called via mercurial.commands.commit(ui, repo, message=str(msg), addremove=True, quiet=True, debug=False) and via mercurial.dispatch.dispatch(["commit", "-q", "-A", "-m", message]):
+As you can see, you should really use mercurial.commands.commit() with explicit locking, because it has a clear performance advantage.
If you need reliable performance and space requirements, Mercurial is the better choice, especially when called directly via its API. Also for small repositories with up to about 200 commits, it is faster than git even without garbage collection.
+Someone on the git mailing list [complained](http://lists-archives.org/git/757545-benchmarks-regarding-git-s-gc.html) that the test was rigged to show that Mercurial is better.
+The reality is that the test is meant as a complement to the Bachelor Thesis [knittl2010] linked in the intro. That thesis left out this use case (it wasn’t in the problem space defined for the work), so I decided to test it myself. It struck me as the case where performance in the sub-second range actually counts.
+[knittl2010]: http://thehappy.de/~neo/dvcs.pdf "Analysis and Comparison of Distributed Version Control Systems"
+Sidenote: Out-of-band gc only works if the load is very different from constant, which is not true, at least for the German-only page whose statistics I linked in the intro. For a multilingual page it will likely be even closer to constant load.
+[Another comment](http://lists-archives.org/git/757597-benchmarks-regarding-git-s-gc.html) was very constructive and simply true: Yes, in branchy history, Mercurial would have been a bit worse (mostly the size would have been bigger). And yes, not requiring garbage collection is a big advantage - for which Mercurial pays the price of a bit bigger repository sizes.
+PS: I have also learned that git might soon get the pack format packv4, so I guess that git changes the repository format from time to time just like everyone else. I had been told a different story before, but that’s that. Good to hear that it evolves. Even though free software projects compete against each other for users, there’s the much more important competition against unfree software. And in that we stand side-by-side.