Commits

Anonymous committed 948b587

More structured evaluation text (headings + logical ordering).

Files changed (1)

 
 <!--1d6 uses only the first 255 commits because git gc became unbearably slow.-->
 
-All data parsed together, except for Mercurial via dispatch() (because it offers no new information) and the aggressive garbage collection in git, because that is much too slow.: 
+
+### All data
+
+First we’ll look at all data parsed together, except for Mercurial via dispatch() (because it offers no new information) and the aggressive garbage collection in git (because it is much too slow): 
 
 ![All data except git gc --aggressive](hg-vs-git-all-data-no-aggressive.png)
 
 The size shown in the legend shows the minimum size, which the repository has directly after a manual garbage collection and a size close to the maximum size. Since the data is not perfectly the same, the minimum size can deviate. I did not calculate the standard-deviation (for time reasons), so it can only be estimated. If the sizes differ by 3 MiB or less, treat them as equal. 
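
The 3 MiB rule of thumb above can be written down as a small helper. This is only a sketch: the function name and the byte-based inputs are assumptions for illustration, and the tolerance is the estimate from the text, not a calculated standard deviation.

```python
MIB = 1024 * 1024  # bytes per MiB


def sizes_equal(size_a_bytes, size_b_bytes, tolerance_mib=3):
    """Treat two repository sizes as equal if they differ by at most tolerance_mib.

    The default of 3 MiB is the estimated tolerance from the text above.
    """
    return abs(size_a_bytes - size_b_bytes) <= tolerance_mib * MIB
```

For example, a 100 MiB and a 103 MiB repository would count as equal, while anything further apart would not.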
 
-The aggressive garbage collection in git (git gc --aggressive): 
+Then we add aggressive garbage collection in git (git gc --aggressive): 
 
 ![Mercurial vs. aggressive garbage collection in git](hg-com-vs-git-gc-aggressive.png)
 
 Note: The garbage collection steps created a mean load of 100% for 3 of my 4 cores. 
 
+To get a clearer overview, let’s check the cumulative data, ignoring all details.
+
+### Total time and size
+
 The total time of the commits as well as the repository sizes: 
 
 ![Total time for commits](total-time-spent-committing.png)
 ![Repository sizes with and without manual repack at the end](repository-size.png)
 
+To get the bigger picture, here is the cumulative time for all commits for Mercurial, git with gc every 10 steps and git with gc every 100 steps, including fit functions:
 
-The actions with less then 0.4 seconds runtime (only the range 0 seconds to 0.4 seconds is shown, though the garbage collection in git takes significantly longer. Thus the garbage collection is not visible here, showing only the fast operations: 
+![Mercurial vs. Git, cumulative + fit](hg-vs-git-g100-cumulative-fit.png)
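
A cumulative curve with a fit, like the one in the plot above, could be computed roughly as follows. This is a minimal sketch: it assumes a straight-line least-squares fit over the commit number, while the fit functions actually used for the plot are not stated here, and the names `cumulative` and `linear_fit` are made up for illustration.

```python
from itertools import accumulate


def cumulative(times):
    """Running total of per-commit times (in seconds)."""
    return list(accumulate(times))


def linear_fit(ys):
    """Least-squares fit y = slope * x + intercept, with x = 1..n as commit number.

    Assumption: a straight line; the original plot may use a different model.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points for a fit")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With a constant time per commit, the cumulative curve is a straight line through the origin and the slope equals the time per commit. A slope that grows with the repository would show up as a curve bending away from this line.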
+
+Now it’s time to look at some more specific questions: 
+
+### Minimum time for committing
+
+The actions with less than 0.4 seconds runtime (only the range from 0 to 0.4 seconds is shown, though the garbage collection in git takes significantly longer). This hides the time for garbage collection, showing only the fast operations: 
 
 ![Mercurial vs. Git when ignoring garbage collection (just the region below 0.4s)](hg-vs-git-ignore-gc.png)
 
-As contrast, Mercurial vs. git with no garbage collection on the full range: 
+### Maximum latency of operations (volatility)
+
+Now we check if we can rely on the program to leave no user waiting. For that we compare Mercurial with Git using garbage collection every 10, 100 or 1000 commits: 
+
+![Mercurial vs. Git with gc every 10, 100, 1000 commits](hg-com-vs-git-gc-10-100-1000.png)
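
To make "garbage collection every N commits" concrete, the schedule can be sketched as below. The benchmark script itself is not shown on this page, so the function name, the exact git invocations and the use of `git add -A` are assumptions; the sketch only builds the command lists and does not run them.

```python
def commands_for_commit(index, gc_interval, message):
    """Return the git commands for commit number `index` (1-based).

    A manual `git gc` is appended after every `gc_interval`-th commit;
    a gc_interval of 0 or None means no manual garbage collection.
    """
    commands = [
        ["git", "add", "-A"],
        ["git", "commit", "-q", "-m", message],
    ]
    if gc_interval and index % gc_interval == 0:
        commands.append(["git", "gc", "--quiet"])
    return commands
```

The worst-case latency a user sees is then the time of the slowest such command group, which is the one that includes the gc step.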
+
+There are strange spikes for git without garbage collection, which should not be there, because there is no specific program which should take time. Let’s investigate with Mercurial vs. git with no garbage collection on the full range: 
 
 ![Mercurial vs. Git with no garbage collection (full range)](hg-vs-git-no-gc.png)
 
+### Git with automatic garbage collection
 
+Since git offers automatic garbage collection, we look at that, too. Keep in mind that with automatic garbage collection the final repository was bigger than with garbage collection every 1000 commits, growing up to 5× the size of the repository after garbage collection (I did not investigate that further).
 
 Mercurial vs. git with automatic garbage collection (git gc --auto called after each commit, which makes the actions of Mercurial and Git feature equal): 
 
 ![Mercurial vs. Git with gc --auto](hg-com-vs-git-auto.png)
 
-Mercurial and git with garbage collection activated every 10, 100 or 1000 commits: 
-
-![Mercurial vs. Git with gc every 10, 100, 1000 commits](hg-com-vs-git-gc-10-100-1000.png)
+### The case of small repositories
 
 *From here on, the Mercurial code via commit() uses a stronger locking mechanism, which makes it faster and gets the commit time closer to a constant time.*
 
-And only the first 300 commits of Mercurial and git: 
+If you look at the first few commits in the speed plots, you’ll notice that at the very beginning Mercurial seems to take only half the time of git. Let’s investigate: 
+
+Only the first 300 commits of Mercurial and git: 
 
 ![Mercurial vs Git, first 300 commits](hg-vs-git-first-300.png)
 
-For getting the bigger picture, the cumulative time for all commits for Mercurial, git with gc every 10 steps and git with gc every 100 steps - including fit-functions.
-
-![Mercurial vs. Git, cumulative + fit](hg-vs-git-g100-cumulative-fit.png)
+### Sidenote for Mercurial System implementers
 
 Finally, a comparison of Mercurial called via mercurial.commands.commit(ui, repo, message=str(msg), addremove=True, quiet=True, debug=False) and via mercurial.dispatch.dispatch(["commit", "-q", "-A", "-m", message]): 
 
 ![Mercurial commit() vs. Mercurial dispatch(['commit', …])](hg-com-vs-hg-dis.png)
 
-Images with ¹ are run in a second run, with the data from the first. 
+As you can see, you should really use mercurial.commands.commit() with explicit locking, because it has a clear performance advantage. 
+
+*Images with ¹ are run in a second run, with the data from the first.*
 
 
 Conclusion
 If you need reliable performance and space requirements, Mercurial is the better choice, especially when called directly via its API. Also for small repositories with up to about 200 commits, it is faster than git even without garbage collection. 
 
 
+## Pre-selected problem-domain?
+
+Someone on the git mailing list [complained](http://lists-archives.org/git/757545-benchmarks-regarding-git-s-gc.html) that the test was rigged to show that Mercurial is better. 
+
+The reality is that the test is meant as a complement to the Bachelor thesis [knittl2010][] linked in the intro. That thesis left out this use case (it wasn’t in the problem space defined for the work), so I decided to test it myself. It struck me as the case where performance in the sub-second range actually counts.
+
+[knittl2010]: http://thehappy.de/~neo/dvcs.pdf "Analysis and Comparison of Distributed Version Control Systems"
+
+Sidenote: Out-of-band gc only works if the load is far from constant, which is not true at least for the German-only page whose statistics I linked in the intro. For a multilingual page the load will likely be even closer to constant. 
+
+[Another comment](http://lists-archives.org/git/757597-benchmarks-regarding-git-s-gc.html) was very constructive and simply true: Yes, in branchy history, Mercurial would have been a bit worse (mostly the size would have been bigger). And yes, not requiring garbage collection is a big advantage, for which Mercurial pays the price of somewhat bigger repository sizes.
+
+PS: I have also now learned that git might soon get the pack format packv4, so I guess that git changes the repository format from time to time just like everyone else. I had been told a different story before, but that’s that. Good to hear that it evolves. Even though free software projects compete against each other for users, there’s the much more important competition against unfree software. And in that we stand side by side.