Mercurial vs. Git, performance for a server
===========================================

In a typical human workflow, the raw speed of a command doesn’t really matter as long as it isn’t too slow, because the time the command takes to complete is far below the time needed for typing the command, even if you just type “C-r com ENTER”. The highest typing speed ever measured is 216 words per minute¹², which equals 1080 characters per minute, or about 56ms per keystroke. 

¹: http://www.owled.com/typing.html
²: More: http://en.wikipedia.org/wiki/Words_per_minute
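
The per-keystroke figure can be checked with a few lines of Python. This is only a sketch; it assumes the usual convention of five characters per word, which is what turns 216 words into 1080 characters.

```python
# Assumption: one "word" counts as 5 characters (the common WPM convention).
CHARS_PER_WORD = 5

def ms_per_keystroke(words_per_minute, chars_per_word=CHARS_PER_WORD):
    """Return the average time per keystroke in milliseconds."""
    chars_per_minute = words_per_minute * chars_per_word
    return 60_000 / chars_per_minute

print(round(ms_per_keystroke(216)))  # prints 56
```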

Where the performance of the fast commands really matters is when you implement a server which uses version tracking for some task. A server also has more options for getting higher performance than a normal user – for example, the log can be cached. For this test I use these assumptions: 

* The load is mostly constant. For example the server could get accessed from multiple timezones. 
* The respective repos of Mercurial, Git, Freenet and rpg-1d6 are good examples for projects written in Python, C(++), Java and general media. 

### Method

As actions I’ll first test committing and repository size, because these are the areas where the biggest changes from the general use case are to be expected. 

The changes from the traditional test are: 

* For Mercurial: 
   - Starting hg as a separate process via call() (hg call)
   - Calling Mercurial in-process via the mercurial.dispatch.dispatch() function (hg dis)
* Git gets split into 5 use cases: 
   - Automatic garbage collection (git ag)
   - No garbage collecting (git ng)
   - Garbage collecting every 100 commits (git g100)
   - Garbage collecting every 1000 commits (git g1000)
   - Aggressive garbage collecting every 100 commits (git ag100)
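
The Git variants differ only in when `git gc` runs. A minimal sketch of such a policy follows. The function names are hypothetical and not taken from the test script, and treating “no garbage collecting” as `gc.auto 0` is an assumption about how that case is set up.

```python
import subprocess

def gc_command(commit_index, interval=None, aggressive=False):
    """Return the git command to run after commit number `commit_index`
    (1-based), or None if no garbage collection is due."""
    if interval is None or commit_index % interval != 0:
        return None
    cmd = ["git", "gc", "--quiet"]
    if aggressive:
        cmd.append("--aggressive")
    return cmd

def disable_auto_gc(repo):
    """Turn off automatic garbage collection for one repository
    (assumed setup for every variant except automatic gc)."""
    subprocess.check_call(["git", "config", "gc.auto", "0"], cwd=repo)

def maybe_gc(repo, commit_index, interval=None, aggressive=False):
    """Run the gc command for this commit, if one is due."""
    cmd = gc_command(commit_index, interval, aggressive)
    if cmd is not None:
        subprocess.check_call(cmd, cwd=repo)
```

With this, collecting every 100 commits would correspond to `interval=100`, every 1000 commits to `interval=1000`, and the aggressive variant to `interval=100, aggressive=True`.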

All of these 7 tests will be done with models corresponding to the 4 repository types. The model gets generated by simply getting the files changed in each commit. Then at each iteration a commit is chosen at random and a number of lines is appended to each of its files. The amount appended per file corresponds to the full size of the repo divided by the total number of changes to files (the sum of the files in each commit). Using just the size of the repo introduces an error, because the individual changes will likely be bigger. This can still be checked later on. 
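
One iteration of this model could look roughly like the following. It is a sketch, not the code from the repository: the function names, the `rng` parameter and the byte-based size accounting are assumptions made for illustration.

```python
import os
import random

def append_size(repo_size_bytes, commits):
    """Bytes to append per file: repo size / total number of file changes."""
    total_changes = sum(len(files) for files in commits)
    return repo_size_bytes // total_changes

def enact_one_commit(commits, lines, size, target_dir, rng=random):
    """Pick a random commit and append about `size` bytes of random
    lines to each file that commit touched. Returns the file list."""
    lines = [line for line in lines if line]  # empty lines would never fill up
    files = rng.choice(commits)
    for name in files:
        path = os.path.join(target_dir, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        written = 0
        with open(path, "a", encoding="utf-8", newline="") as f:
            while written < size:
                line = rng.choice(lines)
                f.write(line)
                written += len(line.encode("utf-8"))
    return files
```

Because whole lines are appended, each file grows by at least `size` bytes, possibly slightly more.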

### Info

The files at each commit can be found by simply calling 
$ hg log --template "{files}\n"

The size via
$ du -hsc repo/*

From [knittl2010][] we know that the size of the files doesn’t have a noticeable effect on the performance of the system, so we can treat additions and changes as simple additions. 

[knittl2010]: http://thehappy.de/neo/dvcs.pdf "Analysis and Comparison of Distributed Version Control Systems"


* Mercurial has 12979 commits with a mean number of 2.19 files changed per commit. The maximum number of files changed per commit is 731. The full repository size is 14 MiB. That’s about 0.5 kiB per committed file. 

* Git has 27286 commits with a mean number of 4.02 files changed per commit. The maximum number of files changed in a commit is 324. The full repository size is 18 MiB. That’s about 0.17 kiB per committed file.

* Freenet has 19638 commits with a mean number of 2.47 files changed per commit. The maximum number of files changed in a commit is 1031. The full repository size is 15 MiB. That’s about 0.3 kiB per committed file.

* rpg-1d6 has 1090 commits with a mean number of 1.9 files changed per commit. The maximum number of files changed per commit is 27. The full size is about 630 MiB. That’s about 304 kiB per committed file.
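
Figures like these can be derived from the log output and the repository size. The following is a sketch under stated assumptions: `profile_stats` is a hypothetical helper, each line of input is one commit as printed by the `{files}\n` template, file names contain no spaces, and commits without file changes (empty lines) still count as commits.

```python
def profile_stats(log_output, repo_size_kib):
    """Summarize one repository profile.

    log_output: text with one line per commit, file names separated by spaces.
    repo_size_kib: full repository size in kiB.
    """
    commits = [line.split() for line in log_output.splitlines()]
    counts = [len(files) for files in commits]
    total_files = sum(counts)
    return {
        "commits": len(commits),
        "mean_files": total_files / len(commits),
        "max_files": max(counts),
        "kib_per_file": repo_size_kib / total_files,
    }

stats = profile_stats("a.py b.py\n\nc.py\n", repo_size_kib=6)
print(stats["commits"], stats["mean_files"], stats["max_files"], stats["kib_per_file"])
# prints: 3 1.0 2 2.0
```

As a cross-check of the Mercurial numbers above: 14 MiB divided by 12979 × 2.19 file changes gives roughly 0.5 kiB per committed file.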

The changed files for the profiles are in hg-files.txt, git-files.txt, freenet-files.txt and 1d6-files.txt, respectively. 

The code now loads the list of files, shuffles it, does the commits (only appending data: a random selection of lines from some of the code files³) and measures the time per commit, writing the times out as a simple list of newline-separated numbers. 
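
The measuring loop described here might be sketched as follows. The names `run_profile`, `change_func` and `commit_func` are hypothetical. The two callables stand in for “append data to the files” and “commit with the tool under test”, and timing only the commit step is an assumption.

```python
import random
import sys
import time

def run_profile(commits, change_func, commit_func, out=sys.stdout, rng=random):
    """Replay the commits in random order and write one elapsed time
    (in seconds) per line to `out`."""
    order = list(commits)
    rng.shuffle(order)
    for files in order:
        change_func(files)            # untimed: modify the working copy
        start = time.perf_counter()
        commit_func(files)            # timed: the actual commit
        elapsed = time.perf_counter() - start
        out.write("%.6f\n" % elapsed)
```

Passing the commit step in as a function keeps the loop identical for every testcase, so only the tool invocation differs between the runs.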

Afterwards check the size of the repositories. 

### Testcases

The testcases are: 

* Mercurial (hg call, hg dis, git ag, git ng, git g100, git g1000, git ag100)
* Git (hg call, hg dis, git ag, git ng, git g100, git g1000, git ag100)
* Freenet (hg call, hg dis, git ag, git ng, git g100, git g1000, git ag100)
* rpg-1d6 (hg call, hg dis, git ag, git ng, git g100, git g1000, git ag100)

Needed from the program side: 
* A setup function which creates a new testdir (name: test-time()), loads the profiles and sets up the basic repositories. 
* A profile-enacting function which takes the files-list, the size of the data to append per file and a target directory. 
* A time-tracking commit function for each testcase which just puts the result time on stdout. 
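
A possible shape for the setup part, as a sketch only: the helper names, the profile file format (one commit per line, file names separated by spaces) and the use of plain `hg init` / `git init` are assumptions, not taken from the repository.

```python
import os
import subprocess
import time

# Assumed commands for creating an empty repository of each kind.
INIT_COMMANDS = {"hg": ["hg", "init"], "git": ["git", "init"]}

def make_test_dir(base="."):
    """Create and return a fresh directory named test-<unix time>."""
    path = os.path.join(base, "test-%d" % int(time.time()))
    os.makedirs(path)
    return path

def load_profile(path):
    """Read a files list: one commit per line, names separated by spaces."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]

def init_repo(test_dir, name, kind):
    """Create an empty repository of the given kind ('hg' or 'git')."""
    repo = os.path.join(test_dir, name)
    os.makedirs(repo)
    subprocess.check_call(INIT_COMMANDS[kind], cwd=repo)
    return repo
```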

The code is [public on Bitbucket](https://bitbucket.org/ArneBab/hg-vs-git-for-server-apps).

³: cat ~/Quell/Programme/Mercurial/hg-stable/*/*.py git2/*/*.c git2/*/*.c ~/Quell/Programme/freenet/fred-staging-hg/src/freenet/*/*.java ~/ews/*/*.txt ~/ews/*.txt | shuf > random_lines.txt  
3.1 MiB from freenet, 892 kiB from 1d6, 1.7 MiB from Mercurial and 2.6 MiB from git. 


### Usage

Run the script

$ python run_test.py