cached_thg / Home


While thg is nice and functional and all, it clearly isn't the fastest app on earth. When you try to use it with large real-life repositories (tens of thousands of files and commits), the performance issues are grave; try out the netbeans repo if you don't believe me.

This situation is rather unfortunate because thg is the only serious GUI for Mercurial and, believe it or not, there are lots of people who simply won't use anything that does not have a reasonable GUI app; and they can only laugh at me when they see thg trying to deal with a 20k-revs, 20k-files repository :(

I'm not a GUI programmer and I don't know Python at all, but I can hardly believe that no improvements could be made with the current code base. Also, the developer's statements like "I'm not going to implement a disk cache..." are not very promising...

So this fork is an attempt to identify the worst performance problems and fix or at least work around them by caching or whatever.


  • Make it fast enough to work smoothly with the netbeans repo (once data is cached).
  • Make it usable at my work so Mercurial adoption won't be rejected because of the unresponsive GUI.
  • Treat it as a proof-of-concept so the thg devs can reconsider their attitude towards caching.


After some investigation, two improvements have been implemented:

  • Persistent cache of status data (file lists) for the repo browser, using the Python shelve module. Computing this data is the worst performance bottleneck in the original thg, and it is performed after every click on the revision list.
  • The repo browser no longer displays diffs automatically after clicking on a revision; the user must click a file in the list to see the file (diff). This saves quite a few clock ticks when browsing large repositories.
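To illustrate the first point, here is a minimal sketch of a shelve-backed status cache keyed by changeset hash. It is not the fork's actual code: `cached_status`, `slow_status`, and the demo cache path are hypothetical names used only for this example.

```python
import os
import shelve
import tempfile


def cached_status(cache_path, changeset_id, compute_status):
    """Return the file list for changeset_id, computing it only on a cache miss."""
    # shelve keys must be strings, so the changeset hash is used directly.
    with shelve.open(cache_path) as cache:
        if changeset_id in cache:
            return cache[changeset_id]          # cache hit: read from disk
        result = compute_status(changeset_id)   # cache miss: do the slow work
        cache[changeset_id] = result            # persist for the next session
        return result


# Hypothetical stand-in for the expensive status computation in thg.
calls = []


def slow_status(changeset_id):
    calls.append(changeset_id)
    return ["src/Main.java", "README"]


demo_path = os.path.join(tempfile.mkdtemp(), "status_cache")
first = cached_status(demo_path, "23ac0cec374b", slow_status)   # miss
second = cached_status(demo_path, "23ac0cec374b", slow_status)  # hit
```

Because the shelf is reopened on every call, the second lookup is served from disk and `slow_status` runs only once.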

The code is experimental and unstable.


I'm using the Netbeans repository for performance comparison as it is large enough (approaching 200k commits and 80k files). I'm measuring the time consumed in HgFileListModel.setContext() after each click on the revision list. Times are in seconds, rounded to one decimal place.
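A measurement like this can be taken with a small timing wrapper. The sketch below is an assumption about how one might do it, not the instrumentation actually used; `timed` and `set_context` are hypothetical names, and `set_context` only simulates work.

```python
import time
from functools import wraps


def timed(func):
    """Record the wall-clock duration of each call, in seconds, rounded to 0.1."""
    timings = []

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Rounded to one decimal place, as in the tables on this page.
            timings.append(round(time.perf_counter() - start, 1))

    wrapper.timings = timings
    return wrapper


@timed
def set_context(changeset_id):
    # Hypothetical stand-in for HgFileListModel.setContext().
    time.sleep(0.01)
    return changeset_id


set_context("48de49efeb27")
print(set_context.timings)
```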

Stage 1

orig contains times for the original thg code. cache-miss and cache-hit were measured after implementing a simple in-memory result cache for the setContext() function. The cache-miss and cache-hit column names should be self-explanatory.
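The idea of an in-memory result cache can be sketched as follows. This is a simplified illustration that assumes a dict keyed by changeset hash; `FileListModel` and `load_files` are hypothetical stand-ins and do not reproduce the real HgFileListModel.

```python
class FileListModel:
    """Hypothetical, much-simplified stand-in for HgFileListModel."""

    def __init__(self, load_files):
        self._load_files = load_files  # the expensive status computation
        self._cache = {}               # changeset hash -> file list
        self.files = []
        self.last_lookup = None

    def set_context(self, changeset_id):
        if changeset_id in self._cache:
            self.last_lookup = "cache-hit"
        else:
            self.last_lookup = "cache-miss"
            self._cache[changeset_id] = self._load_files(changeset_id)
        self.files = self._cache[changeset_id]


loads = []


def load_files(changeset_id):
    # Pretend this walks the manifest; here it only records the call.
    loads.append(changeset_id)
    return ["file-in-" + changeset_id]


model = FileListModel(load_files)
model.set_context("23ac0cec374b")   # first click: cache-miss
model.set_context("4a89e9541697")   # different revision: cache-miss
model.set_context("23ac0cec374b")   # back to the first one: cache-hit
```

Such a cache lives only as long as the process, which is why cache hits are only possible for revisions already visited in the current session.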


[*] thg changeset used in comparison: 48de49efeb27

Stage 2

The second table contains results measured after implementing the persistent cache and disabling the automatic display of diff/file content whenever the user selects a different revision.


As you can see, I've managed to achieve reasonable response times (each below 0.05 sec) in the case of a cache hit. Cache-miss times are better than the original and better than in Stage 1 (simple in-memory cache implemented as a Python set) because of the second optimization (not displaying diffs automatically).

[*] thg changeset used in comparison: 48de49efeb27

Stage 3

Changeset 9623ea3fda18.

Restored the original behaviour: after clicking a revision, a diff is automatically displayed. At the same time, implemented filedata result caching. I'm sure it can be done in a more efficient way, because only the first diff from a given revision is slow; when you click other files, they are displayed pretty fast.

23ac0cec374b | 2.9 2.7 2.7 2.7 | 2.9 2.7 2.8 2.8 | 0.0
4a89e9541697 | 1.4 1.4 1.4 1.4 | 1.5 1.5 1.4 1.6 | 0.0
1c061c501b18 | 1.7 1.7 1.8 1.7 | 1.8 1.8 1.8 1.7 | 0.0
1ff2983273df | 1.9 1.7 1.7 1.7 | 1.8 1.7 1.7 1.7 | 0.0
0bdfbe13fe0e | 2.6 2.6 2.5 2.6 | 2.8 2.6 2.6 2.6 | 0.0
36069e69ffe3 | 2.9 2.9 2.9 2.9 | 3.1 2.9 2.9 2.8 | 0.0
85c49c66938a | 2.3 2.3 2.3 2.3 | 2.5 2.5 2.3 2.3 | 0.0
916c078e8397 | 1.8 1.8 1.7 1.7 | 1.9 1.8 1.7 2.0 | 0.0
e4c9e21222f7 | 1.4 1.3 1.3 1.3 | 1.3 1.3 1.4 1.3 | 0.0
d90c6e4868ea | 1.8 1.7 1.7 1.8 | 1.8 1.7 1.8 1.7 | 0.0
150ea9e4cce9 | 1.9 1.8 1.5 1.7 | 1.9 1.9 1.9 1.7 | 0.0

Times measured in setContext(), in seconds, when clicking on these consecutive revisions in the repo browser. Cache-hit times were always below 50 ms and often below 10 ms.

[*] thg changeset used in comparison: 60b9b8b8143f


  • Tested only on Linux; the path to the cache file is hard-coded relative to the $HOME directory.
  • I use the Python shelve module; all its limitations apply.
  • Some debugging messages may be printed on stdout.
  • There are some open issues to resolve...

To Do

  • Cache persistence so it can be re-used (try out the shelve module). [done]
  • Concurrent access to the persistent on-disk cache.
  • Different cache files for different repos so they can be accessed concurrently.
  • Cache retention to free up disk space.
  • Pre-cache some commits to improve the first-time experience.
  • Cache file diffs to display them quickly. [done]
  • Check if shelve (bdb) scales when the amount of cached data grows...
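One possible approach to the "different cache files for different repos" item is to derive the cache file name from the repository root path. The sketch below is only an idea, not something implemented in the fork; `cache_file_for_repo`, the `status-` prefix, and the example directories are all hypothetical.

```python
import hashlib
import os


def cache_file_for_repo(repo_root, cache_dir):
    """Map a repository root to its own cache file inside cache_dir."""
    # Normalize so that "/repos/netbeans" and "/repos/netbeans/" coincide.
    normalized = os.path.normcase(os.path.abspath(repo_root))
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:16]
    return os.path.join(cache_dir, "status-" + digest)


# Hypothetical cache directory under $HOME.
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "cached_thg")
netbeans_cache = cache_file_for_repo("/repos/netbeans", cache_dir)
same_repo_cache = cache_file_for_repo("/repos/netbeans/", cache_dir)
other_cache = cache_file_for_repo("/repos/other", cache_dir)
```

With one file per repository, two thg windows opened on different repos never touch the same shelf. Concurrent access to a single repo's cache would still need separate handling.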

Trying out

My changes are committed on the branch named cache, so you can safely pull it into your normal repo, try it out, and strip it if you don't like it.

hg pull
hg update cache

Of course you can also clone it or get it by any other means provided by Bitbucket. Just remember to update to my branch:

hg clone
hg update cache