SvnToHg

A utility for converting an SVN repository to Mercurial.

Why use this instead of the utilities available in the Convert extension?

  • SvnToHg is driven by a rules file that allows the expression of complex mappings of svn directories to branches and tags
  • SvnToHg can handle svn revisions that span multiple branches and tags, such as those created by exotic uses of svnmucc
  • SvnToHg runs in a single pass, making it up to 4 times faster than the Convert extension on our 10-year-old repository

Requirements

In order to use SvnToHg, you need:

  • Mercurial 1.9 or later
  • Subvertpy - you need trunk or version 0.8.6
  • Enough RAM to track the revision mappings effectively (our 10-year-old repo needs just under 1 GB)

Installation and Operation

Obtain SvnToHg:

 hg clone https://bitbucket.org/wez/svntohg
 cd svntohg

The conversion is driven by the SvnToHg script. Before you can run it, you need to provide a rules file. The rules file is a JSON file containing author and branch mapping information. Examples are included in the rules directory.

You need to have file based access to your subversion repository; in other words, you either need to run this on the same machine as your subversion server, or otherwise rsync the repository directory across to where you want to manage the conversion.

To convert a standard layout subversion repository:

 ./SvnToHg --rules rules/std.json \
      --target /var/tmp/myrepo.hg \
      /path/to/my/svn/repo/dir

This will start to examine your repo, notice that you have no author information, and output a sample authors.json file. Edit that file, then re-run the tool with it:

 ./SvnToHg --rules rules/std.json \
      --target /var/tmp/myrepo.hg \
      --authors authors.json      \
      /path/to/my/svn/repo/dir

Once armed with the author information, the utility will read through all of the revisions in the subversion repo and compare them against the rules in the order they were declared in the rules file.

Note that this usage with --target will cause the script to attempt to create the target repository. If it already exists, the script will abort. This is especially useful while you iterate to prove that your rules work appropriately. If you don't specify --target then it will attempt to use the mercurial repository from the current directory. This latter usage is not recommended because it is difficult to revert the repo to its former state if you need to revise and adjust your rules.
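Because the script aborts when the target already exists, iterating on a rules file means discarding the previous attempt before each run. The following is a minimal Python sketch of such a wrapper, not part of SvnToHg itself; the helper names fresh_target, build_command and convert are hypothetical, and only the --rules, --target and --authors options shown above are used.

```python
import os
import shutil
import subprocess


def fresh_target(target):
    """Remove a previous conversion attempt so the target can be created anew."""
    if os.path.isdir(target):
        shutil.rmtree(target)


def build_command(rules, target, svn_repo, authors=None):
    """Assemble the SvnToHg command line from the options described above."""
    cmd = ["./SvnToHg", "--rules", rules, "--target", target]
    if authors is not None:
        cmd += ["--authors", authors]
    cmd.append(svn_repo)
    return cmd


def convert(rules, target, svn_repo, authors=None):
    """Discard the old target, then run the conversion; raises on a non-zero exit."""
    fresh_target(target)
    subprocess.check_call(build_command(rules, target, svn_repo, authors))
```

Calling convert("rules/std.json", "/var/tmp/myrepo.hg", "/path/to/my/svn/repo/dir", "authors.json") from the svntohg directory would then repeat the second command above from a clean slate each time.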

Rules File

The rules file contains an array of rule definitions. Each revision of the source repository is analyzed; each changed file is evaluated against each rule definition in the order they were declared in the rules file. The first matching rule is taken. If no rule matches a path, the conversion aborts--that means that you are required to define a mapping for all possible paths in your repository.
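The first-match behaviour can be pictured with a short Python sketch. This is an illustration of the evaluation order described above, not the tool's actual code; the names NoRuleMatched, first_matching_rule and classify_revision are hypothetical.

```python
import re


class NoRuleMatched(Exception):
    """Raised when a changed path is not covered by any rule."""


def first_matching_rule(rules, svn_path):
    """Return the first rule, in declaration order, whose pattern matches svn_path."""
    for rule in rules:
        # re.match only anchors at the start of the path; the tool also
        # constrains the right hand side of the pattern.
        if re.match(rule["path"], svn_path):
            return rule
    raise NoRuleMatched("no rule matches %r" % svn_path)


def classify_revision(rules, changed_paths):
    """Map every changed path of one revision to the rule that claims it."""
    return {path: first_matching_rule(rules, path) for path in changed_paths}
```

Because an unmatched path raises instead of being skipped, a catch-all rule at the end of the list is the natural place to deal with paths you do not care about.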

Rules allow you to classify subversion paths in a number of different ways. For example, you can perform a very simple direct mapping of code changes to branches, and you can map subversion tags to mercurial branches.

The rules file looks something like this:

 [
   {
      "path": "trunk"
   },
   {
      "path": "branches/([^/]+)",
      "name": "\\1"
   },
   {
      "path": "tags/([^/]+)",
      "kind": "tag",
      "tagname": "\\1"
   }
 ]

Each rule object has the following possible keys:

  • path - a regular expression that is matched against the svn path.
  • kind - what kind of rule this is (default is "branch")
  • name - the branch name (default is "default")
  • tagname - if this is a tag rule, the name of the tag (default is "\\1")
  • min - the minimum svn revision number for the rule to match
  • max - the maximum svn revision number for the rule to match
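As a rough illustration of how the defaults listed above combine with a rules file, the following Python sketch parses the JSON and fills in any missing keys. The function name load_rules and the DEFAULTS table are hypothetical; min and max are simply left absent when a rule does not set them.

```python
import json

# Documented defaults; "\\1" in Python source is the two-character string \1,
# which is also what the JSON string "\\1" decodes to.
DEFAULTS = {"kind": "branch", "name": "default", "tagname": "\\1"}


def load_rules(text):
    """Parse rules-file text and fill in the documented defaults for missing keys."""
    rules = json.loads(text)
    return [dict(DEFAULTS, **rule) for rule in rules]


rules = load_rules('[{"path": "trunk"}, {"path": "tags/([^/]+)", "kind": "tag"}]')
print(rules[0]["kind"], rules[0]["name"])  # branch default
print(rules[1]["kind"])                    # tag
```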

path

The path regex is anchored to match from the start of the svn path, and automatically appends a pattern that forces the right hand side of your path pattern to match either the end of the svn path or match a directory separator.

In other words, a "path" of "trunk" will match "trunk", "trunk/foo" but not "trunks".
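One way to get this anchoring in Python is sketched below. The function name compile_path is hypothetical and the exact pattern SvnToHg appends may differ; the lookahead simply requires that a directory separator or the end of the path follows the match.

```python
import re


def compile_path(pattern):
    """Anchor a rule's path pattern at the start and at a path-component boundary."""
    # (?:...) keeps any alternation in the user's pattern self-contained;
    # (?=/|$) requires a directory separator or the end of the path next.
    return re.compile(r"^(?:" + pattern + r")(?=/|$)")


trunk = compile_path("trunk")
print(bool(trunk.match("trunk")))      # True
print(bool(trunk.match("trunk/foo")))  # True
print(bool(trunk.match("trunks")))     # False
```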

kind

Define the kind of rule. Allowed values are:

  • branch - maps the commit against a branch
  • tag - records the commit as a tag
  • ignore - the commit will not be recorded in the destination repo

name

Defines the branch name for the commit. The default is the "default" branch, but you may specify any valid mercurial branch name string. The string is processed by expanding any captures from the regular expression. Consider the following rule:

   {
      "path": "branches/([^/]+)",
      "name": "\\1"
   },

If the subversion repository has commits to "branches/2.0", this rule will cause those to be recorded against a mercurial branch named "2.0". The numeric portion of the subversion path is captured by the parentheses and expanded in place of the \\1 in the name string.
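The capture expansion can be reproduced with Python's re module, as in the sketch below. The function name branch_name is hypothetical, and this illustrates the idea rather than the tool's own code.

```python
import json
import re

# The JSON string "\\1" decodes to the two characters \1, a regex backreference.
rule = json.loads(r'{"path": "branches/([^/]+)", "name": "\\1"}')


def branch_name(rule, svn_path):
    """Expand regex captures from the path match into the rule's name template."""
    match = re.match(rule["path"], svn_path)
    if match is None:
        return None
    return match.expand(rule.get("name", "default"))


print(branch_name(rule, "branches/2.0/src/main.c"))  # 2.0
```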

tagname

Defines the tag name for the commit. This is specified independently from the branch name because tags are mutable in subversion; despite the best practices recommending that you don't modify things under the "tags" portion of the repo, there is nothing to prevent that from happening.

The SvnToHg strategy for handling these sorts of things is to track changes made to tags as a line of development that is recorded against an appropriate branch. If we imagine that we tagged "branches/2.0" as "tags/2.0.1" and later found that we needed to tweak a version number in a build script and made that change directly to the tag (perhaps through laziness or perhaps by accident), we have to consider how we'd like to deal with this.

In order to track the build script change, we have to record that commit in mercurial; by default that change will go to the "default" branch, but in this case it is more appropriate for it to be tracked against our "2.0" branch, so we might use the following rule to make that happen:

   {
      "path": "tags/(2\\.0[^/]*)",
      "kind": "tag",
      "name": "2.0",
      "tagname": "\\1"
   }

This will produce "hg glog" output along the lines of:

    @ changeset:   10
    | branch:      default
    | tag:         tip
    | parent:      8
    | summary:     fix version script
    |
    | o changeset:  9
    | | branch:     2.0
    | | tag:        2.0.1
    | | parent:     7
    | | summary:    fix version script
    | |
    o | changeset:  8
    | | branch:     default
    | | parent:     5
    | | summary:    tag 2.0.1
    | |
    | o changeset:  7
    | | branch:     2.0
    | | parent:     6
    | | summary:    tag 2.0.1
    | |
    | o changeset:  6
    | | branch:     2.0
    | | parent:     5
    | | summary:    do 2.0 dev work
    |/
    o changeset:    5
    | branch:       default
    | summary:      stuff on the mainline

Changesets 8 and 10 occur on the "default" branch; these are synthesized commits that hold the modifications to the .hgtags file and only that file. There are two changes to .hgtags in this scenario because there are two versions of the 2.0.1 tag; the first is created along with the initial tag creation event in changeset 7 (the corresponding .hgtags change is in changeset 8). The second version of the tag is created by changeset 9 (with the .hgtags change in changeset 10).

Changeset 9 corresponds to the actual version script change in this example; we track it against the 2.0 branch so that the history is clearer. If we didn't specify the branch, the conversion would use the "default" branch, producing a more confusing history with more branches.

SvnToHg tracks these tag commits to the "2.0" branch separately from true "branches/2.0" commits so it can safely represent changes to tags that are interleaved with changes to the "2.0" branch. The "glog" output will show those lines of development quite clearly.

If SvnToHg needs to record changes against a tag, it will mark the tip of that code line as closed so that it won't show in the "hg heads" output unless you request that closed heads be displayed.

min and max

These specify a range of validity for the rule object. This will not typically be used unless you have some unusual history in your subversion repository. Consider the following example:

Your organization has decided to allow your developers to create their own branches under "branches/username". They are free to use that space how they like for their various projects.

Now let's say that one of the developers copied "trunk" to "branches/username" and did some work on it, then merged it back and deleted "branches/username" in svn revision 100. Later on, they had two projects ongoing and tracked those as "branches/username/a" and "branches/username/b".

In this scenario there are three logical branches that belong to that user:

  • branches/username (up until revision 100)
  • branches/username/a (after revision 100)
  • branches/username/b (after revision 100)

To capture these distinct branches, you might use rules like the following:

   {
      "path": "branches/username",
      "name": "username-feature-1",
      "max":  100
   },
   {
      "path": "branches/username/([^/]+)",
      "name": "username-feature-\\1",
      "min":  100
   },

The first rule is only evaluated up until subversion revision 100, after which it is skipped and the next rule evaluated instead. In this way, you can manage what would otherwise be conflicting or colliding rules that cause other migration tools to fail.
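The effect of min and max on rule selection can be sketched in Python as follows. The function names rule_applies and select_rule are hypothetical, and treating both bounds as inclusive is an assumption made for this sketch; check the behaviour at the boundary revision against the tool itself.

```python
import re


def rule_applies(rule, revision, svn_path):
    """True if revision is within the rule's optional min/max range and the path matches."""
    # Assumption: both bounds are inclusive.
    if "min" in rule and revision < rule["min"]:
        return False
    if "max" in rule and revision > rule["max"]:
        return False
    # Anchor at the start and at a path-component boundary, as for "path".
    return re.match(r"^(?:" + rule["path"] + r")(?=/|$)", svn_path) is not None


def select_rule(rules, revision, svn_path):
    """Return the first rule in declaration order that applies, or None."""
    for rule in rules:
        if rule_applies(rule, revision, svn_path):
            return rule
    return None


rules = [
    {"path": "branches/username", "name": "username-feature-1", "max": 100},
    {"path": "branches/username/([^/]+)", "name": "username-feature-\\1", "min": 100},
]
print(select_rule(rules, 50, "branches/username/main.c")["path"])     # branches/username
print(select_rule(rules, 150, "branches/username/a/main.c")["path"])  # branches/username/([^/]+)
```

Without the max bound, the first rule would also claim "branches/username/a/main.c" at revision 150, since its pattern matches the leading path components.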
