David Schleimer


scratch-notes / hgsubversion / branches.txt

This document describes a plan for addressing one of the current
limitations of hgsubversion: it is limited to a single path within a
Subversion repository.


Facebook is in the process of migrating from a mixed git/svn
environment to a Mercurial environment, via a mixed hg/svn
environment.  One of the requirements for our migration is that we
make our release branches available via Mercurial, both as a whole in
the central repository, and individually to developers.

Our repository HEAD is structured something like this:


Additionally, there are historical directories at the same level as
trunk and releases (notably including branches), which do not
currently exist, but which are involved in the history of trunk.  We
are mostly uninterested in these directories, aside from their
contributions to the history of trunk.

There is an svn:externals property on the latest directory that is
updated to point to one of the www directories under releases every
time a new release is created.  We are uninterested in this directory
beyond ensuring it is not present in Mercurial.

We currently have a Mercurial clone, in the single layout, of
prefix/trunk/www.  We want to add a Mercurial branch for each of the
releases in a central repo.  We do not want all developers to have a
copy of all the branches, since most developers will never need most
branches.  Instead, we want developers to be able to add individual
branches to their repositories as needed.

While this particular layout is unique to our repository, we believe
the problem we are facing is not.  We would rather add sufficient
flexibility upstream to cover our needs than build a custom extension
in-house.


Requirements

* Specify an arbitrary directory in which to search for branches.
* Exclude paths that are not under trunk, the branches directory, or
  tags.
* Specify a path, relative to trunk or ${branch_dir}/${branch_name},
  that will be the root of commits in Mercurial.
* Cleanly add a new branch, whose root may be older than
  last(fromsvn()), to an existing repo.


Current Limitations

* We currently track a single most recently pulled revision, which
  makes it difficult to retroactively add a branch.
* 'branches/' is hard-coded in many locations.

Proposed Interface

Currently there are two layouts: single and standard.  I would like to
replace standard with a new simple layout, and add a third, custom
layout.

The simple layout would be similar to the current standard layout,
except that the branches (and possibly tags) directory would be
configurable, and that it would support a configurable infix: the
relative path from trunk (or from a branches/branchname directory) to
the directory that corresponds to the root of the Mercurial repo.
Paths outside these directories will be handled the same way the
standard layout currently handles them.
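A minimal sketch of the simple layout's path translation, assuming that trunk maps to the Mercurial default branch and that branch names contain no slashes. The functions split_simple and join_simple and their branches_dir and infix parameters are hypothetical stand-ins for the proposed configuration, not existing hgsubversion APIs. split_simple strips the infix on pull; join_simple reapplies it on push.

```python
from typing import List, Optional, Tuple


def _parts(path: str) -> List[str]:
    """Split an svn-style path into its non-empty components."""
    return [p for p in path.split("/") if p]


def split_simple(svn_path: str, branches_dir: str = "branches",
                 infix: str = "") -> Optional[Tuple[str, str]]:
    """Map an svn path to (hg branch, repo-relative path), stripping the infix.

    Returns None for paths outside trunk/<infix> and
    <branches_dir>/<branch>/<infix>.
    """
    parts, bdir = _parts(svn_path), _parts(branches_dir)
    if parts[:1] == ["trunk"]:
        branch, rest = "default", parts[1:]  # assumption: trunk -> default
    elif bdir and parts[:len(bdir)] == bdir and len(parts) > len(bdir):
        branch, rest = parts[len(bdir)], parts[len(bdir) + 1:]
    else:
        return None  # outside trunk and the branches directory
    prefix = _parts(infix)
    if rest[:len(prefix)] != prefix:
        return None  # inside a branch, but outside the infix
    return branch, "/".join(rest[len(prefix):])


def join_simple(branch: str, path: str, branches_dir: str = "branches",
                infix: str = "") -> str:
    """Inverse of split_simple: reapply the infix when pushing."""
    root = ["trunk"] if branch == "default" else _parts(branches_dir) + [branch]
    return "/".join(root + _parts(infix) + _parts(path))
```

Assuming each release lives at releases/<name>/www, setting branches_dir="releases" and infix="www" would map releases/<name>/www/... onto a branch named after the release and ignore the latest directory.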

The custom layout would accept a map whose keys are Mercurial branch
names and whose values are non-overlapping svn paths relative to the
URI we are pulling from.  Changes outside these paths will be ignored
while importing revisions into Mercurial.  Adding a new path will
cause the next fetch to partially re-import any svn revisions that
affect the new path.  This allows for "sparse" layouts, where you only
want a handful of branches, or for completely arbitrary layouts that
do not resemble the trunk/branches/tags layout.
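The sketch below illustrates the custom layout's map and its non-overlap requirement, assuming the map is a plain dict from branch name to svn path; validate_custom and split_custom are hypothetical names. Because no path may contain another, at most one entry can match a given svn path, so the lookup can stop at the first hit. The quadratic overlap check is acceptable on the assumption that custom layouts list only a few branches.

```python
from typing import Dict, List, Optional, Tuple


def _parts(path: str) -> List[str]:
    """Split an svn-style path into its non-empty components."""
    return [p for p in path.split("/") if p]


def validate_custom(layout: Dict[str, str]) -> None:
    """Raise ValueError if one branch's svn path contains another's."""
    items = [(name, _parts(path)) for name, path in layout.items()]
    for i, (a, pa) in enumerate(items):
        for b, pb in items[i + 1:]:
            n = min(len(pa), len(pb))
            if pa[:n] == pb[:n]:  # one path is a prefix of the other
                raise ValueError("branches %r and %r overlap" % (a, b))


def split_custom(layout: Dict[str, str],
                 svn_path: str) -> Optional[Tuple[str, str]]:
    """Map an svn path to (hg branch, path relative to that branch's root)."""
    parts = _parts(svn_path)
    for name, path in layout.items():
        prefix = _parts(path)
        if parts[:len(prefix)] == prefix:
            # Paths do not overlap, so the first match is the only match.
            return name, "/".join(parts[len(prefix):])
    return None  # outside every configured path: ignored on import
```

Comparing whole path components rather than raw strings keeps releases/r1 and releases/r10 from being treated as overlapping.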

Alternative Interface

Rather than having a custom layout, we could add support for a branch
whitelist to the simple layout, which would serve our needs.  I
believe that this is likely to be harder to implement than a separate
custom layout.  Specifically, I believe that backfilling a new branch
would be much harder.  With the custom layout we can afford to scale
inefficiently with the number of branches, because we can assume that
people using it will be interested in few branches; that assumption
would not be reasonable for the default layout.  This approach has
been rejected due to the consensus that it would be less clean.

Implementation Strategy

Right now, the branching logic is spread across at least pushmod.py,
svncommands.py, and svnmeta.py.  The first step is to pull this logic
out into a new set of modules under hgsubversion/layouts/, with one
module for each layout.

The new library will need to:

* Detect layout when cloning
* Translate from an svn path to a Mercurial branch name
* Translate from a Mercurial commit to an svn path
* Perform path infix stripping on pull and reapplication on push
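One way to give each layout module a common shape is an abstract base class covering the two translation duties, sketched here with hypothetical names (Layout, branch_for_path, svn_path_for) and a trivial single-layout implementation. Under this sketch, infix stripping belongs in branch_for_path and reapplication in svn_path_for; layout detection happens before any Layout object exists, so it is left out of the interface.

```python
import abc
from typing import Optional, Tuple


class Layout(abc.ABC):
    """Hypothetical common interface for the per-layout modules."""

    @abc.abstractmethod
    def branch_for_path(self, svn_path: str) -> Optional[Tuple[str, str]]:
        """Pull: svn path -> (hg branch, path with any infix stripped),
        or None if the path lies outside the layout."""

    @abc.abstractmethod
    def svn_path_for(self, branch: str, path: str) -> str:
        """Push: file in an hg commit on `branch` -> svn path,
        reapplying any infix."""


class SingleLayout(Layout):
    """Single layout: everything under the pulled URI is one branch."""

    def branch_for_path(self, svn_path: str) -> Optional[Tuple[str, str]]:
        return "default", svn_path.strip("/")

    def svn_path_for(self, branch: str, path: str) -> str:
        if branch != "default":
            raise ValueError("single layout has no branch %r" % branch)
        return path.strip("/")
```

Callers such as the push and pull code would then depend only on Layout, so adding the simple and custom layouts would not require touching them.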


When cloning, the exact set of flags that clone translates into config
settings will change to support the new layouts, but its behavior will
not otherwise change.

Very little will change for the simple and single layouts; the work
there is mostly refactoring.

For the custom layout, we will need to find the lowest last-fetched
revision across all branches.  If a branch does not have a
last-fetched revision, we will need to attempt to detect the creation
revision for that branch, similar to what we would need to do during
an initial clone.  We then replay revisions from the earliest of these
revisions onward, discarding changes that touch files outside the
paths we care about, or that map to a branch for which we have already
fetched the revision in question.
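The custom-layout pull described above might look like the following sketch. It assumes per-branch last-fetched revisions stored in a dict, a caller-supplied creation_rev function standing in for creation-revision detection, revisions delivered in ascending order as (revision number, changed paths) pairs, and a split function that maps an svn path to a (branch, path) pair or None. All of these names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Split = Callable[[str], Optional[Tuple[str, str]]]


def plan_pull(branches: Iterable[str], last_fetched: Dict[str, int],
              creation_rev: Callable[[str], int]) -> int:
    """Return the earliest svn revision that must be replayed."""
    # Known branches resume after their last fetch; new ones start at creation.
    starts = [last_fetched[b] + 1 if b in last_fetched else creation_rev(b)
              for b in branches]
    return min(starts)


def filter_changes(revisions: Iterable[Tuple[int, List[str]]],
                   last_fetched: Dict[str, int],
                   split: Split) -> Iterator[Tuple[int, str, str]]:
    """Yield (revision, branch, path) for changes that still need importing."""
    for rev, paths in revisions:
        for svn_path in paths:
            hit = split(svn_path)
            if hit is None:
                continue  # outside every path we care about
            branch, path = hit
            if rev <= last_fetched.get(branch, -1):
                continue  # this branch already has this revision
            yield rev, branch, path
```

For example, if default was fetched through r100 and a newly added branch was created in r90, replay starts at r90, but trunk changes in r90 through r100 are discarded.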

Note that for this to be effective, we need to update the last pulled
revision for every branch we care about each time we pull, regardless
of whether we see new commits for that branch.
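A small illustration of why this bookkeeping matters, assuming the next pull starts just after the lowest recorded revision; record_pull and next_start are hypothetical helpers. Suppose a pull reached r120, default last changed in r120, and a quiet release branch last changed in r95. If only branches with new commits were updated, every later pull would replay r96 onward; recording the pulled-through revision for every configured branch avoids that.

```python
from typing import Dict, Iterable


def record_pull(last_fetched: Dict[str, int], branches: Iterable[str],
                pulled_through: int) -> Dict[str, int]:
    """After pulling through `pulled_through`, mark every configured branch
    as fetched up to that revision, whether or not it saw new commits."""
    for branch in branches:
        last_fetched[branch] = pulled_through
    return last_fetched


def next_start(last_fetched: Dict[str, int]) -> int:
    """The next pull must start just after the lowest recorded revision."""
    return min(last_fetched.values()) + 1
```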