Support for named branches

Issue #27 resolved
Petri Lehtinen
created an issue

hg-git doesn't currently support Hg's named branches automatically. There's a good reason for this: Named branches are a totally different concept compared to Git's branches.

(When converting from Git to Hg, the {{{--HG--}}} metadata tags can be used to assign changesets to named branches. This doesn't work automatically, though, and it clutters the Git commit messages.)

However, this makes life hard for a Git user who works with a Mercurial upstream if the upstream folks refuse to use bookmarks and want to stick with named branches.

What I'm proposing is an approach that could be enabled per-repository in {{{hgrc}}} and so would be totally optional. It would work in both directions (from Hg to Git and from Git to Hg). It searches the Git commit graph to determine which branch name should be assigned to each Git commit when converting Git commits to Hg changesets (hgimport).

=== Hg to Git conversion ===

This is the easy part. Pseudocode:



{{{
# Check that all branches have only one head
for branch_name in hg.branches:
    if hg.branches[branch_name].number_of_heads > 1:
        raise ValueError('Named branch %s has multiple heads' % branch_name)

# Convert Hg changesets to Git commits
...

# Update Git branches
for branch_name in hg.branches:
    git.update_ref(branch_name, branch_name)
}}}

Most projects have only one head for each named branch. If this is not the case, I guess there's nothing that can be done.
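As a rough, runnable illustration of the steps above, here is a minimal Python sketch that works on plain dictionaries instead of real repositories. The names `export_branch_refs`, `branch_heads` and `hg_to_git` are made up for this example and are not part of hg-git or dulwich. Mapping `default` to `master` is an assumption that mirrors the Git to Hg direction.

```python
def export_branch_refs(branch_heads, hg_to_git):
    """Return a {git_ref: git_commit_id} dict for the given named branches.

    branch_heads: {hg_branch_name: [hg_changeset_id, ...]} (hypothetical input)
    hg_to_git:    {hg_changeset_id: git_commit_id} for converted changesets
    """
    # Check that all branches have only one head
    for name, heads in branch_heads.items():
        if len(heads) > 1:
            raise ValueError('Named branch %s has multiple heads' % name)

    # Point one Git branch at the converted head of each named branch
    refs = {}
    for name, heads in branch_heads.items():
        git_name = 'master' if name == 'default' else name  # assumed mapping
        refs['refs/heads/' + git_name] = hg_to_git[heads[0]]
    return refs


refs = export_branch_refs(
    {'default': ['hg3'], 'stable': ['hg2']},
    {'hg1': 'g1', 'hg2': 'g2', 'hg3': 'g3'},
)
```

The check runs over all branches before any ref is produced, so a multi-headed branch aborts the export without leaving a partial result.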

=== Git to Hg conversion ===

This is the more interesting part. For every Git commit, we need to determine which named branch the commit belongs to. Pseudocode:



{{{
# Found Git commits and their branches:
# found[commit_id] = branch_for_the_commit
found = {}

# Working set, a list of (commit_id, branch) tuples
work = []

# Master is always processed first and mapped to the
# "default" mercurial branch
work.append((git.branches['master'], 'default'))

# Add other Git branches
for branch_name, commit_id in git.branches.items():
    if branch_name == 'master':
        continue

    work.append((commit_id, branch_name))

# Main loop
while len(work) > 0:
    commit_id, branch_name = work.pop(0)

    while True:
        if commit_id in found:
            # Already found
            break

        if commit_id in hg.git_commits:
            # Already converted to hg by hgimport
            # in the past
            break

        found[commit_id] = branch_name

        # List of commit's parents
        parents = git.commits[commit_id].parents

        if not parents:
            # This commit is a "root" commit that has
            # no parent
            break

        # Descend to the first parent in the next
        # iteration
        commit_id = parents[0]

        # Merge commits have more than one parent
        for parent in parents[1:]:
            # Descend to other parents when all
            # previously found branches have already
            # been searched first
            work.append((parent, branch_name))

# After this, import commits normally to Hg, using
# found[commit_id] as the branch name for each Hg
# changeset
}}}



This is a depth-first search of branches. A search path starts from each Git branch and from each merge commit that is found along the way. When searching to the maximum depth, the first parent of each commit is followed, and other parents are added to the work set as new paths. Because the first parent of a merge lies on the "target" branch of the merge, this assigns the correct branch name to merge commits.

Parents of merge commits (other than the first one) are searched only when all the "normal" branches have first been searched to the maximum depth. This ensures that "ad-hoc" branches get the correct branch name but no "normal" branch gets the wrong branch name.

Adding master as the first element in the working set ensures that all new branches are treated as spawned from the default branch.
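To make the traversal concrete, here is a small self-contained Python sketch that runs the search on a toy commit graph. It uses plain dictionaries in place of a real repository: `assign_branches`, `branches`, `parents` and `already_converted` are illustrative names, not hg-git or dulwich API, and the working set is a `collections.deque`.

```python
from collections import deque


def assign_branches(branches, parents, already_converted=frozenset()):
    """Map each reachable, not yet converted commit id to an Hg branch name.

    branches: {git_branch_name: tip_commit_id}, must contain 'master'
    parents:  {commit_id: [parent_commit_id, ...]}, first parent first
    """
    found = {}
    # master goes first and is mapped to the "default" branch
    work = deque([(branches['master'], 'default')])
    for name, tip in branches.items():
        if name != 'master':
            work.append((tip, name))

    while work:
        commit_id, branch_name = work.popleft()
        # Follow first parents until reaching a known commit or a root
        while commit_id not in found and commit_id not in already_converted:
            found[commit_id] = branch_name
            commit_parents = parents[commit_id]
            if not commit_parents:
                break  # root commit
            # Other parents of a merge are queued behind all pending work
            for parent in commit_parents[1:]:
                work.append((parent, branch_name))
            commit_id = commit_parents[0]
    return found


# Toy history: 'feature' (x, y, z) forks from b and is merged into master
# at m; p is an ad-hoc branch without a Git ref, merged into master at n.
parents = {
    'a': [], 'b': ['a'], 'c': ['b'],
    'x': ['b'], 'y': ['x'], 'z': ['y'],
    'm': ['c', 'y'], 'p': ['c'], 'n': ['m', 'p'],
}
branches = {'master': 'n', 'feature': 'z'}
found = assign_branches(branches, parents)
```

In this example `x`, `y` and `z` end up on `feature`, while the ad-hoc commit `p` and everything on the first-parent chain of `master` end up on `default`.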

Some remarks:

* The actual implementation should implement {{{work}}} as a deque for better performance.
* Commits not reachable from any branch are never added to {{{found}}} and thus don't get a branch name.
* A mapping between Git and Hg branch names can be implemented on top of this.
* If long-lasting branches can spawn from branches other than master, the branches should be ordered somehow. Otherwise we might get into a situation where some commits in a branch get the wrong branch label if another branch has spawned from it in the past. It might also be possible to fix this by tweaking the algorithm.
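The mapping and ordering remarks could both be handled when the working set is seeded. The following is only a sketch under assumptions: `seed_work`, `priority` and `name_map` are hypothetical names, and ordering branches by an explicit priority list is just one possible way to order them.

```python
from collections import deque


def seed_work(branches, priority=('master',), name_map=None):
    """Build the initial working set of (commit_id, hg_branch_name) tuples.

    branches: {git_branch_name: tip_commit_id}
    priority: Git branch names to process first, in this order (assumed to
              be configured by the user, long-lasting branches first)
    name_map: {git_branch_name: hg_branch_name}; unmapped names are kept
    """
    if name_map is None:
        name_map = {'master': 'default'}

    # Prioritised branches first, then the rest in a stable (sorted) order
    ordered = [name for name in priority if name in branches]
    ordered += sorted(name for name in branches if name not in priority)

    return deque((branches[name], name_map.get(name, name)) for name in ordered)


work = seed_work(
    {'hotfix': 'h1', 'stable': 's9', 'master': 'm5'},
    priority=('master', 'stable'),
)
```

Listing `stable` before `hotfix` means the commits on `stable` are claimed before the search from `hotfix` can walk down into them.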

=== Conclusion ===

I haven't looked much at hg-git's code or dulwich's API yet. I'd like to hear whether this is a totally silly idea to start with or whether it sounds doable.

Comments (4)

  1. Nilesh

    I haven't checked the code here, but this is definitely a great amendment. I just created a Git mirror of my project, which was originally in Hg, and then saw those comment lines Mercurial uses in Git commit messages. They look really weird. Also, I had a tough time managing those branch names. Bookmarking master to default and stable to master, eh, but I finally got things right and working.

  2. Augie Fackler repo owner
    • removed assignee

    I'll try to spend time thinking about this, but if you want to have a real discussion with more people, please email the Google group rather than trying to use the bug tracker for development discussions.
