Commits

Mike Bayer  committed fa2c54a

initial rev


Files changed (8)

+*.pyc
+.*.swp
+*.egg-info
+.venv
+build/
+dist/
+.coverage
+		    GNU GENERAL PUBLIC LICENSE
+		       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+     59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+			    Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+		    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+			    NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+		     END OF TERMS AND CONDITIONS
+
+	    How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year  name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.

File hggitonce/__init__.py

Empty file added.

File hggitonce/base.py

+from mercurial import ui, hg
+from .git_handler import GitHandler
+
+def hg_to_git(args):
+    hg_repo = args.hg_repo
+    dest = args.dest
+
+    ui_obj = ui.ui()
+
+
+    repo = hg.repository(ui_obj, hg_repo)
+    GitHandler(repo, ui_obj, dest).export_commits()
+
+def translate_file_revs():
+    pass

File hggitonce/cmd.py

+import argparse
+from . import base
+
+
+def main(argv=None):
+    parser = argparse.ArgumentParser()
+
+    subparsers = parser.add_subparsers(help="sub-command help")
+
+
+    subparser = subparsers.add_parser("convert",
+                                help="convert an hg repo to git")
+    subparser.set_defaults(cmd=base.hg_to_git)
+    subparser.add_argument("hg_repo", help="Path to hg repo")
+    subparser.add_argument("dest", help="where to put the git repo")
+
+    args = parser.parse_args(argv)
+
+    cmd = args.cmd
+
+    cmd(args)
+
+if __name__ == '__main__':
+    main()
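
The sub-command dispatch in cmd.py can be tried without Mercurial installed. The sketch below mirrors the parser setup above but swaps base.hg_to_git for a hypothetical stand-in named convert, which only reports the arguments it received; build_parser is likewise a name introduced here for illustration.

```python
import argparse


def convert(args):
    # Hypothetical stand-in for base.hg_to_git: report the parsed arguments.
    return "convert %s -> %s" % (args.hg_repo, args.dest)


def build_parser():
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(help="sub-command help")
    sub = subparsers.add_parser("convert", help="convert an hg repo to git")
    # set_defaults stores the handler on the parsed namespace as args.cmd.
    sub.set_defaults(cmd=convert)
    sub.add_argument("hg_repo", help="Path to hg repo")
    sub.add_argument("dest", help="where to put the git repo")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Dispatch to whichever handler the selected sub-command registered.
    return args.cmd(args)


print(main(["convert", "/path/to/hg", "/path/to/git"]))
```

Because the handler travels on the namespace, adding another sub-command only needs another add_parser call with its own set_defaults(cmd=...). On Python 3, where sub-commands are optional by default, a call with no sub-command would leave args.cmd unset, so a guard might be needed there.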

File hggitonce/git_handler.py

+################################################################
+#
+# Adapted from https://bitbucket.org/durin42/hg-git/.
+#
+################################################################
+
+import os
+import urllib
+import re
+
+from dulwich.objects import Commit, parse_timezone
+from dulwich.pack import apply_delta
+from dulwich.repo import Repo
+
+try:
+    from mercurial import bookmarks
+    bookmarks.update
+except ImportError:
+    from hgext import bookmarks
+
+try:
+    from mercurial.error import RepoError
+except ImportError:
+    from mercurial.repo import RepoError
+
+from mercurial.i18n import _
+from mercurial.node import hex as hex_  # this is just binascii.hexlify
+from mercurial import util as hgutil
+
+from . import hg2git
+
+
+RE_GIT_AUTHOR = re.compile(r'^(.*?) ?\<(.*?)(?:\>(.*))?$')
+
+RE_GIT_SANITIZE_AUTHOR = re.compile('[<>\n]')
+
+RE_GIT_AUTHOR_EXTRA = re.compile(r'^(.*?)\ ext:\((.*)\) <(.*)\>$')
+
+# Test for git:// and git+ssh:// URI.
+# Support several URL forms, including separating the
+# host and path with either a / or : (sepr)
+RE_GIT_URI = re.compile(
+    r'^(?P<scheme>git([+]ssh)?://)(?P<host>.*?)(:(?P<port>\d+))?'
+    r'(?P<sepr>[:/])(?P<path>.*)$')
+
+RE_NEWLINES = re.compile('[\r\n]')
+RE_GIT_PROGRESS = re.compile(r'\((\d+)/(\d+)\)')
+
+RE_AUTHOR_FILE = re.compile(r'\s*=\s*')
+
+
+class GitHandler(object):
+    mapfile = 'git-mapfile'
+    tagsfile = 'git-tags'
+
+    def __init__(self, dest_repo, ui, gitdir):
+        self.repo = dest_repo
+        self.ui = ui
+        self.gitdir = gitdir
+
+        self.init_author_file()
+
+        self.paths = ui.configitems('paths')
+
+        self.branch_bookmark_suffix = ui.config('git', 'branch_bookmark_suffix')
+
+        self._map_git_real = {}
+        self._map_hg_real = {}
+        self.load_tags()
+
+    def path_join(self, path):
+        return os.path.join(self.gitdir, path)
+
+    def opener(self, *args):
+        return open(*args)
+
+    @property
+    def _map_git(self):
+        if not self._map_git_real:
+            self.load_map()
+        return self._map_git_real
+
+    @property
+    def _map_hg(self):
+        if not self._map_hg_real:
+            self.load_map()
+        return self._map_hg_real
+
+    # make the git data directory
+    def init_if_missing(self):
+        if os.path.exists(self.gitdir):
+            self.git = Repo(self.gitdir)
+        else:
+            os.mkdir(self.gitdir)
+            self.git = Repo.init_bare(self.gitdir)
+
+    def init_author_file(self):
+        self.author_map = {}
+        if self.ui.config('git', 'authors'):
+            f = open(self.repo.wjoin(
+                self.ui.config('git', 'authors')))
+            try:
+                for line in f:
+                    line = line.strip()
+                    if not line or line.startswith('#'):
+                        continue
+                    from_, to = RE_AUTHOR_FILE.split(line, 2)
+                    self.author_map[from_] = to
+            finally:
+                f.close()
+
+    ## FILE LOAD AND SAVE METHODS
+
+    def map_set(self, gitsha, hgsha):
+        self._map_git[gitsha] = hgsha
+        self._map_hg[hgsha] = gitsha
+
+    def map_hg_get(self, gitsha):
+        return self._map_git.get(gitsha)
+
+    def map_git_get(self, hgsha):
+        return self._map_hg.get(hgsha)
+
+    def load_map(self):
+        if os.path.exists(self.path_join(self.mapfile)):
+            for line in self.opener(self.path_join(self.mapfile)):
+                gitsha, hgsha = line.strip().split(' ', 1)
+                self._map_git_real[gitsha] = hgsha
+                self._map_hg_real[hgsha] = gitsha
+
+    def save_map(self):
+        file = self.opener(self.path_join(self.mapfile), 'w+')
+        for hgsha, gitsha in sorted(self._map_hg.iteritems()):
+            file.write("%s %s\n" % (gitsha, hgsha))
+        # If this complains that NoneType is not callable, then
+        # atomictempfile no longer has either of rename (pre-1.9) or
+        # close (post-1.9)
+        getattr(file, 'rename', getattr(file, 'close', None))()
+
+    def load_tags(self):
+        self.tags = {}
+        if os.path.exists(self.path_join(self.tagsfile)):
+            for line in self.opener(self.path_join(self.tagsfile)):
+                sha, name = line.strip().split(' ', 1)
+                self.tags[name] = sha
+
+    def save_tags(self):
+        file = self.opener(self.path_join(self.tagsfile), 'w+')
+        for name, sha in sorted(self.tags.iteritems()):
+            if not self.repo.tagtype(name) == 'global':
+                file.write("%s %s\n" % (sha, name))
+        # If this complains that NoneType is not callable, then
+        # atomictempfile no longer has either of rename (pre-1.9) or
+        # close (post-1.9)
+        getattr(file, 'rename', getattr(file, 'close', None))()
+
+    ## END FILE LOAD AND SAVE METHODS
+
+    ## COMMANDS METHODS
+
+    def export_commits(self):
+        try:
+            self.export_git_objects()
+            self.export_hg_tags()
+            self.update_references()
+        finally:
+            self.save_map()
+
+
+    ## CHANGESET CONVERSION METHODS
+
+    def export_git_objects(self):
+        self.init_if_missing()
+
+        nodes = [self.repo.lookup(n) for n in self.repo]
+        export = [node for node in nodes if hex_(node) not in self._map_hg]
+        total = len(export)
+        if total:
+            self.ui.note(_("exporting hg objects to git\n"))
+
+        # By only exporting deltas, the assertion is that all previous objects
+        # for all other changesets are already present in the Git repository.
+        # This assertion is necessary to prevent redundant work.
+        exporter = hg2git.IncrementalChangesetExporter(self.repo)
+
+        for i, rev in enumerate(export):
+            self.ui.progress('exporting', i, total=total)
+            ctx = self.repo.changectx(rev)
+            state = ctx.extra().get('hg-git', None)
+            if state == 'octopus':
+                self.ui.debug("revision %d is a part "
+                              "of octopus explosion\n" % ctx.rev())
+                continue
+            self.export_hg_commit(rev, exporter)
+        self.ui.progress('exporting', None, total=total)
+
+
+    # convert this commit into git objects
+    # go through the manifest, convert all blobs/trees we don't have
+    # write the commit object (with metadata info)
+    def export_hg_commit(self, rev, exporter):
+        self.ui.note(_("converting revision %s\n") % hex_(rev))
+
+        oldenc = self.swap_out_encoding()
+
+        ctx = self.repo.changectx(rev)
+        extra = ctx.extra()
+
+        commit = Commit()
+
+        (time, timezone) = ctx.date()
+        # work around bad timezone offsets - dulwich does not handle
+        # sub-minute timezone offsets. In the one known case, it was a
+        # manual edit that led to the unusual value. Based on that,
+        # there is no reason to round one way or the other, so do the
+        # simplest and round down.
+        timezone -= (timezone % 60)
+        commit.author = self.get_git_author(ctx)
+        commit.author_time = int(time)
+        commit.author_timezone = -timezone
+
+        if 'committer' in extra:
+            # fixup timezone
+            (name, timestamp, timezone) = extra['committer'].rsplit(' ', 2)
+            commit.committer = name
+            commit.commit_time = timestamp
+
+            # work around a timezone format change
+            if int(timezone) % 60 != 0: #pragma: no cover
+                timezone = parse_timezone(timezone)
+                # Newer versions of Dulwich return a tuple here
+                if isinstance(timezone, tuple):
+                    timezone, neg_utc = timezone
+                    commit._commit_timezone_neg_utc = neg_utc
+            else:
+                timezone = -int(timezone)
+            commit.commit_timezone = timezone
+        else:
+            commit.committer = commit.author
+            commit.commit_time = commit.author_time
+            commit.commit_timezone = commit.author_timezone
+
+        commit.parents = []
+        for parent in self.get_git_parents(ctx):
+            hgsha = hex_(parent.node())
+            git_sha = self.map_git_get(hgsha)
+            if git_sha:
+                if git_sha not in self.git.object_store:
+                    raise hgutil.Abort(_('Parent SHA-1 not present in Git '
+                                         'repo: %s' % git_sha))
+
+                commit.parents.append(git_sha)
+
+        commit.message = self.get_git_message(ctx)
+
+        if 'encoding' in extra:
+            commit.encoding = extra['encoding']
+
+        for obj, nodeid in exporter.update_changeset(ctx):
+            self.git.object_store.add_object(obj)
+
+        tree_sha = exporter.root_tree_sha
+
+        if tree_sha not in self.git.object_store:
+            raise hgutil.Abort(_('Tree SHA-1 not present in Git repo: %s' %
+                tree_sha))
+
+        commit.tree = tree_sha
+
+        self.git.object_store.add_object(commit)
+        self.map_set(commit.id, ctx.hex())
+
+        self.swap_out_encoding(oldenc)
+        return commit.id
+
+    def get_valid_git_username_email(self, name):
+        r"""Sanitize usernames and emails to fit git's restrictions.
+
+        The following is taken from the man page of git's fast-import
+        command:
+
+            [...] Likewise LF means one (and only one) linefeed [...]
+
+            committer
+                The committer command indicates who made this commit,
+                and when they made it.
+
+                Here <name> is the person's display name (for example
+                "Com M Itter") and <email> is the person's email address
+                ("cm@example.com[1]"). LT and GT are the literal
+                less-than (\x3c) and greater-than (\x3e) symbols. These
+                are required to delimit the email address from the other
+                fields in the line. Note that <name> and <email> are
+                free-form and may contain any sequence of bytes, except
+                LT, GT and LF. <name> is typically UTF-8 encoded.
+
+        Accordingly, this function makes sure that there are none of the
+        characters <, >, or \n in any string which will be used for
+        a git username or email. Before this, it first removes left
+        angle brackets and spaces from the beginning, and right angle
+        brackets and spaces from the end, of this string, to convert
+        such things as " <john@doe.com> " to "john@doe.com" for
+        convenience.
+
+        TESTS:
+
+        >>> from mercurial.ui import ui
+        >>> g = GitHandler('', ui(), '').get_valid_git_username_email
+        >>> g('John Doe')
+        'John Doe'
+        >>> g('john@doe.com')
+        'john@doe.com'
+        >>> g(' <john@doe.com> ')
+        'john@doe.com'
+        >>> g('    <random<\n<garbage\n>  > > ')
+        'random???garbage?'
+        >>> g('Typo in hgrc >but.hg-git@handles.it.gracefully>')
+        'Typo in hgrc ?but.hg-git@handles.it.gracefully'
+        """
+        return RE_GIT_SANITIZE_AUTHOR.sub('?', name.lstrip('< ').rstrip('> '))
+
+    def get_git_author(self, ctx):
+        # hg authors might not have emails
+        author = ctx.user()
+
+        # see if a translation exists
+        author = self.author_map.get(author, author)
+
+        # check for git author pattern compliance
+        a = RE_GIT_AUTHOR.match(author)
+
+        if a:
+            name = self.get_valid_git_username_email(a.group(1))
+            email = self.get_valid_git_username_email(a.group(2))
+            if a.group(3):
+                name += ' ext:(' + urllib.quote(a.group(3)) + ')'
+            author = self.get_valid_git_username_email(name) + ' <' + self.get_valid_git_username_email(email) + '>'
+        elif '@' in author:
+            author = self.get_valid_git_username_email(author) + ' <' + self.get_valid_git_username_email(author) + '>'
+        else:
+            author = self.get_valid_git_username_email(author) + ' <none@none>'
+
+        if 'author' in ctx.extra():
+            author = "".join(apply_delta(author, ctx.extra()['author']))
+
+        return author
+
+    def get_git_parents(self, ctx):
+        def is_octopus_part(ctx):
+            return ctx.extra().get('hg-git', None) in ('octopus', 'octopus-done')
+
+        parents = []
+        if ctx.extra().get('hg-git', None) == 'octopus-done':
+            # implode octopus parents
+            part = ctx
+            while is_octopus_part(part):
+                (p1, p2) = part.parents()
+                assert not is_octopus_part(p1)
+                parents.append(p1)
+                part = p2
+            parents.append(p2)
+        else:
+            parents = ctx.parents()
+
+        return parents
+
+    def get_git_message(self, ctx):
+        extra = ctx.extra()
+
+        message = ctx.description() + "\n"
+
+        # TODO: convert changelog ids
+        if 'message' in extra:
+            message = "".join(apply_delta(message, extra['message']))
+
+        for f in ctx.files():
+            if f not in ctx.manifest():
+                continue
+            rename = ctx.filectx(f).renamed()
+
+        return message
+
+
+    ## REFERENCES HANDLING
+
+    def update_references(self):
+        heads = self.local_heads()
+
+        # Create a local Git branch name for each
+        # Mercurial bookmark.
+        for key in heads:
+            git_ref = self.map_git_get(heads[key])
+            if git_ref:
+                self.git.refs['refs/heads/' + key] = git_ref
+
+    def export_hg_tags(self):
+        for tag, sha in self.repo.tags().iteritems():
+            if self.repo.tagtype(tag) in ('global', 'git'):
+                tag = tag.replace(' ', '_')
+                target = self.map_git_get(hex(sha))
+                if target is not None:
+                    self.git.refs['refs/tags/' + tag] = target
+                    self.tags[tag] = hex(sha)
+                else:
+                    self.repo.ui.warn(
+                        'Skipping export of tag %s because it '
+                        'has no matching git revision.\n' % tag)
+
+    def local_heads(self):
+        d = dict(
+            (k, hex_(v)) for
+                k, v in self.repo.branchtags().items()
+        )
+        if 'default' in d:
+            d['master'] = d['default']
+            del d['default']
+        return d
+
+
+
+    ## UTILITY FUNCTIONS
+
+    def extract_hg_metadata(self, message):
+        split = message.split("\n--HG--\n", 1)
+        renames = {}
+        extra = {}
+        branch = False
+        if len(split) == 2:
+            message, meta = split
+            lines = meta.split("\n")
+            for line in lines:
+                if line == '':
+                    continue
+
+                if ' : ' not in line:
+                    break
+                command, data = line.split(" : ", 1)
+
+                if command == 'rename':
+                    before, after = data.split(" => ", 1)
+                    renames[after] = before
+                if command == 'branch':
+                    branch = data
+                if command == 'extra':
+                    before, after = data.split(" : ", 1)
+                    extra[before] = urllib.unquote(after)
+        return (message, renames, branch, extra)
+
+
+    # Stolen from hgsubversion
+    def swap_out_encoding(self, new_encoding='UTF-8'):
+        try:
+            from mercurial import encoding
+            old = encoding.encoding
+            encoding.encoding = new_encoding
+        except ImportError:
+            old = hgutil._encoding
+            hgutil._encoding = new_encoding
+        return old
+
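
The `--HG--` trailer format that `extract_hg_metadata` parses can be tried out in isolation. The following is a minimal standalone sketch rather than the code from this commit: it assumes Python 3 (so `urllib.parse.unquote` stands in for the Python 2 `urllib.unquote` used above), and the sample commit message is made up for illustration.

```python
from urllib.parse import unquote


def extract_hg_metadata(message):
    """Split a commit message into (message, renames, branch, extra)."""
    renames, extra, branch = {}, {}, False
    parts = message.split("\n--HG--\n", 1)
    if len(parts) == 2:
        message, meta = parts
        for line in meta.split("\n"):
            if line == "":
                continue
            # Stop at the first line that is not a "command : data" pair.
            if " : " not in line:
                break
            command, data = line.split(" : ", 1)
            if command == "rename":
                before, after = data.split(" => ", 1)
                renames[after] = before
            elif command == "branch":
                branch = data
            elif command == "extra":
                key, value = data.split(" : ", 1)
                extra[key] = unquote(value)
    return message, renames, branch, extra


# Hypothetical commit message carrying a metadata trailer.
sample = (
    "Fix the parser\n"
    "--HG--\n"
    "rename : old/name.py => new/name.py\n"
    "branch : stable\n"
    "extra : source : abc%20def\n"
)
print(extract_hg_metadata(sample))
```

Note that renames are keyed by the new path, and that a message without the `--HG--` separator comes back unchanged with empty metadata.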

File hggitonce/hg2git.py

+################################################################
+#
+# Adapted from https://bitbucket.org/durin42/hg-git/.
+#
+################################################################
+
+
+# This file contains code dealing specifically with converting Mercurial
+# repositories to Git repositories. Code in this file is meant to be a generic
+# library and should be usable outside the context of hg-git or an hg command.
+
+import os
+import stat
+
+import dulwich.objects as dulobjs
+import mercurial.node
+
+from collections import OrderedDict
+
+class IncrementalChangesetExporter(object):
+    """Incrementally export Mercurial changesets to Git trees.
+
+    The purpose of this class is to facilitate Git tree export that is more
+    optimal than brute force.
+
+    A "dumb" implementation of Mercurial to Git export would iterate over
+    every file present in a Mercurial changeset and would convert each to
+    a Git blob and then conditionally add it to a Git repository if it didn't
+    yet exist. This is suboptimal because the overhead associated with
+    obtaining every file's raw content and converting it to a Git blob is
+    not trivial!
+
+    This class works around the suboptimality of brute force export by
+    leveraging the information stored in Mercurial - the knowledge of what
+    changed between changesets - to only export Git objects corresponding to
+    changes in Mercurial. In the context of converting Mercurial repositories
+    to Git repositories, we only export objects Git (possibly) hasn't seen yet.
+    This prevents a lot of redundant work and is thus faster.
+
+    Callers instantiate an instance of this class against a mercurial.localrepo
+    instance. They then associate it with a specific changeset by calling
+    update_changeset(). On each call to update_changeset(), the instance
+    computes the difference between the current and new changesets and emits
+    Git objects that haven't yet been encountered during the lifetime of the
+    class instance. In other words, it expresses Mercurial changeset deltas in
+    terms of Git objects. Callers then (usually) take this set of Git objects
+    and add them to the Git repository.
+
+    This class only emits Git blobs and trees, not commits.
+
+    The tree calculation part of this class is essentially a reimplementation
+    of dulwich.index.commit_tree. However, since our implementation reuses
+    Tree instances and only recalculates SHA-1 when things change, we are
+    more efficient.
+    """
+
+    def __init__(self, hg_repo):
+        """Create an instance against a mercurial.localrepo."""
+        self._hg = hg_repo
+
+        # Our current revision.
+        self._rev = mercurial.node.nullrev
+
+        # Path to dulwich.objects.Tree.
+        self._dirs = {}
+
+        # Mercurial file nodeid to Git blob SHA-1. Used to prevent redundant
+        # blob calculation.
+        self._blob_cache = {}
+
+    @property
+    def root_tree_sha(self):
+        """The SHA-1 of the root Git tree.
+
+        This is needed to construct a Git commit object.
+        """
+        return self._dirs[''].id
+
+    def update_changeset(self, ctx):
+        """Set the tree to track a new Mercurial changeset.
+
+        This is a generator of 2-tuples. The first item in each tuple is a
+        dulwich object, either a Blob or a Tree. The second item is the
+        corresponding Mercurial nodeid for the item, if any. Only blobs will
+        have nodeids. Trees do not correspond to a specific nodeid, so it does
+        not make sense to emit a nodeid for them.
+
+        When exporting trees from Mercurial, callers typically write the
+        returned dulwich object to the Git repo via the store's add_object().
+
+        Some emitted objects may already exist in the Git repository. This
+        class does not know about the Git repository, so it's up to the caller
+        to conditionally add the object, etc.
+
+        Emitted objects are those that have changed since the last call to
+        update_changeset. If this is the first call to update_changeset, all
+        objects in the tree are emitted.
+        """
+        # Our general strategy is to accumulate dulwich.objects.Blob and
+        # dulwich.objects.Tree instances for the current Mercurial changeset.
+        # We do this incrementally by iterating over the Mercurial-reported
+        # changeset delta. We rely on dulwich lazily recalculating a Tree's
+        # SHA-1 after we modify it. This is critical to performance.
+
+        # In theory we should be able to look at changectx.files(). This is
+        # *much* faster. However, it may not be accurate, especially with older
+        # repositories, which may not record things like deleted files
+        # explicitly in the manifest (which is where files() gets its data).
+        # The only reliable way to get the full set of changes is by looking at
+        # the full manifest. And, the easy way to compare two manifests is
+        # localrepo.status().
+        modified, added, removed = self._hg.status(self._rev, ctx.rev())[0:3]
+
+        # We track which directories/trees have been modified in this update and we
+        # only export those.
+        dirty_trees = set()
+
+        # We first process file removals so we can prune dead trees.
+        for path in removed:
+            d = os.path.dirname(path)
+            tree = self._dirs.get(d, dulobjs.Tree())
+
+            del tree[os.path.basename(path)]
+            dirty_trees.add(d)
+
+            # If removing this file made the tree empty, we should delete this
+            # tree. This could result in parent trees losing their only child
+            # and so on.
+            if not len(tree):
+                self._remove_tree(d)
+                continue
+
+            self._dirs[d] = tree
+
+        # For every file that changed or was added, we need to calculate the
+        # corresponding Git blob and its tree entry. We emit the blob
+        # immediately and update trees to be aware of its presence.
+        for path in set(modified) | set(added):
+            # Handle special Mercurial paths.
+            if path == '.hgsubstate':
+                self._handle_subrepos(ctx, dirty_trees)
+                continue
+
+            if path == '.hgsub':
+                continue
+
+            d = os.path.dirname(path)
+            tree = self._dirs.setdefault(d, dulobjs.Tree())
+            dirty_trees.add(d)
+
+            fctx = ctx[path]
+
+            entry, blob = IncrementalChangesetExporter.tree_entry(fctx,
+                self._blob_cache)
+            if blob is not None:
+                yield (blob, fctx.filenode())
+
+            tree.add(*entry)
+
+        # Now that all the trees represent the current changeset, recalculate
+        # the tree IDs and emit them. Note that we wait until now to calculate
+        # tree SHA-1s. This is an important difference between us and
+        # dulwich.index.commit_tree(), which builds new Tree instances for each
+        # series of blobs.
+        for obj in self._populate_tree_entries(dirty_trees):
+            yield (obj, None)
+
+        self._rev = ctx.rev()
+
+    def _remove_tree(self, path):
+        """Remove a (presumably empty) tree from the current changeset.
+
+        A now-empty tree may be the only child of its parent. So, we traverse
+        up the chain to the root tree, deleting any empty trees along the way.
+        """
+        try:
+            del self._dirs[path]
+        except KeyError:
+            return
+
+        # Now we traverse up to the parent and delete any references.
+        if path == '':
+            return
+
+        basename = os.path.basename(path)
+        parent = os.path.dirname(path)
+        while True:
+            tree = self._dirs.get(parent, None)
+
+            # No parent entry. Nothing to remove or update.
+            if tree is None:
+                return
+
+            try:
+                del tree[basename]
+            except KeyError:
+                return
+
+            if len(tree):
+                return
+
+            # The parent tree is empty. So, we can delete it.
+            del self._dirs[parent]
+
+            if parent == '':
+                return
+
+            basename = os.path.basename(parent)
+            parent = os.path.dirname(parent)
+
+    def _populate_tree_entries(self, dirty_trees):
+        self._dirs.setdefault('', dulobjs.Tree())
+
+        # Fill in missing directories.
+        for path in self._dirs.keys():
+            parent = os.path.dirname(path)
+
+            while parent != '':
+                parent_tree = self._dirs.get(parent, None)
+
+                if parent_tree is not None:
+                    break
+
+                self._dirs[parent] = dulobjs.Tree()
+                parent = os.path.dirname(parent)
+
+        for dirty in list(dirty_trees):
+            parent = os.path.dirname(dirty)
+
+            while parent != '':
+                if parent in dirty_trees:
+                    break
+
+                dirty_trees.add(parent)
+                parent = os.path.dirname(parent)
+
+        # The root tree is always dirty but doesn't always get updated.
+        dirty_trees.add('')
+
+        # We only need to recalculate and export dirty trees.
+        for d in sorted(dirty_trees, key=len, reverse=True):
+            # Only happens for deleted directories.
+            try:
+                tree = self._dirs[d]
+            except KeyError:
+                continue
+
+            yield tree
+
+            if d == '':
+                continue
+
+            parent_tree = self._dirs[os.path.dirname(d)]
+
+            # Accessing the tree's ID is what triggers SHA-1 calculation and is
+            # the expensive part (at least if the tree has been modified since
+            # the last time we retrieved its ID). Also, assigning an entry to a
+            # tree (even if it already exists) invalidates the existing tree
+            # and incurs SHA-1 recalculation. So, it's in our interest to avoid
+            # invalidating trees. Since we only update the entries of dirty
+            # trees, this should hold true.
+            parent_tree[os.path.basename(d)] = (stat.S_IFDIR, tree.id)
+
+    def _handle_subrepos(self, ctx, dirty_trees):
+        substate = parse_hgsubstate(ctx['.hgsubstate'].data().splitlines())
+        sub = OrderedDict()
+
+        if '.hgsub' in ctx:
+            sub = parse_hgsub(ctx['.hgsub'].data().splitlines())
+
+        for path, sha in substate.iteritems():
+            # Ignore non-Git repositories keeping state in .hgsubstate.
+            if path in sub and not sub[path].startswith('[git]'):
+                continue
+
+            d = os.path.dirname(path)
+            dirty_trees.add(d)
+            tree = self._dirs.setdefault(d, dulobjs.Tree())
+            tree.add(os.path.basename(path), dulobjs.S_IFGITLINK, sha)
+
+    @staticmethod
+    def tree_entry(fctx, blob_cache):
+        """Compute a dulwich TreeEntry from a filectx.
+
+        A side effect is that the blob's SHA-1 is stored in the passed cache,
+        keyed by the Mercurial file nodeid.
+
+        Returns a 2-tuple of (dulwich.objects.TreeEntry, dulwich.objects.Blob).
+        """
+        blob_id = blob_cache.get(fctx.filenode(), None)
+        blob = None
+
+        if blob_id is None:
+            blob = dulobjs.Blob.from_string(fctx.data())
+            blob_id = blob.id
+            blob_cache[fctx.filenode()] = blob_id
+
+        flags = fctx.flags()
+
+        if 'l' in flags:
+            mode = 0o120000  # symbolic link
+        elif 'x' in flags:
+            mode = 0o100755  # executable file
+        else:
+            mode = 0o100644  # regular file
+
+        return (dulobjs.TreeEntry(os.path.basename(fctx.path()), mode, blob_id),
+                blob)
+
+
+
+def parse_hgsub(lines):
+    """Fills OrderedDict with hgsub file content passed as list of lines"""
+    rv = OrderedDict()
+    for l in lines:
+        ls = l.strip()
+        if not ls or ls[0] == '#':
+            continue
+        name, value = l.split('=', 1)
+        rv[name.strip()] = value.strip()
+    return rv
+
+def serialize_hgsub(data):
+    """Produces a string from OrderedDict hgsub content"""
+    return ''.join(['%s = %s\n' % (n,v) for n,v in data.iteritems()])
+
+def parse_hgsubstate(lines):
+    """Fills OrderedDict with hgsubstate file content passed as list of lines"""
+    rv = OrderedDict()
+    for l in lines:
+        ls = l.strip()
+        if not ls or ls[0] == '#':
+            continue
+        value, name = l.split(' ', 1)
+        rv[name.strip()] = value.strip()
+    return rv
+
+def serialize_hgsubstate(data):
+    """Produces a string from OrderedDict hgsubstate content"""
+    return ''.join(['%s %s\n' % (data[n], n) for n in sorted(data)])
+from setuptools import setup
+
+
+setup(name='hggitonce',
+      version='1.0',
+      description="migrate hg to git one way",
+      classifiers=[
+      'Development Status :: 4 - Beta',
+      'Environment :: Console',
+      'Programming Language :: Python',
+      'Programming Language :: Python :: Implementation :: CPython',
+      'Programming Language :: Python :: Implementation :: PyPy',
+      ],
+      author='Mike Bayer',
+      author_email='mike@zzzcomputing.com',
+      url='http://bitbucket.org/zzzeek/hggitonce',
+      license='MIT',
+      packages=["hggitonce"],
+      zip_safe=False,
+      install_requires=['dulwich>=0.8.6'],
+      entry_points={
+        'console_scripts': ['hggitonce = hggitonce.cmd:main'],
+      }
+)
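
The dirty-tree bookkeeping in `_populate_tree_entries` (hggitonce/hg2git.py above) can be illustrated without dulwich. This is a minimal sketch, assuming plain `/`-separated path strings in place of Tree objects; `propagate_dirty` is a hypothetical helper name that does not appear in the commit.

```python
import os


def propagate_dirty(dirty_trees):
    """Return the dirty directories plus all their ancestors, deepest first."""
    dirty = set(dirty_trees)
    for d in list(dirty):
        parent = os.path.dirname(d)
        while parent != '':
            # An ancestor already marked dirty has had (or will have)
            # its own ancestors marked, so we can stop climbing here.
            if parent in dirty:
                break
            dirty.add(parent)
            parent = os.path.dirname(parent)
    # The root tree is always treated as dirty.
    dirty.add('')
    # A child path is always longer than its parent path, so sorting by
    # length (longest first) visits every child before its parent.
    return sorted(dirty, key=len, reverse=True)


order = propagate_dirty({'a/b/c', 'x'})
print(order)
```

Visiting children before parents matters because a parent tree's entry for a child embeds the child's SHA-1, so the child must be finalized first.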