Commits

Axel Hecht committed 652bd22

removing parts of convert we don't need

  • Parent commits d384766

Files changed (10)

File gaiaconv/__init__.py

 '''import revisions from foreign VCS repositories into Mercurial'''
 
 import convcmd
-import cvsps
-import subversion
-from mercurial import commands, templatekw
+from mercurial import commands
 from mercurial.i18n import _
 
 testedwith = 'internal'
 
 # Commands definition was moved elsewhere to ease demandload job.
 
-def convert(ui, src, dest=None, revmapfile=None, **opts):
-    """convert a foreign SCM repository to a Mercurial one.
-
-    Accepted source formats [identifiers]:
-
-    - Mercurial [hg]
-    - CVS [cvs]
-    - Darcs [darcs]
-    - git [git]
-    - Subversion [svn]
-    - Monotone [mtn]
-    - GNU Arch [gnuarch]
-    - Bazaar [bzr]
-    - Perforce [p4]
-
-    Accepted destination formats [identifiers]:
-
-    - Mercurial [hg]
-    - Subversion [svn] (history on branches is not preserved)
-
-    If no revision is given, all revisions will be converted.
-    Otherwise, convert will only import up to the named revision
-    (given in a format understood by the source).
-
-    If no destination directory name is specified, it defaults to the
-    basename of the source with ``-hg`` appended. If the destination
-    repository doesn't exist, it will be created.
-
-    By default, all sources except Mercurial will use --branchsort.
-    Mercurial uses --sourcesort to preserve original revision numbers
-    order. Sort modes have the following effects:
-
-    --branchsort  convert from parent to child revision when possible,
-                  which means branches are usually converted one after
-                  the other. It generates more compact repositories.
-
-    --datesort    sort revisions by date. Converted repositories have
-                  good-looking changelogs but are often an order of
-                  magnitude larger than the same ones generated by
-                  --branchsort.
-
-    --sourcesort  try to preserve source revisions order, only
-                  supported by Mercurial sources.
-
-    --closesort   try to move closed revisions as close as possible
-                  to parent branches, only supported by Mercurial
-                  sources.
-
-    If ``REVMAP`` isn't given, it will be put in a default location
-    (``<dest>/.hg/shamap`` by default). The ``REVMAP`` is a simple
-    text file that maps each source commit ID to the destination ID
-    for that revision, like so::
-
-      <source ID> <destination ID>
-
-    If the file doesn't exist, it's automatically created. It's
-    updated on each commit copied, so :hg:`convert` can be interrupted
-    and can be run repeatedly to copy new commits.
-
-    The authormap is a simple text file that maps each source commit
-    author to a destination commit author. It is handy for source SCMs
-    that use unix logins to identify authors (e.g.: CVS). One line per
-    author mapping and the line format is::
-
-      source author = destination author
-
-    Empty lines and lines starting with a ``#`` are ignored.
-
-    The filemap is a file that allows filtering and remapping of files
-    and directories. Each line can contain one of the following
-    directives::
-
-      include path/to/file-or-dir
-
-      exclude path/to/file-or-dir
-
-      rename path/to/source path/to/destination
-
-    Comment lines start with ``#``. A specified path matches if it
-    equals the full relative name of a file or one of its parent
-    directories. The ``include`` or ``exclude`` directive with the
-    longest matching path applies, so line order does not matter.
-
-    The ``include`` directive causes a file, or all files under a
-    directory, to be included in the destination repository, and the
-    exclusion of all other files and directories not explicitly
-    included. The ``exclude`` directive causes files or directories to
-    be omitted. The ``rename`` directive renames a file or directory if
-    it is converted. To rename from a subdirectory into the root of
-    the repository, use ``.`` as the path to rename to.
-
-    The splicemap is a file that allows insertion of synthetic
-    history, letting you specify the parents of a revision. This is
-    useful if you want to e.g. give a Subversion merge two parents, or
-    graft two disconnected series of history together. Each entry
-    contains a key, followed by a space, followed by one or two
-    comma-separated values::
-
-      key parent1, parent2
-
-    The key is the revision ID in the source
-    revision control system whose parents should be modified (same
-    format as a key in .hg/shamap). The values are the revision IDs
-    (in either the source or destination revision control system) that
-    should be used as the new parents for that node. For example, if
-    you have merged "release-1.0" into "trunk", then you should
-    specify the revision on "trunk" as the first parent and the one on
-    the "release-1.0" branch as the second.
-
-    The branchmap is a file that allows you to rename a branch when it is
-    being brought in from whatever external repository. When used in
-    conjunction with a splicemap, it allows for a powerful combination
-    to help fix even the most badly mismanaged repositories and turn them
-    into nicely structured Mercurial repositories. The branchmap contains
-    lines of the form::
-
-      original_branch_name new_branch_name
-
-    where "original_branch_name" is the name of the branch in the
-    source repository, and "new_branch_name" is the name of the branch
-    is the destination repository. No whitespace is allowed in the
-    branch names. This can be used to (for instance) move code in one
-    repository from "default" to a named branch.
-
-    Mercurial Source
-    ################
-
-    The Mercurial source recognizes the following configuration
-    options, which you can set on the command line with ``--config``:
-
-    :convert.hg.ignoreerrors: ignore integrity errors when reading.
-        Use it to fix Mercurial repositories with missing revlogs, by
-        converting from and to Mercurial. Default is False.
-
-    :convert.hg.saverev: store original revision ID in changeset
-        (forces target IDs to change). It takes a boolean argument and
-        defaults to False.
-
-    :convert.hg.startrev: convert start revision and its descendants.
-        It takes a hg revision identifier and defaults to 0.
-
-    CVS Source
-    ##########
-
-    CVS source will use a sandbox (i.e. a checked-out copy) from CVS
-    to indicate the starting point of what will be converted. Direct
-    access to the repository files is not needed, unless of course the
-    repository is ``:local:``. The conversion uses the top level
-    directory in the sandbox to find the CVS repository, and then uses
-    CVS rlog commands to find files to convert. This means that unless
-    a filemap is given, all files under the starting directory will be
-    converted, and that any directory reorganization in the CVS
-    sandbox is ignored.
-
-    The following options can be used with ``--config``:
-
-    :convert.cvsps.cache: Set to False to disable remote log caching,
-        for testing and debugging purposes. Default is True.
-
-    :convert.cvsps.fuzz: Specify the maximum time (in seconds) that is
-        allowed between commits with identical user and log message in
-        a single changeset. When very large files were checked in as
-        part of a changeset then the default may not be long enough.
-        The default is 60.
-
-    :convert.cvsps.mergeto: Specify a regular expression to which
-        commit log messages are matched. If a match occurs, then the
-        conversion process will insert a dummy revision merging the
-        branch on which this log message occurs to the branch
-        indicated in the regex. Default is ``{{mergetobranch
-        ([-\\w]+)}}``
-
-    :convert.cvsps.mergefrom: Specify a regular expression to which
-        commit log messages are matched. If a match occurs, then the
-        conversion process will add the most recent revision on the
-        branch indicated in the regex as the second parent of the
-        changeset. Default is ``{{mergefrombranch ([-\\w]+)}}``
-
-    :convert.localtimezone: use local time (as determined by the TZ
-        environment variable) for changeset date/times. The default
-        is False (use UTC).
-
-    :hooks.cvslog: Specify a Python function to be called at the end of
-        gathering the CVS log. The function is passed a list with the
-        log entries, and can modify the entries in-place, or add or
-        delete them.
-
-    :hooks.cvschangesets: Specify a Python function to be called after
-        the changesets are calculated from the CVS log. The
-        function is passed a list with the changeset entries, and can
-        modify the changesets in-place, or add or delete them.
-
-    An additional "debugcvsps" Mercurial command allows the builtin
-    changeset merging code to be run without doing a conversion. Its
-    parameters and output are similar to that of cvsps 2.1. Please see
-    the command help for more details.
-
-    Subversion Source
-    #################
-
-    Subversion source detects classical trunk/branches/tags layouts.
-    By default, the supplied ``svn://repo/path/`` source URL is
-    converted as a single branch. If ``svn://repo/path/trunk`` exists
-    it replaces the default branch. If ``svn://repo/path/branches``
-    exists, its subdirectories are listed as possible branches. If
-    ``svn://repo/path/tags`` exists, it is looked for tags referencing
-    converted branches. Default ``trunk``, ``branches`` and ``tags``
-    values can be overridden with following options. Set them to paths
-    relative to the source URL, or leave them blank to disable auto
-    detection.
-
-    The following options can be set with ``--config``:
-
-    :convert.svn.branches: specify the directory containing branches.
-        The default is ``branches``.
-
-    :convert.svn.tags: specify the directory containing tags. The
-        default is ``tags``.
-
-    :convert.svn.trunk: specify the name of the trunk branch. The
-        default is ``trunk``.
-
-    :convert.localtimezone: use local time (as determined by the TZ
-        environment variable) for changeset date/times. The default
-        is False (use UTC).
-
-    Source history can be retrieved starting at a specific revision,
-    instead of being integrally converted. Only single branch
-    conversions are supported.
-
-    :convert.svn.startrev: specify start Subversion revision number.
-        The default is 0.
-
-    Perforce Source
-    ###############
-
-    The Perforce (P4) importer can be given a p4 depot path or a
-    client specification as source. It will convert all files in the
-    source to a flat Mercurial repository, ignoring labels, branches
-    and integrations. Note that when a depot path is given you then
-    usually should specify a target directory, because otherwise the
-    target may be named ``...-hg``.
-
-    It is possible to limit the amount of source history to be
-    converted by specifying an initial Perforce revision:
-
-    :convert.p4.startrev: specify initial Perforce revision (a
-        Perforce changelist number).
-
-    Mercurial Destination
-    #####################
-
-    The following options are supported:
-
-    :convert.hg.clonebranches: dispatch source branches in separate
-        clones. The default is False.
-
-    :convert.hg.tagsbranch: branch name for tag revisions, defaults to
-        ``default``.
-
-    :convert.hg.usebranchnames: preserve branch names. The default is
-        True.
+def gaiaconv(ui, src, dest=None, revmapfile=None, **opts):
+    """convert a gaia to l10n repo
     """
     return convcmd.convert(ui, src, dest, revmapfile, **opts)
 
-def debugsvnlog(ui, **opts):
-    return subversion.debugsvnlog(ui, **opts)
-
-def debugcvsps(ui, *args, **opts):
-    '''create changeset information from CVS
-
-    This command is intended as a debugging tool for the CVS to
-    Mercurial converter, and can be used as a direct replacement for
-    cvsps.
-
-    Hg debugcvsps reads the CVS rlog for current directory (or any
-    named directory) in the CVS repository, and converts the log to a
-    series of changesets based on matching commit log entries and
-    dates.'''
-    return cvsps.debugcvsps(ui, *args, **opts)
-
-commands.norepo += " convert debugsvnlog debugcvsps"
+commands.norepo += " gaiaconv"
 
 cmdtable = {
-    "convert":
-        (convert,
+    "gaiaconv":
+        (gaiaconv,
          [('', 'authors', '',
            _('username mapping filename (DEPRECATED, use --authormap instead)'),
            _('FILE')),
           ('', 'datesort', None, _('try to sort changesets by date')),
           ('', 'sourcesort', None, _('preserve source changesets order')),
           ('', 'closesort', None, _('try to reorder closed revisions'))],
-         _('hg convert [OPTION]... SOURCE [DEST [REVMAP]]')),
-    "debugsvnlog":
-        (debugsvnlog,
-         [],
-         'hg debugsvnlog'),
-    "debugcvsps":
-        (debugcvsps,
-         [
-          # Main options shared with cvsps-2.1
-          ('b', 'branches', [], _('only return changes on specified branches')),
-          ('p', 'prefix', '', _('prefix to remove from file names')),
-          ('r', 'revisions', [],
-           _('only return changes after or between specified tags')),
-          ('u', 'update-cache', None, _("update cvs log cache")),
-          ('x', 'new-cache', None, _("create new cvs log cache")),
-          ('z', 'fuzz', 60, _('set commit time fuzz in seconds')),
-          ('', 'root', '', _('specify cvsroot')),
-          # Options specific to builtin cvsps
-          ('', 'parents', '', _('show parent changesets')),
-          ('', 'ancestors', '',
-           _('show current changeset in ancestor branches')),
-          # Options that are ignored for compatibility with cvsps-2.1
-          ('A', 'cvs-direct', None, _('ignored for compatibility')),
-         ],
-         _('hg debugcvsps [OPTION]... [PATH]...')),
+         _('hg gaiaconv [OPTION]... SOURCE [DEST [REVMAP]]'))
 }
-
-def kwconverted(ctx, name):
-    rev = ctx.extra().get('convert_revision', '')
-    if rev.startswith('svn:'):
-        if name == 'svnrev':
-            return str(subversion.revsplit(rev)[2])
-        elif name == 'svnpath':
-            return subversion.revsplit(rev)[1]
-        elif name == 'svnuuid':
-            return subversion.revsplit(rev)[0]
-    return rev
-
-def kwsvnrev(repo, ctx, **args):
-    """:svnrev: String. Converted subversion revision number."""
-    return kwconverted(ctx, 'svnrev')
-
-def kwsvnpath(repo, ctx, **args):
-    """:svnpath: String. Converted subversion revision project path."""
-    return kwconverted(ctx, 'svnpath')
-
-def kwsvnuuid(repo, ctx, **args):
-    """:svnuuid: String. Converted subversion revision repository identifier."""
-    return kwconverted(ctx, 'svnuuid')
-
-def extsetup(ui):
-    templatekw.keywords['svnrev'] = kwsvnrev
-    templatekw.keywords['svnpath'] = kwsvnpath
-    templatekw.keywords['svnuuid'] = kwsvnuuid
-
-# tell hggettext to extract docstrings from these functions:
-i18nfunctions = [kwsvnrev, kwsvnpath, kwsvnuuid]
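
The hunk above leaves a single norepo command, gaiaconv, which forwards straight to convcmd.convert() using the Mercurial source and sink that remain. A minimal sketch of exercising that same code path programmatically; the paths, the plain ui() construction, and the empty option set are illustrative assumptions, not part of this commit:

    # Illustrative sketch (not part of the commit). Paths are hypothetical.
    from mercurial import ui as uimod
    import convcmd  # the module this extension keeps

    def convert_gaia(src='gaia', dest='gaia-l10n'):
        # gaiaconv is registered via commands.norepo, so no repo object is needed
        u = uimod.ui()
        # gaiaconv() in __init__.py is a thin wrapper around this call; with no
        # revmapfile given, convcmd uses the default <dest>/.hg/shamap
        return convcmd.convert(u, src, dest, None)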

File gaiaconv/bzr.py

-# bzr.py - bzr support for the convert extension
-#
-#  Copyright 2008, 2009 Marek Kubica <marek@xivilization.net> and others
-#
-# This software may be used and distributed according to the terms of the
-# GNU General Public License version 2 or any later version.
-
-# This module is for handling 'bzr', that was formerly known as Bazaar-NG;
-# it cannot access 'bar' repositories, but they were never used very much
-
-import os
-from mercurial import demandimport
-# these do not work with demandimport, blacklist
-demandimport.ignore.extend([
-        'bzrlib.transactions',
-        'bzrlib.urlutils',
-        'ElementPath',
-    ])
-
-from mercurial.i18n import _
-from mercurial import util
-from common import NoRepo, commit, converter_source
-
-try:
-    # bazaar imports
-    from bzrlib import bzrdir, revision, errors
-    from bzrlib.revisionspec import RevisionSpec
-except ImportError:
-    pass
-
-supportedkinds = ('file', 'symlink')
-
-class bzr_source(converter_source):
-    """Reads Bazaar repositories by using the Bazaar Python libraries"""
-
-    def __init__(self, ui, path, rev=None):
-        super(bzr_source, self).__init__(ui, path, rev=rev)
-
-        if not os.path.exists(os.path.join(path, '.bzr')):
-            raise NoRepo(_('%s does not look like a Bazaar repository')
-                         % path)
-
-        try:
-            # access bzrlib stuff
-            bzrdir
-        except NameError:
-            raise NoRepo(_('Bazaar modules could not be loaded'))
-
-        path = os.path.abspath(path)
-        self._checkrepotype(path)
-        try:
-            self.sourcerepo = bzrdir.BzrDir.open(path).open_repository()
-        except errors.NoRepositoryPresent:
-            raise NoRepo(_('%s does not look like a Bazaar repository')
-                         % path)
-        self._parentids = {}
-
-    def _checkrepotype(self, path):
-        # Lightweight checkouts detection is informational but probably
-        # fragile at API level. It should not terminate the conversion.
-        try:
-            from bzrlib import bzrdir
-            dir = bzrdir.BzrDir.open_containing(path)[0]
-            try:
-                tree = dir.open_workingtree(recommend_upgrade=False)
-                branch = tree.branch
-            except (errors.NoWorkingTree, errors.NotLocalUrl):
-                tree = None
-                branch = dir.open_branch()
-            if (tree is not None and tree.bzrdir.root_transport.base !=
-                branch.bzrdir.root_transport.base):
-                self.ui.warn(_('warning: lightweight checkouts may cause '
-                               'conversion failures, try with a regular '
-                               'branch instead.\n'))
-        except Exception:
-            self.ui.note(_('bzr source type could not be determined\n'))
-
-    def before(self):
-        """Before the conversion begins, acquire a read lock
-        for all the operations that might need it. Fortunately
-        read locks don't block other reads or writes to the
-        repository, so this shouldn't have any impact on the usage of
-        the source repository.
-
-        The alternative would be locking on every operation that
-        needs locks (there are currently two: getting the file and
-        getting the parent map) and releasing immediately after,
-        but this approach can take even 40% longer."""
-        self.sourcerepo.lock_read()
-
-    def after(self):
-        self.sourcerepo.unlock()
-
-    def _bzrbranches(self):
-        return self.sourcerepo.find_branches(using=True)
-
-    def getheads(self):
-        if not self.rev:
-            # Set using=True to avoid nested repositories (see issue3254)
-            heads = sorted([b.last_revision() for b in self._bzrbranches()])
-        else:
-            revid = None
-            for branch in self._bzrbranches():
-                try:
-                    r = RevisionSpec.from_string(self.rev)
-                    info = r.in_history(branch)
-                except errors.BzrError:
-                    pass
-                revid = info.rev_id
-            if revid is None:
-                raise util.Abort(_('%s is not a valid revision') % self.rev)
-            heads = [revid]
-        # Empty repositories return 'null:', which cannot be retrieved
-        heads = [h for h in heads if h != 'null:']
-        return heads
-
-    def getfile(self, name, rev):
-        revtree = self.sourcerepo.revision_tree(rev)
-        fileid = revtree.path2id(name.decode(self.encoding or 'utf-8'))
-        kind = None
-        if fileid is not None:
-            kind = revtree.kind(fileid)
-        if kind not in supportedkinds:
-            # the file is not available anymore - was deleted
-            raise IOError(_('%s is not available in %s anymore') %
-                    (name, rev))
-        mode = self._modecache[(name, rev)]
-        if kind == 'symlink':
-            target = revtree.get_symlink_target(fileid)
-            if target is None:
-                raise util.Abort(_('%s.%s symlink has no target')
-                                 % (name, rev))
-            return target, mode
-        else:
-            sio = revtree.get_file(fileid)
-            return sio.read(), mode
-
-    def getchanges(self, version):
-        # set up caches: modecache and revtree
-        self._modecache = {}
-        self._revtree = self.sourcerepo.revision_tree(version)
-        # get the parentids from the cache
-        parentids = self._parentids.pop(version)
-        # only diff against first parent id
-        prevtree = self.sourcerepo.revision_tree(parentids[0])
-        return self._gettreechanges(self._revtree, prevtree)
-
-    def getcommit(self, version):
-        rev = self.sourcerepo.get_revision(version)
-        # populate parent id cache
-        if not rev.parent_ids:
-            parents = []
-            self._parentids[version] = (revision.NULL_REVISION,)
-        else:
-            parents = self._filterghosts(rev.parent_ids)
-            self._parentids[version] = parents
-
-        branch = self.recode(rev.properties.get('branch-nick', u'default'))
-        if branch == 'trunk':
-            branch = 'default'
-        return commit(parents=parents,
-                date='%d %d' % (rev.timestamp, -rev.timezone),
-                author=self.recode(rev.committer),
-                desc=self.recode(rev.message),
-                branch=branch,
-                rev=version)
-
-    def gettags(self):
-        bytetags = {}
-        for branch in self._bzrbranches():
-            if not branch.supports_tags():
-                return {}
-            tagdict = branch.tags.get_tag_dict()
-            for name, rev in tagdict.iteritems():
-                bytetags[self.recode(name)] = rev
-        return bytetags
-
-    def getchangedfiles(self, rev, i):
-        self._modecache = {}
-        curtree = self.sourcerepo.revision_tree(rev)
-        if i is not None:
-            parentid = self._parentids[rev][i]
-        else:
-            # no parent id, get the empty revision
-            parentid = revision.NULL_REVISION
-
-        prevtree = self.sourcerepo.revision_tree(parentid)
-        changes = [e[0] for e in self._gettreechanges(curtree, prevtree)[0]]
-        return changes
-
-    def _gettreechanges(self, current, origin):
-        revid = current._revision_id
-        changes = []
-        renames = {}
-        seen = set()
-        # Process the entries by reverse lexicographic name order to
-        # handle nested renames correctly, most specific first.
-        curchanges = sorted(current.iter_changes(origin),
-                            key=lambda c: c[1][0] or c[1][1],
-                            reverse=True)
-        for (fileid, paths, changed_content, versioned, parent, name,
-            kind, executable) in curchanges:
-
-            if paths[0] == u'' or paths[1] == u'':
-                # ignore changes to tree root
-                continue
-
-            # bazaar tracks directories, mercurial does not, so
-            # we have to rename the directory contents
-            if kind[1] == 'directory':
-                if kind[0] not in (None, 'directory'):
-                    # Replacing 'something' with a directory, record it
-                    # so it can be removed.
-                    changes.append((self.recode(paths[0]), revid))
-
-                if kind[0] == 'directory' and None not in paths:
-                    renaming = paths[0] != paths[1]
-                    # neither an add nor an delete - a move
-                    # rename all directory contents manually
-                    subdir = origin.inventory.path2id(paths[0])
-                    # get all child-entries of the directory
-                    for name, entry in origin.inventory.iter_entries(subdir):
-                        # hg does not track directory renames
-                        if entry.kind == 'directory':
-                            continue
-                        frompath = self.recode(paths[0] + '/' + name)
-                        if frompath in seen:
-                            # Already handled by a more specific change entry
-                            # This is important when you have:
-                            # a => b
-                            # a/c => a/c
-                            # Here a/c must not be renamed into b/c
-                            continue
-                        seen.add(frompath)
-                        if not renaming:
-                            continue
-                        topath = self.recode(paths[1] + '/' + name)
-                        # register the files as changed
-                        changes.append((frompath, revid))
-                        changes.append((topath, revid))
-                        # add to mode cache
-                        mode = ((entry.executable and 'x')
-                                or (entry.kind == 'symlink' and 's')
-                                or '')
-                        self._modecache[(topath, revid)] = mode
-                        # register the change as move
-                        renames[topath] = frompath
-
-                # no further changes, go to the next change
-                continue
-
-            # we got unicode paths, need to convert them
-            path, topath = paths
-            if path is not None:
-                path = self.recode(path)
-            if topath is not None:
-                topath = self.recode(topath)
-            seen.add(path or topath)
-
-            if topath is None:
-                # file deleted
-                changes.append((path, revid))
-                continue
-
-            # renamed
-            if path and path != topath:
-                renames[topath] = path
-                changes.append((path, revid))
-
-            # populate the mode cache
-            kind, executable = [e[1] for e in (kind, executable)]
-            mode = ((executable and 'x') or (kind == 'symlink' and 'l')
-                    or '')
-            self._modecache[(topath, revid)] = mode
-            changes.append((topath, revid))
-
-        return changes, renames
-
-    def _filterghosts(self, ids):
-        """Filters out ghost revisions which hg does not support, see
-        <http://bazaar-vcs.org/GhostRevision>
-        """
-        parentmap = self.sourcerepo.get_parent_map(ids)
-        parents = tuple([parent for parent in ids if parent in parentmap])
-        return parents

File gaiaconv/convcmd.py

 # GNU General Public License version 2 or any later version.
 
 from common import NoRepo, MissingTool, SKIPREV, mapfile
-from cvs import convert_cvs
-from darcs import darcs_source
-from git import convert_git
 from hg import mercurial_source, mercurial_sink
-from subversion import svn_source, svn_sink
-from monotone import monotone_source
-from gnuarch import gnuarch_source
-from bzr import bzr_source
-from p4 import p4_source
 import filemap
 
 import os, shutil, shlex
         return s.decode('utf-8').encode(orig_encoding, 'replace')
 
 source_converters = [
-    ('cvs', convert_cvs, 'branchsort'),
-    ('git', convert_git, 'branchsort'),
-    ('svn', svn_source, 'branchsort'),
     ('hg', mercurial_source, 'sourcesort'),
-    ('darcs', darcs_source, 'branchsort'),
-    ('mtn', monotone_source, 'branchsort'),
-    ('gnuarch', gnuarch_source, 'branchsort'),
-    ('bzr', bzr_source, 'branchsort'),
-    ('p4', p4_source, 'branchsort'),
     ]
 
 sink_converters = [
     ('hg', mercurial_sink),
-    ('svn', svn_sink),
     ]
 
 def convertsource(ui, path, type, rev):
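
With only the ('hg', ...) entries left in source_converters and sink_converters, convertsource() (whose signature appears at the end of the hunk) still picks a converter by trying each registered entry in turn. A simplified sketch of that registry-dispatch pattern, not the exact upstream implementation:

    # Simplified sketch; the real convertsource() also honours the explicit
    # 'type' argument and reports the collected errors when nothing matches.
    def pick_source(ui, path, rev):
        for name, source, sortmode in source_converters:
            try:
                # each converter raises NoRepo if it does not recognize 'path'
                return source(ui, path, rev), sortmode
            except (NoRepo, MissingTool):
                continue
        raise NoRepo("%s: missing or unsupported repository" % path)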

File gaiaconv/cvs.py

-# cvs.py: CVS conversion code inspired by hg-cvs-import and git-cvsimport
-#
-#  Copyright 2005-2009 Matt Mackall <mpm@selenic.com> and others
-#
-# This software may be used and distributed according to the terms of the
-# GNU General Public License version 2 or any later version.
-
-import os, re, socket, errno
-from cStringIO import StringIO
-from mercurial import encoding, util
-from mercurial.i18n import _
-
-from common import NoRepo, commit, converter_source, checktool
-from common import makedatetimestamp
-import cvsps
-
-class convert_cvs(converter_source):
-    def __init__(self, ui, path, rev=None):
-        super(convert_cvs, self).__init__(ui, path, rev=rev)
-
-        cvs = os.path.join(path, "CVS")
-        if not os.path.exists(cvs):
-            raise NoRepo(_("%s does not look like a CVS checkout") % path)
-
-        checktool('cvs')
-
-        self.changeset = None
-        self.files = {}
-        self.tags = {}
-        self.lastbranch = {}
-        self.socket = None
-        self.cvsroot = open(os.path.join(cvs, "Root")).read()[:-1]
-        self.cvsrepo = open(os.path.join(cvs, "Repository")).read()[:-1]
-        self.encoding = encoding.encoding
-
-        self._connect()
-
-    def _parse(self):
-        if self.changeset is not None:
-            return
-        self.changeset = {}
-
-        maxrev = 0
-        if self.rev:
-            # TODO: handle tags
-            try:
-                # patchset number?
-                maxrev = int(self.rev)
-            except ValueError:
-                raise util.Abort(_('revision %s is not a patchset number')
-                                 % self.rev)
-
-        d = os.getcwd()
-        try:
-            os.chdir(self.path)
-            id = None
-
-            cache = 'update'
-            if not self.ui.configbool('convert', 'cvsps.cache', True):
-                cache = None
-            db = cvsps.createlog(self.ui, cache=cache)
-            db = cvsps.createchangeset(self.ui, db,
-                fuzz=int(self.ui.config('convert', 'cvsps.fuzz', 60)),
-                mergeto=self.ui.config('convert', 'cvsps.mergeto', None),
-                mergefrom=self.ui.config('convert', 'cvsps.mergefrom', None))
-
-            for cs in db:
-                if maxrev and cs.id > maxrev:
-                    break
-                id = str(cs.id)
-                cs.author = self.recode(cs.author)
-                self.lastbranch[cs.branch] = id
-                cs.comment = self.recode(cs.comment)
-                if self.ui.configbool('convert', 'localtimezone'):
-                    cs.date = makedatetimestamp(cs.date[0])
-                date = util.datestr(cs.date, '%Y-%m-%d %H:%M:%S %1%2')
-                self.tags.update(dict.fromkeys(cs.tags, id))
-
-                files = {}
-                for f in cs.entries:
-                    files[f.file] = "%s%s" % ('.'.join([str(x)
-                                                        for x in f.revision]),
-                                              ['', '(DEAD)'][f.dead])
-
-                # add current commit to set
-                c = commit(author=cs.author, date=date,
-                           parents=[str(p.id) for p in cs.parents],
-                           desc=cs.comment, branch=cs.branch or '')
-                self.changeset[id] = c
-                self.files[id] = files
-
-            self.heads = self.lastbranch.values()
-        finally:
-            os.chdir(d)
-
-    def _connect(self):
-        root = self.cvsroot
-        conntype = None
-        user, host = None, None
-        cmd = ['cvs', 'server']
-
-        self.ui.status(_("connecting to %s\n") % root)
-
-        if root.startswith(":pserver:"):
-            root = root[9:]
-            m = re.match(r'(?:(.*?)(?::(.*?))?@)?([^:\/]*)(?::(\d*))?(.*)',
-                         root)
-            if m:
-                conntype = "pserver"
-                user, passw, serv, port, root = m.groups()
-                if not user:
-                    user = "anonymous"
-                if not port:
-                    port = 2401
-                else:
-                    port = int(port)
-                format0 = ":pserver:%s@%s:%s" % (user, serv, root)
-                format1 = ":pserver:%s@%s:%d%s" % (user, serv, port, root)
-
-                if not passw:
-                    passw = "A"
-                    cvspass = os.path.expanduser("~/.cvspass")
-                    try:
-                        pf = open(cvspass)
-                        for line in pf.read().splitlines():
-                            part1, part2 = line.split(' ', 1)
-                            # /1 :pserver:user@example.com:2401/cvsroot/foo
-                            # Ah<Z
-                            if part1 == '/1':
-                                part1, part2 = part2.split(' ', 1)
-                                format = format1
-                            # :pserver:user@example.com:/cvsroot/foo Ah<Z
-                            else:
-                                format = format0
-                            if part1 == format:
-                                passw = part2
-                                break
-                        pf.close()
-                    except IOError, inst:
-                        if inst.errno != errno.ENOENT:
-                            if not getattr(inst, 'filename', None):
-                                inst.filename = cvspass
-                            raise
-
-                sck = socket.socket()
-                sck.connect((serv, port))
-                sck.send("\n".join(["BEGIN AUTH REQUEST", root, user, passw,
-                                    "END AUTH REQUEST", ""]))
-                if sck.recv(128) != "I LOVE YOU\n":
-                    raise util.Abort(_("CVS pserver authentication failed"))
-
-                self.writep = self.readp = sck.makefile('r+')
-
-        if not conntype and root.startswith(":local:"):
-            conntype = "local"
-            root = root[7:]
-
-        if not conntype:
-            # :ext:user@host/home/user/path/to/cvsroot
-            if root.startswith(":ext:"):
-                root = root[5:]
-            m = re.match(r'(?:([^@:/]+)@)?([^:/]+):?(.*)', root)
-            # Do not take Windows path "c:\foo\bar" for a connection strings
-            if os.path.isdir(root) or not m:
-                conntype = "local"
-            else:
-                conntype = "rsh"
-                user, host, root = m.group(1), m.group(2), m.group(3)
-
-        if conntype != "pserver":
-            if conntype == "rsh":
-                rsh = os.environ.get("CVS_RSH") or "ssh"
-                if user:
-                    cmd = [rsh, '-l', user, host] + cmd
-                else:
-                    cmd = [rsh, host] + cmd
-
-            # popen2 does not support argument lists under Windows
-            cmd = [util.shellquote(arg) for arg in cmd]
-            cmd = util.quotecommand(' '.join(cmd))
-            self.writep, self.readp = util.popen2(cmd)
-
-        self.realroot = root
-
-        self.writep.write("Root %s\n" % root)
-        self.writep.write("Valid-responses ok error Valid-requests Mode"
-                          " M Mbinary E Checked-in Created Updated"
-                          " Merged Removed\n")
-        self.writep.write("valid-requests\n")
-        self.writep.flush()
-        r = self.readp.readline()
-        if not r.startswith("Valid-requests"):
-            raise util.Abort(_('unexpected response from CVS server '
-                               '(expected "Valid-requests", but got %r)')
-                             % r)
-        if "UseUnchanged" in r:
-            self.writep.write("UseUnchanged\n")
-            self.writep.flush()
-            r = self.readp.readline()
-
-    def getheads(self):
-        self._parse()
-        return self.heads
-
-    def getfile(self, name, rev):
-
-        def chunkedread(fp, count):
-            # file-objects returned by socket.makefile() do not handle
-            # large read() requests very well.
-            chunksize = 65536
-            output = StringIO()
-            while count > 0:
-                data = fp.read(min(count, chunksize))
-                if not data:
-                    raise util.Abort(_("%d bytes missing from remote file")
-                                     % count)
-                count -= len(data)
-                output.write(data)
-            return output.getvalue()
-
-        self._parse()
-        if rev.endswith("(DEAD)"):
-            raise IOError
-
-        args = ("-N -P -kk -r %s --" % rev).split()
-        args.append(self.cvsrepo + '/' + name)
-        for x in args:
-            self.writep.write("Argument %s\n" % x)
-        self.writep.write("Directory .\n%s\nco\n" % self.realroot)
-        self.writep.flush()
-
-        data = ""
-        mode = None
-        while True:
-            line = self.readp.readline()
-            if line.startswith("Created ") or line.startswith("Updated "):
-                self.readp.readline() # path
-                self.readp.readline() # entries
-                mode = self.readp.readline()[:-1]
-                count = int(self.readp.readline()[:-1])
-                data = chunkedread(self.readp, count)
-            elif line.startswith(" "):
-                data += line[1:]
-            elif line.startswith("M "):
-                pass
-            elif line.startswith("Mbinary "):
-                count = int(self.readp.readline()[:-1])
-                data = chunkedread(self.readp, count)
-            else:
-                if line == "ok\n":
-                    if mode is None:
-                        raise util.Abort(_('malformed response from CVS'))
-                    return (data, "x" in mode and "x" or "")
-                elif line.startswith("E "):
-                    self.ui.warn(_("cvs server: %s\n") % line[2:])
-                elif line.startswith("Remove"):
-                    self.readp.readline()
-                else:
-                    raise util.Abort(_("unknown CVS response: %s") % line)
-
-    def getchanges(self, rev):
-        self._parse()
-        return sorted(self.files[rev].iteritems()), {}
-
-    def getcommit(self, rev):
-        self._parse()
-        return self.changeset[rev]
-
-    def gettags(self):
-        self._parse()
-        return self.tags
-
-    def getchangedfiles(self, rev, i):
-        self._parse()
-        return sorted(self.files[rev])

File gaiaconv/cvsps.py

-# Mercurial built-in replacement for cvsps.
-#
-# Copyright 2008, Frank Kingswood <frank@kingswood-consulting.co.uk>
-#
-# This software may be used and distributed according to the terms of the
-# GNU General Public License version 2 or any later version.
-
-import os
-import re
-import cPickle as pickle
-from mercurial import util
-from mercurial.i18n import _
-from mercurial import hook
-from mercurial import util
-
-class logentry(object):
-    '''Class logentry has the following attributes:
-        .author    - author name as CVS knows it
-        .branch    - name of branch this revision is on
-        .branches  - revision tuple of branches starting at this revision
-        .comment   - commit message
-        .commitid  - CVS commitid or None
-        .date      - the commit date as a (time, tz) tuple
-        .dead      - true if file revision is dead
-        .file      - Name of file
-        .lines     - a tuple (+lines, -lines) or None
-        .parent    - Previous revision of this entry
-        .rcs       - name of file as returned from CVS
-        .revision  - revision number as tuple
-        .tags      - list of tags on the file
-        .synthetic - is this a synthetic "file ... added on ..." revision?
-        .mergepoint - the branch that has been merged from (if present in
-                      rlog output) or None
-        .branchpoints - the branches that start at the current entry or empty
-    '''
-    def __init__(self, **entries):
-        self.synthetic = False
-        self.__dict__.update(entries)
-
-    def __repr__(self):
-        items = ("%s=%r"%(k, self.__dict__[k]) for k in sorted(self.__dict__))
-        return "%s(%s)"%(type(self).__name__, ", ".join(items))
-
-class logerror(Exception):
-    pass
-
-def getrepopath(cvspath):
-    """Return the repository path from a CVS path.
-
-    >>> getrepopath('/foo/bar')
-    '/foo/bar'
-    >>> getrepopath('c:/foo/bar')
-    '/foo/bar'
-    >>> getrepopath(':pserver:10/foo/bar')
-    '/foo/bar'
-    >>> getrepopath(':pserver:10c:/foo/bar')
-    '/foo/bar'
-    >>> getrepopath(':pserver:/foo/bar')
-    '/foo/bar'
-    >>> getrepopath(':pserver:c:/foo/bar')
-    '/foo/bar'
-    >>> getrepopath(':pserver:truc@foo.bar:/foo/bar')
-    '/foo/bar'
-    >>> getrepopath(':pserver:truc@foo.bar:c:/foo/bar')
-    '/foo/bar'
-    >>> getrepopath('user@server/path/to/repository')
-    '/path/to/repository'
-    """
-    # According to CVS manual, CVS paths are expressed like:
-    # [:method:][[user][:password]@]hostname[:[port]]/path/to/repository
-    #
-    # CVSpath is splitted into parts and then position of the first occurrence
-    # of the '/' char after the '@' is located. The solution is the rest of the
-    # string after that '/' sign including it
-
-    parts = cvspath.split(':')
-    atposition = parts[-1].find('@')
-    start = 0
-
-    if atposition != -1:
-        start = atposition
-
-    repopath = parts[-1][parts[-1].find('/', start):]
-    return repopath
-
-def createlog(ui, directory=None, root="", rlog=True, cache=None):
-    '''Collect the CVS rlog'''
-
-    # Because we store many duplicate commit log messages, reusing strings
-    # saves a lot of memory and pickle storage space.
-    _scache = {}
-    def scache(s):
-        "return a shared version of a string"
-        return _scache.setdefault(s, s)
-
-    ui.status(_('collecting CVS rlog\n'))
-
-    log = []      # list of logentry objects containing the CVS state
-
-    # patterns to match in CVS (r)log output, by state of use
-    re_00 = re.compile('RCS file: (.+)$')
-    re_01 = re.compile('cvs \\[r?log aborted\\]: (.+)$')
-    re_02 = re.compile('cvs (r?log|server): (.+)\n$')
-    re_03 = re.compile("(Cannot access.+CVSROOT)|"
-                       "(can't create temporary directory.+)$")
-    re_10 = re.compile('Working file: (.+)$')
-    re_20 = re.compile('symbolic names:')
-    re_30 = re.compile('\t(.+): ([\\d.]+)$')
-    re_31 = re.compile('----------------------------$')
-    re_32 = re.compile('======================================='
-                       '======================================$')
-    re_50 = re.compile('revision ([\\d.]+)(\s+locked by:\s+.+;)?$')
-    re_60 = re.compile(r'date:\s+(.+);\s+author:\s+(.+);\s+state:\s+(.+?);'
-                       r'(\s+lines:\s+(\+\d+)?\s+(-\d+)?;)?'
-                       r'(\s+commitid:\s+([^;]+);)?'
-                       r'(.*mergepoint:\s+([^;]+);)?')
-    re_70 = re.compile('branches: (.+);$')
-
-    file_added_re = re.compile(r'file [^/]+ was (initially )?added on branch')
-
-    prefix = ''   # leading path to strip of what we get from CVS
-
-    if directory is None:
-        # Current working directory
-
-        # Get the real directory in the repository
-        try:
-            prefix = open(os.path.join('CVS','Repository')).read().strip()
-            directory = prefix
-            if prefix == ".":
-                prefix = ""
-        except IOError:
-            raise logerror(_('not a CVS sandbox'))
-
-        if prefix and not prefix.endswith(os.sep):
-            prefix += os.sep
-
-        # Use the Root file in the sandbox, if it exists
-        try:
-            root = open(os.path.join('CVS','Root')).read().strip()
-        except IOError:
-            pass
-
-    if not root:
-        root = os.environ.get('CVSROOT', '')
-
-    # read log cache if one exists
-    oldlog = []
-    date = None
-
-    if cache:
-        cachedir = os.path.expanduser('~/.hg.cvsps')
-        if not os.path.exists(cachedir):
-            os.mkdir(cachedir)
-
-        # The cvsps cache pickle needs a uniquified name, based on the
-        # repository location. The address may have all sort of nasties
-        # in it, slashes, colons and such. So here we take just the
-        # alphanumeric characters, concatenated in a way that does not
-        # mix up the various components, so that
-        #    :pserver:user@server:/path
-        # and
-        #    /pserver/user/server/path
-        # are mapped to different cache file names.
-        cachefile = root.split(":") + [directory, "cache"]
-        cachefile = ['-'.join(re.findall(r'\w+', s)) for s in cachefile if s]
-        cachefile = os.path.join(cachedir,
-                                 '.'.join([s for s in cachefile if s]))
-
-    if cache == 'update':
-        try:
-            ui.note(_('reading cvs log cache %s\n') % cachefile)
-            oldlog = pickle.load(open(cachefile))
-            for e in oldlog:
-                if not (util.safehasattr(e, 'branchpoints') and
-                        util.safehasattr(e, 'commitid') and
-                        util.safehasattr(e, 'mergepoint')):
-                    ui.status(_('ignoring old cache\n'))
-                    oldlog = []
-                    break
-
-            ui.note(_('cache has %d log entries\n') % len(oldlog))
-        except Exception, e:
-            ui.note(_('error reading cache: %r\n') % e)
-
-        if oldlog:
-            date = oldlog[-1].date    # last commit date as a (time,tz) tuple
-            date = util.datestr(date, '%Y/%m/%d %H:%M:%S %1%2')
-
-    # build the CVS commandline
-    cmd = ['cvs', '-q']
-    if root:
-        cmd.append('-d%s' % root)
-        p = util.normpath(getrepopath(root))
-        if not p.endswith('/'):
-            p += '/'
-        if prefix:
-            # looks like normpath replaces "" by "."
-            prefix = p + util.normpath(prefix)
-        else:
-            prefix = p
-    cmd.append(['log', 'rlog'][rlog])
-    if date:
-        # no space between option and date string
-        cmd.append('-d>%s' % date)
-    cmd.append(directory)
-
-    # state machine begins here
-    tags = {}     # dictionary of revisions on current file with their tags
-    branchmap = {} # mapping between branch names and revision numbers
-    state = 0
-    store = False # set when a new record can be appended
-
-    cmd = [util.shellquote(arg) for arg in cmd]
-    ui.note(_("running %s\n") % (' '.join(cmd)))
-    ui.debug("prefix=%r directory=%r root=%r\n" % (prefix, directory, root))
-
-    pfp = util.popen(' '.join(cmd))
-    peek = pfp.readline()
-    while True:
-        line = peek
-        if line == '':
-            break
-        peek = pfp.readline()
-        if line.endswith('\n'):
-            line = line[:-1]
-        #ui.debug('state=%d line=%r\n' % (state, line))
-
-        if state == 0:
-            # initial state, consume input until we see 'RCS file'
-            match = re_00.match(line)
-            if match:
-                rcs = match.group(1)
-                tags = {}
-                if rlog:
-                    filename = util.normpath(rcs[:-2])
-                    if filename.startswith(prefix):
-                        filename = filename[len(prefix):]
-                    if filename.startswith('/'):
-                        filename = filename[1:]
-                    if filename.startswith('Attic/'):
-                        filename = filename[6:]
-                    else:
-                        filename = filename.replace('/Attic/', '/')
-                    state = 2
-                    continue
-                state = 1
-                continue
-            match = re_01.match(line)
-            if match:
-                raise logerror(match.group(1))
-            match = re_02.match(line)
-            if match:
-                raise logerror(match.group(2))
-            if re_03.match(line):
-                raise logerror(line)
-
-        elif state == 1:
-            # expect 'Working file' (only when using log instead of rlog)
-            match = re_10.match(line)
-            assert match, _('RCS file must be followed by working file')
-            filename = util.normpath(match.group(1))
-            state = 2
-
-        elif state == 2:
-            # expect 'symbolic names'
-            if re_20.match(line):
-                branchmap = {}
-                state = 3
-
-        elif state == 3:
-            # read the symbolic names and store as tags
-            match = re_30.match(line)
-            if match:
-                rev = [int(x) for x in match.group(2).split('.')]
-
-                # Convert magic branch number to an odd-numbered one
-                revn = len(rev)
-                if revn > 3 and (revn % 2) == 0 and rev[-2] == 0:
-                    rev = rev[:-2] + rev[-1:]
-                rev = tuple(rev)
-
-                if rev not in tags:
-                    tags[rev] = []
-                tags[rev].append(match.group(1))
-                branchmap[match.group(1)] = match.group(2)
-
-            elif re_31.match(line):
-                state = 5
-            elif re_32.match(line):
-                state = 0
-
-        elif state == 4:
-            # expecting '------' separator before first revision
-            if re_31.match(line):
-                state = 5
-            else:
-                assert not re_32.match(line), _('must have at least '
-                                                'some revisions')
-
-        elif state == 5:
-            # expecting revision number and possibly (ignored) lock indication
-            # we create the logentry here from values stored in states 0 to 4,
-            # as this state is re-entered for subsequent revisions of a file.
-            match = re_50.match(line)
-            assert match, _('expected revision number')
-            e = logentry(rcs=scache(rcs),
-                         file=scache(filename),
-                         revision=tuple([int(x) for x in
-                                         match.group(1).split('.')]),
-                         branches=[],
-                         parent=None,
-                         commitid=None,
-                         mergepoint=None,
-                         branchpoints=set())
-
-            state = 6
-
-        elif state == 6:
-            # expecting date, author, state, lines changed
-            match = re_60.match(line)
-            assert match, _('revision must be followed by date line')
-            d = match.group(1)
-            if d[2] == '/':
-                # Y2K
-                d = '19' + d
-
-            if len(d.split()) != 3:
-                # cvs log dates always in GMT
-                d = d + ' UTC'
-            e.date = util.parsedate(d, ['%y/%m/%d %H:%M:%S',
-                                        '%Y/%m/%d %H:%M:%S',
-                                        '%Y-%m-%d %H:%M:%S'])
-            e.author = scache(match.group(2))
-            e.dead = match.group(3).lower() == 'dead'
-
-            if match.group(5):
-                if match.group(6):
-                    e.lines = (int(match.group(5)), int(match.group(6)))
-                else:
-                    e.lines = (int(match.group(5)), 0)
-            elif match.group(6):
-                e.lines = (0, int(match.group(6)))
-            else:
-                e.lines = None
-
-            if match.group(7): # cvs 1.12 commitid
-                e.commitid = match.group(8)
-
-            if match.group(9): # cvsnt mergepoint
-                myrev = match.group(10).split('.')
-                if len(myrev) == 2: # head
-                    e.mergepoint = 'HEAD'
-                else:
-                    myrev = '.'.join(myrev[:-2] + ['0', myrev[-2]])
-                    branches = [b for b in branchmap if branchmap[b] == myrev]
-                    assert len(branches) == 1, ('unknown branch: %s'
-                                                % e.mergepoint)
-                    e.mergepoint = branches[0]
-
-            e.comment = []
-            state = 7
-
-        elif state == 7:
-            # read the revision numbers of branches that start at this revision
-            # or store the commit log message otherwise
-            m = re_70.match(line)
-            if m:
-                e.branches = [tuple([int(y) for y in x.strip().split('.')])
-                                for x in m.group(1).split(';')]
-                state = 8
-            elif re_31.match(line) and re_50.match(peek):
-                state = 5
-                store = True
-            elif re_32.match(line):
-                state = 0
-                store = True
-            else:
-                e.comment.append(line)
-
-        elif state == 8:
-            # store commit log message
-            if re_31.match(line):
-                cpeek = peek
-                if cpeek.endswith('\n'):
-                    cpeek = cpeek[:-1]
-                if re_50.match(cpeek):
-                    state = 5
-                    store = True
-                else:
-                    e.comment.append(line)
-            elif re_32.match(line):
-                state = 0
-                store = True
-            else:
-                e.comment.append(line)
-
-        # When a file is added on a branch B1, CVS creates a synthetic
-        # dead trunk revision 1.1 so that the branch has a root.
-        # Likewise, if you merge such a file to a later branch B2 (one
-        # that already existed when the file was added on B1), CVS
-        # creates a synthetic dead revision 1.1.x.1 on B2.  Don't drop
-        # these revisions now, but mark them synthetic so
-        # createchangeset() can take care of them.
-        if (store and
-              e.dead and
-              e.revision[-1] == 1 and      # 1.1 or 1.1.x.1
-              len(e.comment) == 1 and
-              file_added_re.match(e.comment[0])):
-            ui.debug('found synthetic revision in %s: %r\n'
-                     % (e.rcs, e.comment[0]))
-            e.synthetic = True
-
-        if store:
-            # clean up the results and save in the log.
-            store = False
-            e.tags = sorted([scache(x) for x in tags.get(e.revision, [])])
-            e.comment = scache('\n'.join(e.comment))
-
-            revn = len(e.revision)
-            if revn > 3 and (revn % 2) == 0:
-                e.branch = tags.get(e.revision[:-1], [None])[0]
-            else:
-                e.branch = None
-
-            # find the branches starting from this revision
-            branchpoints = set()
-            for branch, revision in branchmap.iteritems():
-                revparts = tuple([int(i) for i in revision.split('.')])
-                if len(revparts) < 2: # bad tags
-                    continue
-                if revparts[-2] == 0 and revparts[-1] % 2 == 0:
-                    # normal branch
-                    if revparts[:-2] == e.revision:
-                        branchpoints.add(branch)
-                elif revparts == (1, 1, 1): # vendor branch
-                    if revparts in e.branches:
-                        branchpoints.add(branch)
-            e.branchpoints = branchpoints
-
-            log.append(e)
-
-            if len(log) % 100 == 0:
-                ui.status(util.ellipsis('%d %s' % (len(log), e.file), 80)+'\n')
-
-    log.sort(key=lambda x: (x.rcs, x.revision))
-
-    # find parent revisions of individual files
-    versions = {}
-    for e in log:
-        branch = e.revision[:-1]
-        p = versions.get((e.rcs, branch), None)
-        if p is None:
-            p = e.revision[:-2]
-        e.parent = p
-        versions[(e.rcs, branch)] = e.revision
-
-    # update the log cache
-    if cache:
-        if log:
-            # join up the old and new logs
-            log.sort(key=lambda x: x.date)
-
-            if oldlog and oldlog[-1].date >= log[0].date:
-                raise logerror(_('log cache overlaps with new log entries,'
-                                 ' re-run without cache.'))
-
-            log = oldlog + log
-
-            # write the new cachefile
-            ui.note(_('writing cvs log cache %s\n') % cachefile)
-            pickle.dump(log, open(cachefile, 'w'))
-        else:
-            log = oldlog
-
-    ui.status(_('%d log entries\n') % len(log))
-
-    hook.hook(ui, None, "cvslog", True, log=log)
-
-    return log
-
-
-class changeset(object):
-    '''Class changeset has the following attributes:
-        .id        - integer identifying this changeset (list index)
-        .author    - author name as CVS knows it
-        .branch    - name of branch this changeset is on, or None
-        .comment   - commit message
-        .commitid  - CVS commitid or None
-        .date      - the commit date as a (time,tz) tuple
-        .entries   - list of logentry objects in this changeset
-        .parents   - list of one or two parent changesets
-        .tags      - list of tags on this changeset
-        .synthetic - from synthetic revision "file ... added on branch ..."
-        .mergepoint - the branch that has been merged from or None
-        .branchpoints - the branches that start at the current entry or empty
-    '''
-    def __init__(self, **entries):
-        self.synthetic = False
-        self.__dict__.update(entries)
-
-    def __repr__(self):
-        items = ("%s=%r"%(k, self.__dict__[k]) for k in sorted(self.__dict__))
-        return "%s(%s)"%(type(self).__name__, ", ".join(items))
-
-def createchangeset(ui, log, fuzz=60, mergefrom=None, mergeto=None):
-    '''Convert log into changesets.'''
-
-    ui.status(_('creating changesets\n'))
-
-    # try to order commitids by date
-    mindate = {}
-    for e in log:
-        if e.commitid:
-            mindate[e.commitid] = min(e.date,
-                                      mindate.get(e.commitid, e.date))
-
-    # Merge changesets
-    log.sort(key=lambda x: (mindate.get(x.commitid), x.commitid, x.comment,
-                            x.author, x.branch, x.date, x.branchpoints))
-
-    changesets = []
-    files = set()
-    c = None
-    for i, e in enumerate(log):
-
-        # Check if log entry belongs to the current changeset or not.
-
-        # Since CVS is file-centric, two different file revisions with
-        # different branchpoints should be treated as belonging to two
-        # different changesets (and the ordering is important and not
-        # honoured by cvsps at this point).
-        #
-        # Consider the following case:
-        # foo 1.1 branchpoints: [MYBRANCH]
-        # bar 1.1 branchpoints: [MYBRANCH, MYBRANCH2]
-        #
-        # Here foo belongs only to MYBRANCH, not to MYBRANCH2; a later
-        # revision of foo may appear on MYBRANCH2.  So foo should form the
-        # first changeset and bar the next, and MYBRANCH and MYBRANCH2
-        # should both start off the bar changeset.  No provisions are made
-        # to ensure that this is, in fact, what happens.
-        if not (c and e.branchpoints == c.branchpoints and
-                (# cvs commitids
-                 (e.commitid is not None and e.commitid == c.commitid) or
-                 (# no commitids, use fuzzy commit detection
-                  (e.commitid is None or c.commitid is None) and
-                   e.comment == c.comment and
-                   e.author == c.author and
-                   e.branch == c.branch and
-                   ((c.date[0] + c.date[1]) <=
-                    (e.date[0] + e.date[1]) <=
-                    (c.date[0] + c.date[1]) + fuzz) and
-                   e.file not in files))):
-            c = changeset(comment=e.comment, author=e.author,
-                          branch=e.branch, date=e.date,
-                          entries=[], mergepoint=e.mergepoint,
-                          branchpoints=e.branchpoints, commitid=e.commitid)
-            changesets.append(c)
-
-            files = set()
-            if len(changesets) % 100 == 0:
-                t = '%d %s' % (len(changesets), repr(e.comment)[1:-1])
-                ui.status(util.ellipsis(t, 80) + '\n')
-
-        c.entries.append(e)
-        files.add(e.file)
-        c.date = e.date       # changeset date is date of latest commit in it
-
-    # Mark synthetic changesets
-
-    for c in changesets:
-        # Synthetic revisions always get their own changeset, because
-        # the log message includes the filename.  E.g. if you add file3
-        # and file4 on a branch, you get four log entries and three
-        # changesets:
-        #   "File file3 was added on branch ..." (synthetic, 1 entry)
-        #   "File file4 was added on branch ..." (synthetic, 1 entry)
-        #   "Add file3 and file4 to fix ..."     (real, 2 entries)
-        # Hence the check for 1 entry here.
-        c.synthetic = len(c.entries) == 1 and c.entries[0].synthetic
-
-    # Sort files in each changeset
-
-    def entitycompare(l, r):
-        'Mimic cvsps sorting order'
-        l = l.file.split('/')
-        r = r.file.split('/')
-        nl = len(l)
-        nr = len(r)
-        n = min(nl, nr)
-        for i in range(n):
-            if i + 1 == nl and nl < nr:
-                return -1
-            elif i + 1 == nr and nl > nr:
-                return +1
-            elif l[i] < r[i]:
-                return -1
-            elif l[i] > r[i]:
-                return +1
-        return 0
-
-    for c in changesets:
-        c.entries.sort(entitycompare)
-
-    # Sort changesets by date
-
-    def cscmp(l, r):
-        d = sum(l.date) - sum(r.date)
-        if d:
-            return d
-
-        # detect vendor branches and initial commits on a branch
-        le = {}
-        for e in l.entries:
-            le[e.rcs] = e.revision
-        re = {}
-        for e in r.entries:
-            re[e.rcs] = e.revision
-
-        d = 0
-        for e in l.entries:
-            if re.get(e.rcs, None) == e.parent:
-                assert not d
-                d = 1
-                break
-
-        for e in r.entries:
-            if le.get(e.rcs, None) == e.parent:
-                assert not d
-                d = -1
-                break
-
-        return d
-
-    changesets.sort(cscmp)
-
-    # Collect tags
-
-    globaltags = {}
-    for c in changesets:
-        for e in c.entries:
-            for tag in e.tags:
-                # remember which is the latest changeset to have this tag
-                globaltags[tag] = c
-
-    for c in changesets:
-        tags = set()
-        for e in c.entries:
-            tags.update(e.tags)
-        # remember tags only if this is the latest changeset to have it
-        c.tags = sorted(tag for tag in tags if globaltags[tag] is c)
-
-    # Find parent changesets, handle {{mergetobranch BRANCHNAME}}
-    # by inserting dummy changesets with two parents, and handle
-    # {{mergefrombranch BRANCHNAME}} by setting two parents.
-
-    if mergeto is None:
-        mergeto = r'{{mergetobranch ([-\w]+)}}'
-    if mergeto:
-        mergeto = re.compile(mergeto)
-
-    if mergefrom is None:
-        mergefrom = r'{{mergefrombranch ([-\w]+)}}'
-    if mergefrom:
-        mergefrom = re.compile(mergefrom)
-
-    versions = {}    # changeset index where we saw any particular file version
-    branches = {}    # changeset index where we saw a branch
-    n = len(changesets)
-    i = 0
-    while i < n:
-        c = changesets[i]
-
-        for f in c.entries:
-            versions[(f.rcs, f.revision)] = i
-
-        p = None
-        if c.branch in branches:
-            p = branches[c.branch]
-        else:
-            # first changeset on a new branch
-            # the parent is a changeset with the branch in its
-            # branchpoints such that it is the latest possible
-            # commit without any intervening, unrelated commits.
-
-            for candidate in xrange(i):
-                if c.branch not in changesets[candidate].branchpoints:
-                    if p is not None:
-                        break
-                    continue
-                p = candidate
-
-        c.parents = []
-        if p is not None:
-            p = changesets[p]
-
-            # Ensure no changeset has a synthetic changeset as a parent.
-            while p.synthetic:
-                assert len(p.parents) <= 1, \
-                       _('synthetic changeset cannot have multiple parents')
-                if p.parents:
-                    p = p.parents[0]
-                else:
-                    p = None
-                    break
-
-            if p is not None:
-                c.parents.append(p)
-
-        if c.mergepoint:
-            if c.mergepoint == 'HEAD':
-                c.mergepoint = None
-            c.parents.append(changesets[branches[c.mergepoint]])
-
-        if mergefrom:
-            m = mergefrom.search(c.comment)
-            if m:
-                m = m.group(1)
-                if m == 'HEAD':
-                    m = None
-                try:
-                    candidate = changesets[branches[m]]
-                except KeyError:
-                    ui.warn(_("warning: CVS commit message references "
-                              "non-existent branch %r:\n%s\n")
-                            % (m, c.comment))
-                if m in branches and c.branch != m and not candidate.synthetic:
-                    c.parents.append(candidate)
-
-        if mergeto:
-            m = mergeto.search(c.comment)
-            if m:
-                if m.groups():
-                    m = m.group(1)
-                    if m == 'HEAD':
-                        m = None
-                else:
-                    m = None   # if no group found then merge to HEAD
-                if m in branches and c.branch != m:
-                    # insert empty changeset for merge
-                    cc = changeset(
-                        author=c.author, branch=m, date=c.date,
-                        comment='convert-repo: CVS merge from branch %s'
-                        % c.branch,
-                        entries=[], tags=[],
-                        parents=[changesets[branches[m]], c])
-                    changesets.insert(i + 1, cc)
-                    branches[m] = i + 1
-
-                    # adjust our loop counters now we have inserted a new entry
-                    n += 1
-                    i += 2
-                    continue
-
-        branches[c.branch] = i
-        i += 1
-
-    # Drop synthetic changesets (safe now that we have ensured no other
-    # changesets can have them as parents).
-    i = 0
-    while i < len(changesets):
-        if changesets[i].synthetic:
-            del changesets[i]
-        else:
-            i += 1
-
-    # Number changesets
-
-    for i, c in enumerate(changesets):
-        c.id = i + 1
-
-    ui.status(_('%d changeset entries\n') % len(changesets))
-
-    hook.hook(ui, None, "cvschangesets", True, changesets=changesets)
-
-    return changesets
-
-
-def debugcvsps(ui, *args, **opts):
-    '''Read CVS rlog for current directory or named path in
-    repository, and convert the log to changesets based on matching
-    commit log entries and dates.
-    '''
-    if opts["new_cache"]:
-        cache = "write"
-    elif opts["update_cache"]:
-        cache = "update"
-    else:
-        cache = None
-
-    revisions = opts["revisions"]
-
-    try:
-        if args:
-            log = []
-            for d in args:
-                log += createlog(ui, d, root=opts["root"], cache=cache)
-        else:
-            log = createlog(ui, root=opts["root"], cache=cache)
-    except logerror, e:
-        ui.write("%r\n"%e)
-        return
-
-    changesets = createchangeset(ui, log, opts["fuzz"])
-    del log
-
-    # Print changesets (optionally filtered)
-
-    off = len(revisions)
-    branches = {}    # latest version number in each branch
-    ancestors = {}   # parent branch
-    for cs in changesets:
-
-        if opts["ancestors"]:
-            if cs.branch not in branches and cs.parents and cs.parents[0].id:
-                ancestors[cs.branch] = (changesets[cs.parents[0].id - 1].branch,
-                                        cs.parents[0].id)
-            branches[cs.branch] = cs.id
-
-        # limit by branches
-        if opts["branches"] and (cs.branch or 'HEAD') not in opts["branches"]:
-            continue
-
-        if not off:
-            # Note: trailing spaces on several lines here are needed to have
-            #       bug-for-bug compatibility with cvsps.
-            ui.write('---------------------\n')
-            ui.write(('PatchSet %d \n' % cs.id))
-            ui.write(('Date: %s\n' % util.datestr(cs.date,
-                                                 '%Y/%m/%d %H:%M:%S %1%2')))
-            ui.write(('Author: %s\n' % cs.author))
-            ui.write(('Branch: %s\n' % (cs.branch or 'HEAD')))
-            ui.write(('Tag%s: %s \n' % (['', 's'][len(cs.tags) > 1],
-                                  ','.join(cs.tags) or '(none)')))
-            if cs.branchpoints:
-                ui.write(('Branchpoints: %s \n') %
-                         ', '.join(sorted(cs.branchpoints)))
-            if opts["parents"] and cs.parents:
-                if len(cs.parents) > 1:
-                    ui.write(('Parents: %s\n' %
-                             (','.join([str(p.id) for p in cs.parents]))))
-                else:
-                    ui.write(('Parent: %d\n' % cs.parents[0].id))
-
-            if opts["ancestors"]:
-                b = cs.branch
-                r = []
-                while b:
-                    b, c = ancestors[b]
-                    r.append('%s:%d:%d' % (b or "HEAD", c, branches[b]))
-                if r:
-                    ui.write(('Ancestors: %s\n' % (','.join(r))))
-
-            ui.write(('Log:\n'))
-            ui.write('%s\n\n' % cs.comment)
-            ui.write(('Members: \n'))
-            for f in cs.entries:
-                fn = f.file
-                if fn.startswith(opts["prefix"]):
-                    fn = fn[len(opts["prefix"]):]
-                ui.write('\t%s:%s->%s%s \n' % (
-                        fn, '.'.join([str(x) for x in f.parent]) or 'INITIAL',
-                        '.'.join([str(x) for x in f.revision]),
-                        ['', '(DEAD)'][f.dead]))
-            ui.write('\n')
-
-        # have we seen the start tag?
-        if revisions and off:
-            if revisions[0] == str(cs.id) or \
-                revisions[0] in cs.tags:
-                off = False
-
-        # see if we reached the end tag
-        if len(revisions) > 1 and not off:
-            if revisions[1] == str(cs.id) or \
-                revisions[1] in cs.tags:
-                break
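
The grouping condition in createchangeset() above is easy to lose in the
parentheses: when entries carry no CVS commitid, per-file log entries are
folded into one changeset as long as the comment, author and branch match,
the dates fall within the fuzz window (60 seconds by default), and no file
appears twice in the group. Below is a minimal stand-alone sketch of that
rule, assuming a made-up Entry tuple with plain integer timestamps instead
of the extension's logentry objects and (time, tz) date pairs; Entry and
group_entries are illustrative names, not part of the code above.

    from collections import namedtuple

    Entry = namedtuple('Entry', 'file revision author branch comment date')

    def group_entries(entries, fuzz=60):
        """Group date-sorted entries into changeset-like lists (sketch)."""
        changesets = []
        current, files, start = [], set(), None
        for e in entries:
            # merge into the current group only while everything matches
            # and the file has not been seen in this group yet
            same = (current and
                    e.author == current[-1].author and
                    e.branch == current[-1].branch and
                    e.comment == current[-1].comment and
                    start <= e.date <= start + fuzz and
                    e.file not in files)
            if not same:
                current, files, start = [], set(), e.date
                changesets.append(current)
            current.append(e)
            files.add(e.file)
        return changesets

    log = [Entry('a.c', '1.2', 'axel', None, 'fix build', 100),
           Entry('b.c', '1.5', 'axel', None, 'fix build', 130),
           # within the fuzz window, but a.c repeats -> new changeset
           Entry('a.c', '1.3', 'axel', None, 'fix build', 150)]
    for i, cs in enumerate(group_entries(log), 1):
        print('%d %s' % (i, [e.file for e in cs]))
    # 1 ['a.c', 'b.c']
    # 2 ['a.c']

The file-repeat check is why the real code keeps a files set per changeset:
two revisions of the same file can never belong to a single commit.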

File gaiaconv/darcs.py

-# darcs.py - darcs support for the convert extension
-#
-#  Copyright 2007-2009 Matt Mackall <mpm@selenic.com> and others
-#
-# This software may be used and distributed according to the terms of the
-# GNU General Public License version 2 or any later version.
-
-from common import NoRepo, checktool, commandline, commit, converter_source
-from mercurial.i18n import _
-from mercurial import util
-import os, shutil, tempfile, re
-
-# The naming drift of ElementTree is fun!
-
-try:
-    from xml.etree.cElementTree import ElementTree, XMLParser
-except ImportError:
-    try:
-        from xml.etree.ElementTree import ElementTree, XMLParser
-    except ImportError:
-        try:
-            from elementtree.cElementTree import ElementTree, XMLParser
-        except ImportError:
-            try:
-                from elementtree.ElementTree import ElementTree, XMLParser
-            except ImportError:
-                pass
-
-class darcs_source(converter_source, commandline):
-    def __init__(self, ui, path, rev=None):
-        converter_source.__init__(self, ui, path, rev=rev)
-        commandline.__init__(self, ui, 'darcs')
-
-        # check for _darcs, ElementTree so that we can easily skip
-        # test-convert-darcs if ElementTree is not around
-        if not os.path.exists(os.path.join(path, '_darcs')):
-            raise NoRepo(_("%s does not look like a darcs repository") % path)
-
-        checktool('darcs')
-        version = self.run0('--version').splitlines()[0].strip()
-        if version < '2.1':
-            raise util.Abort(_('darcs version 2.1 or newer needed (found %r)') %
-                             version)
-
-        if "ElementTree" not in globals():
-            raise util.Abort(_("Python ElementTree module is not available"))
-
-        self.path = os.path.realpath(path)
-
-        self.lastrev = None
-        self.changes = {}
-        self.parents = {}
-        self.tags = {}
-
-        # Check darcs repository format
-        format = self.format()
-        if format:
-            if format in ('darcs-1.0', 'hashed'):
-                raise NoRepo(_("%s repository format is unsupported, "
-                               "please upgrade") % format)
-        else:
-            self.ui.warn(_('failed to detect repository format!'))
-
-    def before(self):
-        self.tmppath = tempfile.mkdtemp(
-            prefix='convert-' + os.path.basename(self.path) + '-')
-        output, status = self.run('init', repodir=self.tmppath)
-        self.checkexit(status)
-
-        tree = self.xml('changes', xml_output=True, summary=True,
-                        repodir=self.path)
-        tagname = None
-        child = None
-        for elt in tree.findall('patch'):
-            node = elt.get('hash')
-            name = elt.findtext('name', '')
-            if name.startswith('TAG '):
-                tagname = name[4:].strip()
-            elif tagname is not None:
-                self.tags[tagname] = node
-                tagname = None
-            self.changes[node] = elt
-            self.parents[child] = [node]
-            child = node
-        self.parents[child] = []
-
-    def after(self):
-        self.ui.debug('cleaning up %s\n' % self.tmppath)
-        shutil.rmtree(self.tmppath, ignore_errors=True)
-
-    def recode(self, s, encoding=None):
-        if isinstance(s, unicode):
-            # XMLParser returns unicode objects for anything it can't
-            # encode into ASCII. We convert them back to str to get
-            # recode's normal conversion behavior.
-            s = s.encode('latin-1')
-        return super(darcs_source, self).recode(s, encoding)
-
-    def xml(self, cmd, **kwargs):
-        # NOTE: darcs is currently encoding agnostic and will print
-        # patch metadata byte-for-byte, even in the XML changelog.
-        etree = ElementTree()
-        # While we are decoding the XML as latin-1 to be as liberal as
-        # possible, etree will still raise an exception if any
-        # non-printable characters are in the XML changelog.
-        parser = XMLParser(encoding='latin-1')
-        p = self._run(cmd, **kwargs)
-        etree.parse(p.stdout, parser=parser)
-        p.wait()
-        self.checkexit(p.returncode)
-        return etree.getroot()
-
-    def format(self):
-        output, status = self.run('show', 'repo', no_files=True,
-                                  repodir=self.path)
-        self.checkexit(status)
-        m = re.search(r'^\s*Format:\s*(.*)$', output, re.MULTILINE)
-        if not m:
-            return None
-        return ','.join(sorted(f.strip() for f in m.group(1).split(',')))
-
-    def manifest(self):
-        man = []
-        output, status = self.run('show', 'files', no_directories=True,
-                                  repodir=self.tmppath)
-        self.checkexit(status)
-        for line in output.split('\n'):
-            path = line[2:]
-            if path:
-                man.append(path)
-        return man
-
-    def getheads(self):
-        return self.parents[None]
-
-    def getcommit(self, rev):
-        elt = self.changes[rev]
-        date = util.strdate(elt.get('local_date'), '%a %b %d %H:%M:%S %Z %Y')
-        desc = elt.findtext('name') + '\n' + elt.findtext('comment', '')
-        # etree can return unicode objects for name, comment, and author,
-        # so recode() is used to ensure str objects are emitted.
-        return commit(author=self.recode(elt.get('author')),
-                      date=util.datestr(date, '%Y-%m-%d %H:%M:%S %1%2'),
-                      desc=self.recode(desc).strip(),
-                      parents=self.parents[rev])
-
-    def pull(self, rev):
-        output, status = self.run('pull', self.path, all=True,
-                                  match='hash %s' % rev,
-                                  no_test=True, no_posthook=True,
-                                  external_merge='/bin/false',
-                                  repodir=self.tmppath)
-        if status:
-            if output.find('We have conflicts in') == -1:
-                self.checkexit(status, output)
-            output, status = self.run('revert', all=True, repodir=self.tmppath)
-            self.checkexit(status, output)
-
-    def getchanges(self, rev):
-        copies = {}
-        changes = []
-        man = None
-        for elt in self.changes[rev].find('summary').getchildren():
-            if elt.tag in ('add_directory', 'remove_directory'):
-                continue
-            if elt.tag == 'move':
-                if man is None:
-                    man = self.manifest()
-                source, dest = elt.get('from'), elt.get('to')
-                if source in man:
-                    # File move
-                    changes.append((source, rev))
-                    changes.append((dest, rev))
-                    copies[dest] = source
-                else:
-                    # Directory move, deduce file moves from manifest
-                    source = source + '/'
-                    for f in man:
-                        if not f.startswith(source):
-                            continue
-                        fdest = dest + '/' + f[len(source):]
-                        changes.append((f, rev))
-                        changes.append((fdest, rev))
-                        copies[fdest] = f
-            else:
-                changes.append((elt.text.strip(), rev))
-        self.pull(rev)
-        self.lastrev = rev
-        return sorted(changes), copies
-
-    def getfile(self, name, rev):
-        if rev != self.lastrev:
-            raise util.Abort(_('internal calling inconsistency'))
-        path = os.path.join(self.tmppath, name)
-        data = util.readfile(path)
-        mode = os.lstat(path).st_mode
-        mode = (mode & 0111) and 'x' or ''
-        return data, mode
-
-    def gettags(self):
-        return self.tags
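
A side note on getchanges() above: darcs reports a directory rename as a
single move element, so the method falls back to the manifest of the
temporary checkout to expand it into per-file renames. The sketch below
isolates that expansion; expand_move and the sample manifest are invented
for illustration and are not part of darcs_source.

    def expand_move(manifest, source, dest):
        """Return (changes, copies) for a darcs move, file or directory."""
        changes, copies = [], {}
        if source in manifest:
            # plain file move
            changes += [source, dest]
            copies[dest] = source
        else:
            # directory move: rewrite every manifest entry under source/
            prefix = source + '/'
            for f in manifest:
                if f.startswith(prefix):
                    fdest = dest + '/' + f[len(prefix):]
                    changes += [f, fdest]
                    copies[fdest] = f
        return changes, copies

    man = ['lib/a.py', 'lib/b.py', 'README']
    changes, copies = expand_move(man, 'lib', 'src')
    print(copies)
    # {'src/a.py': 'lib/a.py', 'src/b.py': 'lib/b.py'}

In the real method each touched path is recorded as a (path, rev) pair; the
sketch drops the revision and keeps only the path and copy bookkeeping.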

File gaiaconv/gnuarch.py

-# gnuarch.py - GNU Arch support for the convert extension
-#
-#  Copyright 2008, 2009 Aleix Conchillo Flaque <aleix@member.fsf.org>
-#  and others
-#
-# This software may be used and distributed according to the terms of the
-# GNU General Public License version 2 or any later version.
-
-from common import NoRepo, commandline, commit, converter_source
-from mercurial.i18n import _
-from mercurial import encoding, util
-import os, shutil, tempfile, stat
-from email.Parser import Parser
-
-class gnuarch_source(converter_source, commandline):
-
-    class gnuarch_rev(object):
-        def __init__(self, rev):
-            self.rev = rev
-            self.summary = ''
-            self.date = None
-            self.author = ''
-            self.continuationof = None
-            self.add_files = []
-            self.mod_files = []
-            self.del_files = []
-            self.ren_files = {}
-            self.ren_dirs = {}
-
-    def __init__(self, ui, path, rev=None):
-        super(gnuarch_source, self).__init__(ui, path, rev=rev)
-
-        if not os.path.exists(os.path.join(path, '{arch}')):
-            raise NoRepo(_("%s does not look like a GNU Arch repository")
-                         % path)
-
-        # Could use checktool, but we want to check for baz or tla.
-        self.execmd = None
-        if util.findexe('baz'):
-            self.execmd = 'baz'
-        else:
-            if util.findexe('tla'):
-                self.execmd = 'tla'
-            else:
-                raise util.Abort(_('cannot find a GNU Arch tool'))
-
-        commandline.__init__(self, ui, self.execmd)
-
-        self.path = os.path.realpath(path)
-        self.tmppath = None
-
-        self.treeversion = None
-        self.lastrev = None
-        self.changes = {}
-        self.parents = {}
-        self.tags = {}
-        self.catlogparser = Parser()
-        self.encoding = encoding.encoding
-        self.archives = []
-
-    def before(self):
-        # Get registered archives
-        self.archives = [i.rstrip('\n')
-                         for i in self.runlines0('archives', '-n')]
-
-        if self.execmd == 'tla':
-            output = self.run0('tree-version', self.path)
-        else:
-            output = self.run0('tree-version', '-d', self.path)
-        self.treeversion = output.strip()
-
-        # Get name of temporary directory
-        version = self.treeversion.split('/')
-        self.tmppath = os.path.join(tempfile.gettempdir(),
-                                    'hg-%s' % version[1])
-
-        # Generate parents dictionary
-        self.parents[None] = []
-        treeversion = self.treeversion
-        child = None
-        while treeversion:
-            self.ui.status(_('analyzing tree version %s...\n') % treeversion)
-
-            archive = treeversion.split('/')[0]
-            if archive not in self.archives:
-                self.ui.status(_('tree analysis stopped because it points to '
-                                 'an unregistered archive %s...\n') % archive)
-                break
-
-            # Get the complete list of revisions for that tree version
-            output, status = self.runlines('revisions', '-r', '-f', treeversion)
-            self.checkexit(status, 'failed retrieving revisions for %s'
-                           % treeversion)
-
-            # No new iteration unless a revision has a continuation-of header
-            treeversion = None
-
-            for l in output:
-                rev = l.strip()