Commits

Toshi MARUYAMA committed 0e04b2c

0.4c release

Add back links from file revisions to changeset revisions
Add simple transaction support
Add hg verify
Improve caching in revlog
Fix a bunch of bugs
Self-hosting now that the metadata is close to finalized

----------------
Update README to discuss remote pull, rsync, and the hg repo
add a .hgignore file

----------------
Handle nullid better for ancestor

----------------
Ignore build/ and .pyc files

----------------
Fix exception handling for reading current working version

----------------
Don't complain that help is an unknown command

----------------
Add some ignore expressions

----------------
Add MANIFEST.in to repo

----------------
Ignore MANIFEST

----------------
Ignore empty ignore patterns

Files changed (15)

File .hgignore

+.*~
+.*pyc
+build/.*
+dist/
+MANIFEST$

File MANIFEST.in

+include hg
+recursive-include mercurial *.py
+include tkmerge
+include *.txt
+include README

File PKG-INFO

+Metadata-Version: 1.0
+Name: mercurial
+Version: 0.4c
+Summary: scalable distributed SCM
+Home-page: http://selenic.com/mercurial
+Author: Matt Mackall
+Author-email: mpm@selenic.com
+License: GNU GPL
+Description: UNKNOWN
+Platform: UNKNOWN

File README

+Setting up Mercurial in your home directory:
+
+ Note: Debian fails to include bits of distutils; you'll need
+ python-dev to install. Alternately, shove everything somewhere in
+ your path.
+
+ $ tar xvzf mercurial-<ver>.tar.gz
+ $ cd mercurial-<ver>
+ $ python setup.py install --home ~
+ $ export PYTHONPATH=${HOME}/lib/python  # add this to your .bashrc
+ $ export HGMERGE=tkmerge                # customize this
+ $ hg                                    # test installation, show help
+
+ If you get complaints about missing modules, you probably haven't set
+ PYTHONPATH correctly.
+
+ You may also want to install psyco, the python specializing compiler.
+ It makes commits more than twice as fast. The relevant Debian package
+ is python-psyco.
+
+Setting up a Mercurial project:
+
+ $ cd linux/
+ $ hg init         # creates .hg
+ $ hg status       # show changes between repo and working dir
+ $ hg diff         # generate a unidiff
+ $ hg addremove    # add all unknown files and remove all missing files
+ $ hg commit       # commit all changes, edit changelog entry
+
+ Mercurial will look for a file named .hgignore in the root of your
+ repository. It contains a set of regular expressions matched
+ against file paths to ignore.
+
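As a sketch of how these ignore expressions behave, here is the pattern set from the .hgignore added in this commit applied to some hypothetical file names (whether early hg anchored the patterns is not shown here; this sketch uses an unanchored search):

```python
import re

# patterns from the .hgignore file added in this commit
ignore = [".*~", ".*pyc", "build/.*"]

def ignored(path):
    # a path is ignored if any pattern matches somewhere in it
    return any(re.search(p, path) for p in ignore)
```

So `ignored("mercurial/hg.pyc")` is true while `ignored("mercurial/hg.py")` is not, since `.*pyc` needs the literal `pyc` sequence.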
+Mercurial commands:
+
+ $ hg history          # show changesets
+ $ hg log Makefile     # show commits per file
+ $ hg checkout         # check out the tip revision
+ $ hg checkout <hash>  # check out a specified changeset
+ $ hg add foo          # add a new file for the next commit
+ $ hg remove bar       # mark a file as removed
+ $ hg verify           # check repo integrity
+
+Branching and merging:
+
+ $ cd ..
+ $ mkdir linux-work
+ $ cd linux-work
+ $ hg branch ../linux        # create a new branch
+ $ hg checkout               # populate the working directory
+ $ <make changes>
+ $ hg commit
+ $ cd ../linux
+ $ hg merge ../linux-work    # pull changesets from linux-work
+
+Importing patches:
+
+ Fast:
+ $ patch < ../p/foo.patch
+ $ hg addremove
+ $ hg commit
+
+ Faster:
+ $ patch < ../p/foo.patch
+ $ hg commit `lsdiff -p1 ../p/foo.patch`
+
+ Fastest:
+ $ cat ../p/patchlist | xargs hg import -p1 -b ../p 
+
+Network support (highly experimental):
+
+ # pull the self-hosting hg repo
+ foo$ hg init
+ foo$ hg merge http://selenic.com/hg/
+
+ # export your .hg directory as a directory on your webserver
+ foo$ ln -s .hg ~/public_html/hg-linux 
+
+ # merge changes from a remote machine
+ bar$ hg merge http://foo/~user/hg-linux
+
+ This is just a proof of concept of grabbing byte ranges, and is not
+ expected to perform well. Fixing this needs some pipelining to reduce
+ the number of round trips. See zsync for a similar approach.
+
+ Another approach which does perform well right now is to use rsync.
+ Simply rsync the remote repo to a read-only local copy and then do a
+ local pull.
+

File hg

+#!/usr/bin/env python
+#
+# mercurial - a minimal scalable distributed SCM
+# v0.4c "oedipa maas"
+#
+# Copyright 2005 Matt Mackall <mpm@selenic.com>
+#
+# This software may be used and distributed according to the terms
+# of the GNU General Public License, incorporated herein by reference.
+
+# the psyco compiler makes commits about twice as fast
+try:
+    import psyco
+    psyco.full()
+except:
+    pass
+
+import sys, os
+from mercurial import hg, mdiff, fancyopts
+
+options = {}
+opts = [('v', 'verbose', None, 'verbose'),
+        ('d', 'debug', None, 'debug')]
+
+args = fancyopts.fancyopts(sys.argv[1:], opts, options,
+                           'hg [options] <command> [command options] [files]')
+
+try:
+    cmd = args[0]
+    args = args[1:]
+except:
+    cmd = ""
+
+ui = hg.ui(options["verbose"], options["debug"])
+    
+if cmd == "init":
+    repo = hg.repository(ui, ".", create=1)
+    sys.exit(0)
+elif cmd == "branch" or cmd == "clone":
+    os.system("cp -al %s/.hg .hg" % args[0])
+    sys.exit(0)
+else:
+    repo = hg.repository(ui=ui)
+
+if cmd == "checkout" or cmd == "co":
+    node = repo.changelog.tip()
+    if len(args): node = repo.changelog.node(int(args[0]))
+    repo.checkout(node)
+
+elif cmd == "add":
+    repo.add(args)
+
+elif cmd == "remove" or cmd == "rm" or cmd == "del" or cmd == "delete":
+    repo.remove(args)
+
+elif cmd == "commit" or cmd == "checkin" or cmd == "ci":
+    if len(args) > 0:
+        repo.commit(args)
+    else:
+        repo.commit()
+
+elif cmd == "import" or cmd == "patch":
+    ioptions = {}
+    opts = [('p', 'strip', 1, 'path strip'),
+            ('b', 'base', "", 'base path')]
+    
+    args = fancyopts.fancyopts(args, opts, ioptions,
+                              'hg import [options] <patch names>')
+    d = ioptions["base"]
+    strip = ioptions["strip"]
+
+    for patch in args:
+        ui.status("applying %s\n" % patch)
+        pf = d + patch
+        os.system("patch -p%d < %s > /dev/null" % (strip, pf))
+        f = os.popen("lsdiff --strip %d %s" % (strip, pf))
+        files = f.read().splitlines()
+        f.close()
+        repo.commit(files)
+
+elif cmd == "status":
+    (c, a, d) = repo.diffdir(repo.root)
+    for f in c: print "C", f
+    for f in a: print "?", f
+    for f in d: print "R", f
+
+elif cmd == "diff":
+    mmap = {}
+    if repo.current:
+        change = repo.changelog.read(repo.current)
+        mmap = repo.manifest.read(change[0])
+
+    (c, a, d) = repo.diffdir(repo.root)
+    for f in c:
+        to = repo.file(f).read(mmap[f])
+        tn = file(f).read()
+        sys.stdout.write(mdiff.unidiff(to, tn, f))
+    for f in a:
+        to = ""
+        tn = file(f).read()
+        sys.stdout.write(mdiff.unidiff(to, tn, f))
+    for f in d:
+        to = repo.file(f).read(mmap[f])
+        tn = ""
+        sys.stdout.write(mdiff.unidiff(to, tn, f))
+
+elif cmd == "addremove":
+    (c, a, d) = repo.diffdir(repo.root)
+    repo.add(a)
+    repo.remove(d)
+    
+elif cmd == "history":
+    for i in range(repo.changelog.count()):
+        n = repo.changelog.node(i)
+        changes = repo.changelog.read(n)
+        (p1, p2) = repo.changelog.parents(n)
+        (h, h1, h2) = map(hg.hex, (n, p1, p2))
+        (i1, i2) = map(repo.changelog.rev, (p1, p2))
+        print "rev:      %4d:%s" % (i, h)
+        print "parents:  %4d:%s" % (i1, h1)
+        if i2: print "          %4d:%s" % (i2, h2)
+        print "manifest: %4d:%s" % (repo.manifest.rev(changes[0]),
+                                    hg.hex(changes[0]))
+        print "user:", changes[1]
+        print "files:", len(changes[3])
+        print "description:"
+        print changes[4]
+
+elif cmd == "log":
+    if args:
+        r = repo.file(args[0])
+        for i in range(r.count()):
+            n = r.node(i)
+            (p1, p2) = r.parents(n)
+            (h, h1, h2) = map(hg.hex, (n, p1, p2))
+            (i1, i2) = map(r.rev, (p1, p2))
+            cr = r.linkrev(n)
+            cn = hg.hex(repo.changelog.node(cr))
+            print "rev:       %4d:%s" % (i, h)
+            print "changeset: %4d:%s" % (cr, cn)
+            print "parents:   %4d:%s" % (i1, h1)
+            if i2: print "           %4d:%s" % (i2, h2)
+    else:
+        print "missing filename"
+
+elif cmd == "dump":
+    if args:
+        r = repo.file(args[0])
+        n = r.tip()
+        if len(args) > 1: n = hg.bin(args[1])
+        sys.stdout.write(r.read(n))
+    else:
+        print "missing filename"
+
+elif cmd == "dumpmanifest":
+    n = repo.manifest.tip()
+    if len(args) > 0:
+        n = hg.bin(args[0])
+    m = repo.manifest.read(n)
+    files = m.keys()
+    files.sort()
+
+    for f in files:
+        print hg.hex(m[f]), f
+
+elif cmd == "merge":
+    if args:
+        other = hg.repository(ui, args[0])
+        repo.merge(other)
+    else:
+        print "missing source repository"
+
+elif cmd == "verify":
+    filelinkrevs = {}
+    filenodes = {}
+    manifestchangeset = {}
+    changesets = revisions = files = 0
+
+    print "checking changesets"
+    for i in range(repo.changelog.count()):
+        changesets += 1
+        n = repo.changelog.node(i)
+        changes = repo.changelog.read(n)
+        manifestchangeset[changes[0]] = n
+        for f in changes[3]:
+            revisions += 1
+            filelinkrevs.setdefault(f, []).append(i)
+
+    print "checking manifests"
+    for i in range(repo.manifest.count()):
+        n = repo.manifest.node(i)
+        ca = repo.changelog.node(repo.manifest.linkrev(n))
+        cc = manifestchangeset[n]
+        if ca != cc:
+            print "manifest %s points to %s, not %s" % \
+                  (hg.hex(n), hg.hex(ca), hg.hex(cc))
+        m = repo.manifest.read(n)
+        for f, fn in m.items():
+            filenodes.setdefault(f, {})[fn] = 1
+
+    print "crosschecking files in changesets and manifests"
+    for f in filenodes:
+        if f not in filelinkrevs:
+            print "file %s in manifest but not in changesets" % f
+
+    for f in filelinkrevs:
+        if f not in filenodes:
+            print "file %s in changeset but not in manifest" % f
+
+    print "checking files"
+    for f in filenodes:
+        files += 1
+        fl = repo.file(f)
+        nodes = {"\0"*20: 1}
+        for i in range(fl.count()):
+            n = fl.node(i)
+            if n not in filenodes[f]:
+                print "%s:%s not in manifests" % (f, hg.hex(n))
+            if fl.linkrev(n) not in filelinkrevs[f]:
+                print "%s:%s points to unknown changeset %s" \
+                      % (f, hg.hex(n), hg.hex(fl.changeset(n)))
+            t = fl.read(n)
+            (p1, p2) = fl.parents(n)
+            if p1 not in nodes:
+                print "%s:%s unknown parent 1 %s" % (f, hg.hex(n), hg.hex(p1))
+            if p2 not in nodes:
+                print "%s:%s unknown parent 2 %s" % (f, hg.hex(n), hg.hex(p2))
+
+            nodes[n] = 1
+
+    print "%d files, %d changesets, %d total revisions" % (files, changesets,
+                                                           revisions)
+    
+else:
+    if cmd != "help":
+        print "unknown command\n"
+
+    print """\
+ commands:
+
+ init                  create a new repository in this directory
+ branch <path>         create a branch of <path> in this directory
+ merge <path>          merge changes from <path> into local repository
+ checkout [changeset]  check out the latest or given changeset
+ status                show new, missing, and changed files in working dir
+ add [files...]        add the given files in the next commit
+ remove [files...]     remove the given files in the next commit
+ addremove             add all new files, delete all missing files
+ commit                commit all changes to the repository
+ history               show changeset history
+ log <file>            show revision history of a single file
+ dump <file> [rev]     dump the latest or given revision of a file
+ dumpmanifest [rev]    dump the latest or given revision of the manifest
+"""
+    sys.exit(1)

File mercurial/__init__.py

Empty file added.

File mercurial/byterange.py

+#   This library is free software; you can redistribute it and/or
+#   modify it under the terms of the GNU Lesser General Public
+#   License as published by the Free Software Foundation; either
+#   version 2.1 of the License, or (at your option) any later version.
+#
+#   This library is distributed in the hope that it will be useful,
+#   but WITHOUT ANY WARRANTY; without even the implied warranty of
+#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#   Lesser General Public License for more details.
+#
+#   You should have received a copy of the GNU Lesser General Public
+#   License along with this library; if not, write to the 
+#      Free Software Foundation, Inc., 
+#      59 Temple Place, Suite 330, 
+#      Boston, MA  02111-1307  USA
+
+# This file is part of urlgrabber, a high-level cross-protocol url-grabber
+# Copyright 2002-2004 Michael D. Stenner, Ryan Tomayko
+
+# $Id: byterange.py,v 1.9 2005/02/14 21:55:07 mstenner Exp $
+
+import os
+import stat
+import socket
+import urllib
+import urllib2
+import rfc822
+
+try:    
+    from cStringIO import StringIO
+except ImportError, msg: 
+    from StringIO import StringIO
+
+class RangeError(IOError):
+    """Error raised when an unsatisfiable range is requested."""
+    pass
+    
+class HTTPRangeHandler(urllib2.BaseHandler):
+    """Handler that enables HTTP Range headers.
+    
+    This was extremely simple. The Range header is an HTTP feature to
+    begin with, so all this class does is tell urllib2 that the
+    "206 Partial Content" response from the HTTP server is what we
+    expected.
+    
+    Example:
+        import urllib2
+        import byterange
+        
+        range_handler = byterange.HTTPRangeHandler()
+        opener = urllib2.build_opener(range_handler)
+        
+        # install it
+        urllib2.install_opener(opener)
+        
+        # create Request and set Range header
+        req = urllib2.Request('http://www.python.org/')
+        req.add_header('Range', 'bytes=30-50')
+        f = urllib2.urlopen(req)
+    """
+    
+    def http_error_206(self, req, fp, code, msg, hdrs):
+        # 206 Partial Content Response
+        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
+        r.code = code
+        r.msg = msg
+        return r
+    
+    def http_error_416(self, req, fp, code, msg, hdrs):
+        # HTTP's Range Not Satisfiable error
+        raise RangeError('Requested Range Not Satisfiable')
+
+class RangeableFileObject:
+    """File object wrapper to enable raw range handling.
+    This was implemented primarily for handling range
+    specifications for file:// urls. This object effectively makes 
+    a file object look like it consists only of a range of bytes in 
+    the stream.
+    
+    Examples:
+        # expose 10 bytes, starting at byte position 20, from
+        # /etc/passwd.
+        >>> fo = RangeableFileObject(file('/etc/passwd', 'r'), (20,30))
+        # seek seeks within the range (to position 23 in this case)
+        >>> fo.seek(3)
+        # tell tells where you are _within the range_ (position 3 in
+        # this case)
+        >>> fo.tell()
+        # read EOFs if an attempt is made to read past the last
+        # byte in the range. the following will return only 7 bytes.
+        >>> fo.read(30)
+    """
+    
+    def __init__(self, fo, rangetup):
+        """Create a RangeableFileObject.
+        fo       -- a file like object. only the read() method need be 
+                    supported but supporting an optimized seek() is 
+                    preferable.
+        rangetup -- a (firstbyte,lastbyte) tuple specifying the range
+                    to work over.
+        The file object provided is assumed to be at byte offset 0.
+        """
+        self.fo = fo
+        (self.firstbyte, self.lastbyte) = range_tuple_normalize(rangetup)
+        self.realpos = 0
+        self._do_seek(self.firstbyte)
+        
+    def __getattr__(self, name):
+        """This effectively allows us to wrap at the instance level.
+        Any attribute not found in _this_ object will be searched for
+        in self.fo.  This includes methods."""
+        if hasattr(self.fo, name):
+            return getattr(self.fo, name)
+        raise AttributeError, name
+    
+    def tell(self):
+        """Return the position within the range.
+        This is different from fo.tell() in that position 0 is the
+        first byte position of the range tuple. For example, if
+        this object was created with a range tuple of (500,899),
+        tell() will return 0 when at byte position 500 of the file.
+        """
+        return (self.realpos - self.firstbyte)
+    
+    def seek(self,offset,whence=0):
+        """Seek within the byte range.
+        Positioning is identical to that described under tell().
+        """
+        assert whence in (0, 1, 2)
+        if whence == 0:   # absolute seek
+            realoffset = self.firstbyte + offset
+        elif whence == 1: # relative seek
+            realoffset = self.realpos + offset
+        elif whence == 2: # absolute from end of file
+            # XXX: are we raising the right Error here?
+            raise IOError('seek from end of file not supported.')
+        
+        # do not allow seek past lastbyte in range
+        if self.lastbyte and (realoffset >= self.lastbyte):
+            realoffset = self.lastbyte
+        
+        self._do_seek(realoffset - self.realpos)
+        
+    def read(self, size=-1):
+        """Read within the range.
+        This method will limit the size read based on the range.
+        """
+        size = self._calc_read_size(size)
+        rslt = self.fo.read(size)
+        self.realpos += len(rslt)
+        return rslt
+    
+    def readline(self, size=-1):
+        """Read lines within the range.
+        This method will limit the size read based on the range.
+        """
+        size = self._calc_read_size(size)
+        rslt = self.fo.readline(size)
+        self.realpos += len(rslt)
+        return rslt
+    
+    def _calc_read_size(self, size):
+        """Handles calculating the amount of data to read based on
+        the range.
+        """
+        if self.lastbyte:
+            if size > -1:
+                if ((self.realpos + size) >= self.lastbyte):
+                    size = (self.lastbyte - self.realpos)
+            else:
+                size = (self.lastbyte - self.realpos)
+        return size
+        
+    def _do_seek(self,offset):
+        """Seek based on whether wrapped object supports seek().
+        offset is relative to the current position (self.realpos).
+        """
+        assert offset >= 0
+        if not hasattr(self.fo, 'seek'):
+            self._poor_mans_seek(offset)
+        else:
+            self.fo.seek(self.realpos + offset)
+        self.realpos+= offset
+        
+    def _poor_mans_seek(self,offset):
+        """Seek by calling the wrapped file objects read() method.
+        This is used for file like objects that do not have native
+        seek support. The wrapped objects read() method is called
+        to manually seek to the desired position.
+        offset -- read this number of bytes from the wrapped
+                  file object.
+        raise RangeError if we encounter EOF before reaching the 
+        specified offset.
+        """
+        pos = 0
+        bufsize = 1024
+        while pos < offset:
+            if (pos + bufsize) > offset:
+                bufsize = offset - pos
+            buf = self.fo.read(bufsize)
+            if len(buf) != bufsize:
+                raise RangeError('Requested Range Not Satisfiable')
+            pos+= bufsize
+
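The idea behind RangeableFileObject can be sketched in modern Python as a thin wrapper that confines reads to a byte window over a seekable stream (the class and method names here are illustrative, not part of the original API; only read and tell are modeled):

```python
import io

class RangeView:
    """Expose only bytes [first, last) of an underlying seekable stream."""
    def __init__(self, fo, first, last):
        self.fo, self.first, self.last = fo, first, last
        self.fo.seek(first)

    def tell(self):
        # position 0 is the first byte of the range, as in the original
        return self.fo.tell() - self.first

    def read(self, size=-1):
        # clamp the read so it never crosses the end of the range
        remaining = self.last - self.fo.tell()
        if size < 0 or size > remaining:
            size = remaining
        return self.fo.read(size)

rv = RangeView(io.BytesIO(b"0123456789abcdefghij"), 5, 12)
```

As in the original, a read past the end of the range simply returns the remaining bytes and then EOFs.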
+class FileRangeHandler(urllib2.FileHandler):
+    """FileHandler subclass that adds Range support.
+    This class handles Range headers exactly like an HTTP
+    server would.
+    """
+    def open_local_file(self, req):
+        import mimetypes
+        import mimetools
+        host = req.get_host()
+        file = req.get_selector()
+        localfile = urllib.url2pathname(file)
+        stats = os.stat(localfile)
+        size = stats[stat.ST_SIZE]
+        modified = rfc822.formatdate(stats[stat.ST_MTIME])
+        mtype = mimetypes.guess_type(file)[0]
+        if host:
+            host, port = urllib.splitport(host)
+            if port or socket.gethostbyname(host) not in self.get_names():
+                raise urllib2.URLError('file not on local host')
+        fo = open(localfile,'rb')
+        brange = req.headers.get('Range',None)
+        brange = range_header_to_tuple(brange)
+        assert brange != ()
+        if brange:
+            (fb,lb) = brange
+            if lb == '': lb = size
+            if fb < 0 or fb > size or lb > size:
+                raise RangeError('Requested Range Not Satisfiable')
+            size = (lb - fb)
+            fo = RangeableFileObject(fo, (fb,lb))
+        headers = mimetools.Message(StringIO(
+            'Content-Type: %s\nContent-Length: %d\nLast-modified: %s\n' %
+            (mtype or 'text/plain', size, modified)))
+        return urllib.addinfourl(fo, headers, 'file:'+file)
+
+
+# FTP Range Support 
+# Unfortunately, a large amount of base FTP code had to be copied
+# from urllib and urllib2 in order to insert the FTP REST command.
+# Code modifications for range support have been commented as 
+# follows:
+# -- range support modifications start/end here
+
+from urllib import splitport, splituser, splitpasswd, splitattr, \
+                   unquote, addclosehook, addinfourl
+import ftplib
+import socket
+import sys
+import mimetypes
+import mimetools
+
+class FTPRangeHandler(urllib2.FTPHandler):
+    def ftp_open(self, req):
+        host = req.get_host()
+        if not host:
+            raise IOError, ('ftp error', 'no host given')
+        host, port = splitport(host)
+        if port is None:
+            port = ftplib.FTP_PORT
+
+        # username/password handling
+        user, host = splituser(host)
+        if user:
+            user, passwd = splitpasswd(user)
+        else:
+            passwd = None
+        host = unquote(host)
+        user = unquote(user or '')
+        passwd = unquote(passwd or '')
+        
+        try:
+            host = socket.gethostbyname(host)
+        except socket.error, msg:
+            raise urllib2.URLError(msg)
+        path, attrs = splitattr(req.get_selector())
+        dirs = path.split('/')
+        dirs = map(unquote, dirs)
+        dirs, file = dirs[:-1], dirs[-1]
+        if dirs and not dirs[0]:
+            dirs = dirs[1:]
+        try:
+            fw = self.connect_ftp(user, passwd, host, port, dirs)
+            type = file and 'I' or 'D'
+            for attr in attrs:
+                attr, value = splitattr(attr)
+                if attr.lower() == 'type' and \
+                   value in ('a', 'A', 'i', 'I', 'd', 'D'):
+                    type = value.upper()
+            
+            # -- range support modifications start here
+            rest = None
+            range_tup = range_header_to_tuple(req.headers.get('Range',None))    
+            assert range_tup != ()
+            if range_tup:
+                (fb,lb) = range_tup
+                if fb > 0: rest = fb
+            # -- range support modifications end here
+            
+            fp, retrlen = fw.retrfile(file, type, rest)
+            
+            # -- range support modifications start here
+            if range_tup:
+                (fb,lb) = range_tup
+                if lb == '': 
+                    if retrlen is None or retrlen == 0:
+                        raise RangeError('Requested Range Not Satisfiable due to unobtainable file length.')
+                    lb = retrlen
+                    retrlen = lb - fb
+                    if retrlen < 0:
+                        # beginning of range is larger than file
+                        raise RangeError('Requested Range Not Satisfiable')
+                else:
+                    retrlen = lb - fb
+                    fp = RangeableFileObject(fp, (0,retrlen))
+            # -- range support modifications end here
+            
+            headers = ""
+            mtype = mimetypes.guess_type(req.get_full_url())[0]
+            if mtype:
+                headers += "Content-Type: %s\n" % mtype
+            if retrlen is not None and retrlen >= 0:
+                headers += "Content-Length: %d\n" % retrlen
+            sf = StringIO(headers)
+            headers = mimetools.Message(sf)
+            return addinfourl(fp, headers, req.get_full_url())
+        except ftplib.all_errors, msg:
+            raise IOError, ('ftp error', msg), sys.exc_info()[2]
+
+    def connect_ftp(self, user, passwd, host, port, dirs):
+        fw = ftpwrapper(user, passwd, host, port, dirs)
+        return fw
+
+class ftpwrapper(urllib.ftpwrapper):
+    # range support note:
+    # this ftpwrapper code is copied directly from
+    # urllib. The only enhancement is to add the rest
+    # argument and pass it on to ftp.ntransfercmd
+    def retrfile(self, file, type, rest=None):
+        self.endtransfer()
+        if type in ('d', 'D'): cmd = 'TYPE A'; isdir = 1
+        else: cmd = 'TYPE ' + type; isdir = 0
+        try:
+            self.ftp.voidcmd(cmd)
+        except ftplib.all_errors:
+            self.init()
+            self.ftp.voidcmd(cmd)
+        conn = None
+        if file and not isdir:
+            # Use nlst to see if the file exists at all
+            try:
+                self.ftp.nlst(file)
+            except ftplib.error_perm, reason:
+                raise IOError, ('ftp error', reason), sys.exc_info()[2]
+            # Restore the transfer mode!
+            self.ftp.voidcmd(cmd)
+            # Try to retrieve as a file
+            try:
+                cmd = 'RETR ' + file
+                conn = self.ftp.ntransfercmd(cmd, rest)
+            except ftplib.error_perm, reason:
+                if str(reason)[:3] == '501':
+                    # workaround for REST not supported error
+                    fp, retrlen = self.retrfile(file, type)
+                    fp = RangeableFileObject(fp, (rest,''))
+                    return (fp, retrlen)
+                elif str(reason)[:3] != '550':
+                    raise IOError, ('ftp error', reason), sys.exc_info()[2]
+        if not conn:
+            # Set transfer mode to ASCII!
+            self.ftp.voidcmd('TYPE A')
+            # Try a directory listing
+            if file: cmd = 'LIST ' + file
+            else: cmd = 'LIST'
+            conn = self.ftp.ntransfercmd(cmd)
+        self.busy = 1
+        # Pass back both a suitably decorated object and a retrieval length
+        return (addclosehook(conn[0].makefile('rb'),
+                            self.endtransfer), conn[1])
+
+
+####################################################################
+# Range Tuple Functions
+# XXX: These range tuple functions might go better in a class.
+
+_rangere = None
+def range_header_to_tuple(range_header):
+    """Get a (firstbyte,lastbyte) tuple from a Range header value.
+    
+    Range headers have the form "bytes=<firstbyte>-<lastbyte>". This
+    function pulls the firstbyte and lastbyte values and returns
+    a (firstbyte,lastbyte) tuple. If lastbyte is not specified in
+    the header value, it is returned as an empty string in the
+    tuple.
+    
+    Return None if range_header is None
+    Return () if range_header does not conform to the range spec 
+    pattern.
+    
+    """
+    global _rangere
+    if range_header is None: return None
+    if _rangere is None:
+        import re
+        _rangere = re.compile(r'^bytes=(\d{1,})-(\d*)')
+    match = _rangere.match(range_header)
+    if match: 
+        tup = range_tuple_normalize(match.group(1,2))
+        if tup and tup[1]: 
+            tup = (tup[0],tup[1]+1)
+        return tup
+    return ()
+
+def range_tuple_to_header(range_tup):
+    """Convert a range tuple to a Range header value.
+    Return a string of the form "bytes=<firstbyte>-<lastbyte>" or None
+    if no range is needed.
+    """
+    if range_tup is None: return None
+    range_tup = range_tuple_normalize(range_tup)
+    if range_tup:
+        if range_tup[1]: 
+            range_tup = (range_tup[0],range_tup[1] - 1)
+        return 'bytes=%s-%s' % range_tup
+    
+def range_tuple_normalize(range_tup):
+    """Normalize a (first_byte,last_byte) range tuple.
+    Return a tuple whose first element is guaranteed to be an int
+    and whose second element will be '' (meaning: the last byte) or 
+    an int. Finally, return None if the normalized tuple == (0,'')
+    as that is equivalent to retrieving the entire file.
+    """
+    if range_tup is None: return None
+    # handle first byte
+    fb = range_tup[0]
+    if fb in (None,''): fb = 0
+    else: fb = int(fb)
+    # handle last byte
+    try: lb = range_tup[1]
+    except IndexError: lb = ''
+    else:  
+        if lb is None: lb = ''
+        elif lb != '': lb = int(lb)
+    # check if range is over the entire file
+    if (fb,lb) == (0,''): return None
+    # check that the range is valid
+    if lb < fb: raise RangeError('Invalid byte range: %s-%s' % (fb,lb))
+    return (fb,lb)
+
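The range-tuple helpers above round-trip between Range header strings and (firstbyte, lastbyte) tuples. A sketch of the same parsing in modern Python (an illustrative re-implementation, not the original code) shows the conventions at work:

```python
import re

def parse_range(header):
    """Parse 'bytes=<first>-<last>' into a half-open (first, stop) tuple.

    Mirrors the conventions documented in byterange.py: None for a
    missing header, () for a malformed one, and '' as the stop value
    when the last byte is left open-ended.
    """
    if header is None:
        return None
    m = re.match(r"^bytes=(\d+)-(\d*)$", header)
    if not m:
        return ()
    first = int(m.group(1))
    last = m.group(2)
    # like range_header_to_tuple, convert the inclusive last byte of
    # the header into an exclusive stop position
    return (first, int(last) + 1 if last else "")
```

So the docstring's example header "bytes=30-50" becomes (30, 51), and an open-ended "bytes=500-" becomes (500, '').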

File mercurial/fancyopts.py

+import sys, os, getopt
+
+def fancyopts(args, options, state, syntax=''):
+    long=[]
+    short=''
+    map={}
+    dt={}
+
+    def help(state, opt, arg, options=options, syntax=syntax):
+        print "Usage: ", syntax
+
+        for s, l, d, c in options:
+            opt=' '
+            if s: opt = opt + '-' + s + ' '
+            if l: opt = opt + '--' + l + ' '
+            if d: opt = opt + '(' + str(d) + ')'
+            print opt
+            if c: print '   %s' % c
+        sys.exit(0)
+
+    if len(args) == 0:
+        help(state, None, args)
+
+    options=[('h', 'help', help, 'Show usage info')] + options
+    
+    for s, l, d, c in options:
+        map['-'+s] = map['--'+l]=l
+        state[l] = d
+        dt[l] = type(d)
+        if not d is None and not type(d) is type(help): s, l=s+':', l+'='      
+        if s: short = short + s
+        if l: long.append(l)
+
+    if os.environ.has_key("HG_OPTS"):
+        args = os.environ["HG_OPTS"].split() + args
+
+    try:
+        opts, args = getopt.getopt(args, short, long)
+    except getopt.GetoptError:
+        help(state, None, args)
+        sys.exit(-1)
+
+    for opt, arg in opts:
+        if dt[map[opt]] is type(help): state[map[opt]](state,map[opt],arg)
+        elif dt[map[opt]] is type(1): state[map[opt]] = int(arg)
+        elif dt[map[opt]] is type(''): state[map[opt]] = arg
+        elif dt[map[opt]] is type([]): state[map[opt]].append(arg)
+        elif dt[map[opt]] is type(None): state[map[opt]] = 1
+        
+    return args
+
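fancyopts drives getopt from a table of (short, long, default, help) tuples, using the type of each default to decide how to store the parsed value. A condensed sketch of that mapping (an illustrative re-implementation in modern Python, not the original function; the help-callback and HG_OPTS handling are omitted):

```python
import getopt

def parse(args, table):
    """Parse args against a fancyopts-style (short, long, default, doc) table."""
    state = {l: d for s, l, d, c in table}
    # options with a non-None default take an argument
    shorts = "".join(s + (":" if d is not None else "") for s, l, d, c in table)
    longs = [l + ("=" if d is not None else "") for s, l, d, c in table]
    byname = {}
    for s, l, d, c in table:
        byname["-" + s] = byname["--" + l] = l
    opts, rest = getopt.getopt(args, shorts, longs)
    for opt, arg in opts:
        name = byname[opt]
        default = state[name]
        # coerce the argument to the type of the default, as fancyopts does;
        # flags (default None) are simply set to 1
        state[name] = type(default)(arg) if default is not None else 1
    return state, rest

table = [("v", "verbose", None, "verbose"),
         ("p", "strip", 1, "path strip")]
```

With this table, `parse(["-v", "--strip", "2", "file.txt"], table)` leaves `verbose` set as a flag, converts the strip argument to an int, and returns the remaining positional arguments.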

File mercurial/hg.py

+# hg.py - repository classes for mercurial
+#
+# Copyright 2005 Matt Mackall <mpm@selenic.com>
+#
+# This software may be used and distributed according to the terms
+# of the GNU General Public License, incorporated herein by reference.
+
+import sys, struct, sha, socket, os, time, base64, re, urllib2, binascii
+import tempfile
+from mercurial import byterange
+from mercurial.transaction import *
+from mercurial.revlog import *
+
+def hex(node): return binascii.hexlify(node)
+def bin(node): return binascii.unhexlify(node)
+
+class filelog(revlog):
+    def __init__(self, opener, path):
+        s = self.encodepath(path)
+        revlog.__init__(self, opener, os.path.join("data", s + "i"),
+                        os.path.join("data", s))
+
+    def encodepath(self, path):
+        s = sha.sha(path).digest()
+        s = base64.encodestring(s)[:-3]
+        s = re.sub("\+", "%", s)
+        s = re.sub("/", "_", s)
+        return s
+
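The encoding above hashes each tracked path into a fixed-length, filesystem-safe revlog name. A modern sketch of the same scheme (hashlib and base64.encodebytes stand in for the Python 2 sha and encodestring calls; the [:-3] slice drops the trailing base64 character, '=' padding, and newline just as the original does):

```python
import base64
import hashlib
import re

def encodepath(path):
    # sha1 the path, base64-encode the 20-byte digest, trim the
    # trailing three characters, then replace characters that are
    # awkward in file names
    s = base64.encodebytes(hashlib.sha1(path.encode()).digest()).decode()[:-3]
    s = re.sub(r"\+", "%", s)
    s = re.sub("/", "_", s)
    return s
```

The result is a deterministic 26-character name containing no '/' or '+', so any path maps to a single flat file under .hg/data.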
+    def read(self, node):
+        return self.revision(node)
+    def add(self, text, transaction, link, p1=None, p2=None):
+        return self.addrevision(text, transaction, link, p1, p2)
+
+    def resolvedag(self, old, new, transaction, link):
+        """resolve unmerged heads in our DAG"""
+        if old == new: return None
+        a = self.ancestor(old, new)
+        if old == a: return new
+        return self.merge3(old, new, a, transaction, link)
+
+    def merge3(self, my, other, base, transaction, link):
+        """perform a 3-way merge and append the result"""
+        def temp(prefix, node):
+            (fd, name) = tempfile.mkstemp(prefix)
+            f = os.fdopen(fd, "w")
+            f.write(self.revision(node))
+            f.close()
+            return name
+
+        a = temp("local", my)
+        b = temp("remote", other)
+        c = temp("parent", base)
+
+        cmd = os.environ["HGMERGE"]
+        r = os.system("%s %s %s %s" % (cmd, a, b, c))
+        if r:
+            raise "Merge failed, implement rollback!"
+
+        t = open(a).read()
+        os.unlink(a)
+        os.unlink(b)
+        os.unlink(c)
+        return self.addrevision(t, transaction, link, my, other)
+
+    def merge(self, other, transaction, linkseq, link):
+        """perform a merge and resolve resulting heads"""
+        (o, n) = self.mergedag(other, transaction, linkseq)
+        return self.resolvedag(o, n, transaction, link)
+
+class manifest(revlog):
+    def __init__(self, opener):
+        self.mapcache = None
+        self.listcache = None
+        self.addlist = None
+        revlog.__init__(self, opener, "00manifest.i", "00manifest.d")
+
+    def read(self, node):
+        if self.mapcache and self.mapcache[0] == node:
+            return self.mapcache[1]
+        text = self.revision(node)
+        map = {}
+        self.listcache = text.splitlines(1)
+        for l in self.listcache:
+            (f, n) = l.split('\0')
+            map[f] = bin(n[:40])
+        self.mapcache = (node, map)
+        return map
+
+    def diff(self, a, b):
+        # this is sneaky, as we're not actually using a and b
+        if self.listcache:
+            return mdiff.diff(self.listcache, self.addlist, 1)
+        else:
+            return mdiff.diff(a, b)
+
+    def add(self, map, transaction, link, p1=None, p2=None):
+        files = map.keys()
+        files.sort()
+
+        self.addlist = ["%s\000%s\n" % (f, hex(map[f])) for f in files]
+        text = "".join(self.addlist)
+
+        n = self.addrevision(text, transaction, link, p1, p2)
+        self.mapcache = (n, map)
+        self.listcache = self.addlist
+
+        return n
+
+class changelog(revlog):
+    def __init__(self, opener):
+        revlog.__init__(self, opener, "00changelog.i", "00changelog.d")
+
+    def extract(self, text):
+        last = text.index("\n\n")
+        desc = text[last + 2:]
+        l = text[:last].splitlines()
+        manifest = bin(l[0])
+        user = l[1]
+        date = l[2]
+        files = l[3:]
+        return (manifest, user, date, files, desc)
+
+    def read(self, node):
+        return self.extract(self.revision(node))
+
+    def add(self, manifest, list, desc, transaction, p1=None, p2=None):
+        try: user = os.environ["HGUSER"]
+        except: user = os.environ["LOGNAME"] + '@' + socket.getfqdn()
+        date = "%d %d" % (time.time(), time.timezone)
+        list.sort()
+        l = [hex(manifest), user, date] + list + ["", desc]
+        text = "\n".join(l)
+        return self.addrevision(text, transaction, self.count(), p1, p2)
+
+    def merge3(self, my, other, base):
+        pass
+
+class dircache:
+    def __init__(self, opener):
+        self.opener = opener
+        self.dirty = 0
+        self.map = None
+    def __del__(self):
+        if self.dirty: self.write()
+    def __getitem__(self, key):
+        try:
+            return self.map[key]
+        except TypeError:
+            self.read()
+            return self[key]
+        
+    def read(self):
+        if self.map is not None: return self.map
+
+        self.map = {}
+        try:
+            st = self.opener("dircache").read()
+        except: return
+
+        pos = 0
+        while pos < len(st):
+            e = struct.unpack(">llll", st[pos:pos+16])
+            l = e[3]
+            pos += 16
+            f = st[pos:pos + l]
+            self.map[f] = e[:3]
+            pos += l
+        
+    def update(self, files):
+        if not files: return
+        self.read()
+        self.dirty = 1
+        for f in files:
+            try:
+                s = os.stat(f)
+                self.map[f] = (s.st_mode, s.st_size, s.st_mtime)
+            except OSError:
+                self.remove([f])
+
+    def taint(self, files):
+        if not files: return
+        self.read()
+        self.dirty = 1
+        for f in files:
+            self.map[f] = (0, -1, 0)
+
+    def remove(self, files):
+        if not files: return
+        self.read()
+        self.dirty = 1
+        for f in files:
+            try: del self.map[f]
+            except KeyError: pass
+
+    def clear(self):
+        self.map = {}
+        self.dirty = 1
+
+    def write(self):
+        st = self.opener("dircache", "w")
+        for f, e in self.map.items():
+            e = struct.pack(">llll", e[0], e[1], e[2], len(f))
+            st.write(e + f)
+        self.dirty = 0
+
+    def copy(self):
+        self.read()
+        return self.map.copy()
+
+# used to avoid circular references so destructors work
+def opener(base):
+    p = base
+    def o(path, mode="r"):
+        f = os.path.join(p, path)
+        if p[:7] == "http://":
+            return httprangereader(f)
+
+        if mode != "r" and os.path.isfile(f):
+            s = os.stat(f)
+            if s.st_nlink > 1:
+                file(f + ".tmp", "w").write(file(f).read())
+                os.rename(f+".tmp", f)
+
+        return file(f, mode)
+
+    return o
+
+class repository:
+    def __init__(self, ui, path=None, create=0):
+        self.remote = 0
+        if path and path[:7] == "http://":
+            self.remote = 1
+            self.path = path
+        else:
+            if not path:
+                p = os.getcwd()
+                while not os.path.isdir(os.path.join(p, ".hg")):
+                    p = os.path.dirname(p)
+                    if p == "/": raise "No repo found"
+                path = p
+            self.path = os.path.join(path, ".hg")
+
+        self.root = path
+        self.ui = ui
+
+        if create:
+            os.mkdir(self.path)  
+            os.mkdir(self.join("data"))
+
+        self.opener = opener(self.path)
+        self.manifest = manifest(self.opener)
+        self.changelog = changelog(self.opener)
+        self.ignorelist = None
+
+        if not self.remote:
+            self.dircache = dircache(self.opener)
+            try:
+                self.current = bin(self.opener("current").read())
+            except IOError:
+                self.current = None
+
+    def setcurrent(self, node):
+        self.current = node
+        self.opener("current", "w").write(hex(node))
+      
+    def ignore(self, f):
+        if self.ignorelist is None:
+            self.ignorelist = []
+            try:
+                l = open(os.path.join(self.root, ".hgignore")).readlines()
+                for pat in l:
+                    if pat != "\n":
+                        self.ignorelist.append(re.compile(pat[:-1]))
+            except IOError: pass
+        for pat in self.ignorelist:
+            if pat.search(f): return True
+        return False
+
+    def join(self, f):
+        return os.path.join(self.path, f)
+
+    def file(self, f):
+        return filelog(self.opener, f)
+
+    def transaction(self):
+        return transaction(self.opener, self.join("journal"))
+
+    def merge(self, other):
+        tr = self.transaction()
+        changed = {}
+        new = {}
+        nextrev = seqrev = self.changelog.count()
+
+        # helpers for back-linking file revisions to local changeset
+        # revisions so we can immediately get to changeset from annotate
+        def accumulate(text):
+            n = nextrev
+            # track which files are added in which changeset and the
+            # corresponding _local_ changeset revision
+            files = self.changelog.extract(text)[3]
+            for f in files:
+                changed.setdefault(f, []).append(n)
+            n += 1
+
+        def seq(start):
+            while 1:
+                yield start
+                start += 1
+
+        def lseq(l):
+            for r in l:
+                yield r
+
+        # begin the import/merge of changesets
+        self.ui.status("merging new changesets\n")
+        (co, cn) = self.changelog.mergedag(other.changelog, tr,
+                                           seq(seqrev), accumulate)
+        resolverev = self.changelog.count()
+
+        # is there anything to do?
+        if co == cn:
+            tr.close()
+            return
+        
+        # do we need to resolve?
+        simple = (co == self.changelog.ancestor(co, cn))
+
+        # merge all files changed by the changesets,
+        # keeping track of the new tips
+        changelist = changed.keys()
+        changelist.sort()
+        for f in changelist:
+            sys.stdout.write(".")
+            sys.stdout.flush()
+            r = self.file(f)
+            node = r.merge(other.file(f), tr, lseq(changed[f]), resolverev)
+            if node:
+                new[f] = node
+        sys.stdout.write("\n")
+
+        # begin the merge of the manifest
+        self.ui.status("merging manifests\n")
+        (mm, mo) = self.manifest.mergedag(other.manifest, tr, seq(seqrev))
+
+        # For simple merges, we don't need to resolve manifests or changesets
+        if simple:
+            tr.close()
+            return
+
+        ma = self.manifest.ancestor(mm, mo)
+
+        # resolve the manifest to point to all the merged files
+        self.ui.status("resolving manifests\n")
+        mmap = self.manifest.read(mm) # mine
+        omap = self.manifest.read(mo) # other
+        amap = self.manifest.read(ma) # ancestor
+        nmap = {}
+
+        for f, mid in mmap.iteritems():
+            if f in omap:
+                if mid != omap[f]: 
+                    nmap[f] = new.get(f, mid) # use merged version
+                else:
+                    nmap[f] = new.get(f, mid) # they're the same
+                del omap[f]
+            elif f in amap:
+                if mid != amap[f]: 
+                    pass # we should prompt here
+                else:
+                    pass # other deleted it
+            else:
+                nmap[f] = new.get(f, mid) # we created it
+                
+        del mmap
+
+        for f, oid in omap.iteritems():
+            if f in amap:
+                if oid != amap[f]:
+                    pass # this is the nasty case, we should prompt
+                else:
+                    pass # probably safe
+            else:
+                nmap[f] = new.get(f, oid) # remote created it
+
+        del omap
+        del amap
+
+        node = self.manifest.add(nmap, tr, resolverev, mm, mo)
+
+        # Now all files and manifests are merged, we add the changed files
+        # and manifest id to the changelog
+        self.ui.status("committing merge changeset\n")
+        new = new.keys()
+        new.sort()
+        if co == cn: cn = -1
+
+        edittext = "\n"+"".join(["HG: changed %s\n" % f for f in new])
+        edittext = self.ui.edit(edittext)
+        n = self.changelog.add(node, new, edittext, tr, co, cn)
+
+        tr.close()
+
+    def commit(self, update = None, text = ""):
+        tr = self.transaction()
+        
+        try:
+            remove = [ l[:-1] for l in self.opener("to-remove") ]
+            os.unlink(self.join("to-remove"))
+
+        except IOError:
+            remove = []
+
+        if update == None:
+            update = self.diffdir(self.root)[0]
+
+        # check in files
+        new = {}
+        linkrev = self.changelog.count()
+        for f in update:
+            try:
+                t = file(f).read()
+            except IOError:
+                remove.append(f)
+                continue
+            r = self.file(f)
+            new[f] = r.add(t, tr, linkrev)
+
+        # update manifest
+        mmap = self.manifest.read(self.manifest.tip())
+        mmap.update(new)
+        for f in remove:
+            del mmap[f]
+        mnode = self.manifest.add(mmap, tr, linkrev)
+
+        # add changeset
+        new = new.keys()
+        new.sort()
+
+        edittext = text + "\n"+"".join(["HG: changed %s\n" % f for f in new])
+        edittext = self.ui.edit(edittext)
+
+        n = self.changelog.add(mnode, new, edittext, tr)
+        tr.close()
+
+        self.setcurrent(n)
+        self.dircache.update(new)
+        self.dircache.remove(remove)
+
+    def checkdir(self, path):
+        d = os.path.dirname(path)
+        if not d: return
+        if not os.path.isdir(d):
+            self.checkdir(d)
+            os.mkdir(d)
+
+    def checkout(self, node):
+        # checkout is really dumb at the moment
+        # it ought to basically merge
+        change = self.changelog.read(node)
+        mmap = self.manifest.read(change[0])
+
+        l = mmap.keys()
+        l.sort()
+        stats = []
+        for f in l:
+            r = self.file(f)
+            t = r.revision(mmap[f])
+            try:
+                file(f, "w").write(t)
+            except:
+                self.checkdir(f)
+                file(f, "w").write(t)
+
+        self.setcurrent(node)
+        self.dircache.clear()
+        self.dircache.update(l)
+
+    def diffdir(self, path):
+        dc = self.dircache.copy()
+        changed = []
+        mf = {}
+        added = []
+
+        if self.current:
+            change = self.changelog.read(self.current)
+            mf = self.manifest.read(change[0])
+
+        for dir, subdirs, files in os.walk(self.root):
+            d = dir[len(self.root)+1:]
+            if ".hg" in subdirs: subdirs.remove(".hg")
+            
+            for f in files:
+                fn = os.path.join(d, f)
+                try: s = os.stat(fn)
+                except: continue
+                if fn in dc:
+                    c = dc[fn]
+                    del dc[fn]
+                    if c[1] != s.st_size:
+                        changed.append(fn)
+                    elif c[0] != s.st_mode or c[2] != s.st_mtime:
+                        t1 = file(fn).read()
+                        t2 = self.file(fn).revision(mf[fn])
+                        if t1 != t2:
+                            changed.append(fn)
+                else:
+                    if self.ignore(fn): continue
+                    added.append(fn)
+
+        deleted = dc.keys()
+        deleted.sort()
+
+        return (changed, added, deleted)
+
+    def add(self, list):
+        self.dircache.taint(list)
+
+    def remove(self, list):
+        dl = self.opener("to-remove", "a")
+        for f in list:
+            dl.write(f + "\n")
+
+class ui:
+    def __init__(self, verbose=False, debug=False):
+        self.verbose = verbose
+        self.debugflag = debug
+    def write(self, *args):
+        for a in args:
+            sys.stdout.write(str(a))
+    def prompt(self, msg, pat):
+        while 1:
+            sys.stdout.write(msg)
+            r = sys.stdin.readline()[:-1]
+            if re.match(pat, r):
+                return r
+    def status(self, *msg):
+        self.write(*msg)
+    def warn(self, msg):
+        self.write(*msg)
+    def note(self, msg):
+        if self.verbose: self.write(*msg)
+    def debug(self, msg):
+        if self.debugflag: self.write(*msg)
+    def edit(self, text):
+        (fd, name) = tempfile.mkstemp("hg")
+        f = os.fdopen(fd, "w")
+        f.write(text)
+        f.close()
+
+        editor = os.environ.get("EDITOR", "vi")
+        r = os.system("%s %s" % (editor, name))
+        if r:
+            raise "Edit failed!"
+
+        t = open(name).read()
+        t = re.sub("(?m)^HG:.*\n", "", t)
+
+        return t
+
+    
+class httprangereader:
+    def __init__(self, url):
+        self.url = url
+        self.pos = 0
+    def seek(self, pos):
+        self.pos = pos
+    def read(self, bytes=None):
+        opener = urllib2.build_opener(byterange.HTTPRangeHandler())
+        urllib2.install_opener(opener)
+        req = urllib2.Request(self.url)
+        end = ''
+        if bytes: end = self.pos + bytes
+        req.add_header('Range', 'bytes=%d-%s' % (self.pos, end))
+        f = urllib2.urlopen(req)
+        return f.read()

File mercurial/mdiff.py

+#!/usr/bin/python
+import difflib, struct
+from cStringIO import StringIO
+
+def unidiff(a, b, fn):
+    a = a.splitlines(1)
+    b = b.splitlines(1)
+    l = difflib.unified_diff(a, b, fn, fn)
+    return "".join(l)
+
+def textdiff(a, b):
+    return diff(a.splitlines(1), b.splitlines(1))
+
+def sortdiff(a, b):
+    la = lb = 0
+
+    while 1:
+        if la >= len(a) or lb >= len(b): break
+        if b[lb] < a[la]:
+            si = lb
+            while lb < len(b) and b[lb] < a[la] : lb += 1
+            yield "insert", la, la, si, lb
+        elif a[la] < b[lb]:
+            si = la
+            while la < len(a) and a[la] < b[lb]: la += 1
+            yield "delete", si, la, lb, lb
+        else:
+            la += 1
+            lb += 1
+
+    si = lb
+    while lb < len(b):
+        lb += 1
+        yield "insert", la, la, si, lb
+
+    si = la
+    while la < len(a):
+        la += 1
+        yield "delete", si, la, lb, lb
+
+def diff(a, b, sorted=0):
+    bin = []
+    p = [0]
+    for i in a: p.append(p[-1] + len(i))
+
+    if sorted:
+        d = sortdiff(a, b)
+    else:
+        d = difflib.SequenceMatcher(None, a, b).get_opcodes()
+
+    for o, m, n, s, t in d:
+        if o == 'equal': continue
+        s = "".join(b[s:t])
+        bin.append(struct.pack(">lll", p[m], p[n], len(s)) + s)
+
+    return "".join(bin)
+
+def patch(a, bin):
+    last = pos = 0
+    r = []
+
+    while pos < len(bin):
+        p1, p2, l = struct.unpack(">lll", bin[pos:pos + 12])
+        pos += 12
+        r.append(a[last:p1])
+        r.append(bin[pos:pos + l])
+        pos += l
+        last = p2
+    r.append(a[last:])
+
+    return "".join(r)
+
+
+
+
+

File mercurial/revlog.py

+# revlog.py - storage back-end for mercurial
+#
+# This provides efficient delta storage with O(1) retrieve and append
+# and O(changes) merge between branches
+#
+# Copyright 2005 Matt Mackall <mpm@selenic.com>
+#
+# This software may be used and distributed according to the terms
+# of the GNU General Public License, incorporated herein by reference.
+
+import zlib, struct, sha, os, tempfile
+from mercurial import mdiff
+
+def compress(text):
+    return zlib.compress(text)
+
+def decompress(bin):
+    return zlib.decompress(bin)
+
+def hash(text, p1, p2):
+    l = [p1, p2]
+    l.sort()
+    return sha.sha(l[0] + l[1] + text).digest()
+
+nullid = "\0" * 20
+indexformat = ">4l20s20s20s"
+
+class revlog:
+    def __init__(self, opener, indexfile, datafile):
+        self.indexfile = indexfile
+        self.datafile = datafile
+        self.index = []
+        self.opener = opener
+        self.cache = None
+        self.nodemap = { -1: nullid, nullid: -1 }
+        # read the whole index for now, handle on-demand later
+        try:
+            n = 0
+            i = self.opener(self.indexfile).read()
+            s = struct.calcsize(indexformat)
+            for f in range(0, len(i), s):
+                # offset, size, base, linkrev, p1, p2, nodeid, changeset
+                e = struct.unpack(indexformat, i[f:f + s])
+                self.nodemap[e[6]] = n
+                self.index.append(e)
+                n += 1
+        except IOError: pass
+
+    def tip(self): return self.node(len(self.index) - 1)
+    def count(self): return len(self.index)
+    def node(self, rev): return rev < 0 and nullid or self.index[rev][6]
+    def rev(self, node): return self.nodemap[node]
+    def linkrev(self, node): return self.index[self.nodemap[node]][3]
+    def parents(self, node):
+        if node == nullid: return (nullid, nullid)
+        return self.index[self.nodemap[node]][4:6]
+
+    def start(self, rev): return self.index[rev][0]
+    def length(self, rev): return self.index[rev][1]
+    def end(self, rev): return self.start(rev) + self.length(rev)
+    def base(self, rev): return self.index[rev][2]
+
+    def revisions(self, list):
+        # this can be optimized to do spans, etc
+        # be stupid for now
+        for r in list:
+            yield self.revision(r)
+
+    def diff(self, a, b):
+        return mdiff.textdiff(a, b)
+
+    def patch(self, text, patch):
+        return mdiff.patch(text, patch)
+
+    def revision(self, node):
+        if node is nullid: return ""
+        if self.cache and self.cache[0] == node: return self.cache[2]
+
+        text = None
+        rev = self.rev(node)
+        base = self.base(rev)
+        start = self.start(base)
+        end = self.end(rev)
+
+        if self.cache and self.cache[1] >= base and self.cache[1] < rev:
+            base = self.cache[1]
+            start = self.start(base + 1)
+            text = self.cache[2]
+            last = 0
+
+        f = self.opener(self.datafile)
+        f.seek(start)
+        data = f.read(end - start)
+
+        if not text:
+            last = self.length(base)
+            text = decompress(data[:last])
+
+        for r in range(base + 1, rev + 1):
+            s = self.length(r)
+            b = decompress(data[last:last + s])
+            text = self.patch(text, b)
+            last = last + s
+
+        (p1, p2) = self.parents(node)
+        if self.node(rev) != hash(text, p1, p2):
+            raise "Consistency check failed on %s:%d" % (self.datafile, rev)
+
+        self.cache = (node, rev, text)
+        return text  
+
+    def addrevision(self, text, transaction, link, p1=None, p2=None):
+        if text is None: text = ""
+        if p1 is None: p1 = self.tip()
+        if p2 is None: p2 = nullid
+
+        node = hash(text, p1, p2)
+
+        n = self.count()
+        t = n - 1
+
+        if n:
+            start = self.start(self.base(t))
+            end = self.end(t)
+            prev = self.revision(self.tip())
+            if 0:
+                dd = self.diff(prev, text)
+                tt = self.patch(prev, dd)
+                if tt != text:
+                    print prev
+                    print text
+                    print tt
+                    raise "diff+patch failed"
+            data = compress(self.diff(prev, text))
+
+        # full versions are inserted when the needed deltas
+        # become comparable to the uncompressed text
+        if not n or (end + len(data) - start) > len(text) * 2:
+            data = compress(text)
+            base = n
+        else:
+            base = self.base(t)
+
+        offset = 0
+        if t >= 0:
+            offset = self.end(t)
+
+        e = (offset, len(data), base, link, p1, p2, node)
+        
+        self.index.append(e)
+        self.nodemap[node] = n
+        entry = struct.pack(indexformat, *e)
+
+        transaction.add(self.datafile, e[0])
+        self.opener(self.datafile, "a").write(data)
+        transaction.add(self.indexfile, n * len(entry))
+        self.opener(self.indexfile, "a").write(entry)
+
+        self.cache = (node, n, text)
+        return node
+
+    def ancestor(self, a, b):
+        def expand(e1, e2, a1, a2):
+            ne = []
+            for n in e1:
+                (p1, p2) = self.parents(n)
+                if p1 in a2: return p1
+                if p2 in a2: return p2
+                if p1 != nullid and p1 not in a1:
+                    a1[p1] = 1
+                    ne.append(p1)
+                if p2 != nullid and p2 not in a1:
+                    a1[p2] = 1
+                    ne.append(p2)
+            return expand(e2, ne, a2, a1)
+        return expand([a], [b], {a:1}, {b:1})
+
+    def mergedag(self, other, transaction, linkseq, accumulate = None):
+        """combine the nodes from other's DAG into ours"""
+        old = self.tip()
+        i = self.count()
+        l = []
+
+        # merge the other revision log into our DAG
+        for r in range(other.count()):
+            id = other.node(r)
+            if id not in self.nodemap:
+                (xn, yn) = other.parents(id)
+                l.append((id, xn, yn))
+                self.nodemap[id] = i
+                i += 1
+
+        # merge node data for new nodes
+        r = other.revisions([e[0] for e in l])
+        for e in l:
+            t = r.next()
+            if accumulate: accumulate(t)
+            self.addrevision(t, transaction, linkseq.next(), e[1], e[2])
+
+        # return the unmerged heads for later resolving
+        return (old, self.tip())

File mercurial/transaction.py

+# transaction.py - simple journalling scheme for mercurial
+#
+# This transaction scheme is intended to gracefully handle program
+# errors and interruptions. More serious failures like system crashes
+# can be recovered with an fsck-like tool. As the whole repository is
+# effectively log-structured, this should amount to simply truncating
+# anything that isn't referenced in the changelog.
+#
+# Copyright 2005 Matt Mackall <mpm@selenic.com>
+#
+# This software may be used and distributed according to the terms
+# of the GNU General Public License, incorporated herein by reference.
+
+import os
+
+class transaction:
+    def __init__(self, opener, journal):
+        self.opener = opener
+        self.entries = []
+        self.journal = journal
+
+        # abort here if the journal already exists
+        if os.path.exists(self.journal):
+            raise "Journal already exists!"
+        self.file = open(self.journal, "w")
+
+    def __del__(self):
+        if self.entries: self.abort()
+
+    def add(self, file, offset):
+        self.entries.append((file, offset))
+        # add enough data to the journal to do the truncate
+        self.file.write("%s\0%d\n" % (file, offset))
+        self.file.flush()
+
+    def close(self):
+        self.file.close()
+        self.entries = []
+        os.unlink(self.journal)
+
+    def abort(self):
+        if not self.entries: return
+
+        print "transaction abort!"
+
+        for f, o in self.entries:
+            self.opener(f, "a").truncate(o)
+
+        self.entries = []
+
+        try:
+            os.unlink(self.journal)
+            self.file.close()
+        except: pass
+
+        print "rollback completed"
+        
+    def recover(self):
+        for l in open(self.journal).readlines():
+            f, o = l.split('\0')
+            self.opener(f, "a").truncate(int(o))
+
+Some notes about Mercurial's design
+
+Revlogs:
+
+The fundamental storage type in Mercurial is a "revlog". A revlog is
+the set of all revisions to a file. Each revision is either stored
+compressed in its entirety or as a compressed binary delta against the
+previous version. The decision of when to store a full version is made
+based on how much data would be needed to reconstruct the file. This
+lets us ensure that we never need to read huge amounts of data to
+reconstruct a file, regardless of how many revisions of it we store.
+
+In fact, we should always be able to do it with a single read,
+provided we know when and where to read. This is where the index comes
+in. Each revlog has an index containing a special hash (nodeid) of the
+text, hashes for its parents, and where and how much of the revlog
+data we need to read to reconstruct it. Thus, with one read of the
+index and one read of the data, we can reconstruct any version in time
+proportional to the file size.
+
+Similarly, revlogs and their indices are append-only. This means that
+adding a new version is also O(1) seeks.
+
+Generally revlogs are used to represent revisions of files, but they
+are also used to represent manifests and changesets.
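The delta-chain reconstruction described above can be modeled in a few lines. This is only a sketch of the idea, not the revlog code itself: real revlogs store compressed binary deltas and consult the index to find the base snapshot, whereas here a revision is simply ('full', text) or ('delta', edits):

```python
# Toy model of revlog reconstruction: walk back to the most recent full
# snapshot, then apply each delta forward in order. Each delta is a list
# of (start, end, replacement) edits in ascending, non-overlapping order.

def apply_delta(text, delta):
    out, last = [], 0
    for start, end, repl in delta:
        out.append(text[last:start])  # unchanged span before this edit
        out.append(repl)              # replacement for text[start:end]
        last = end
    out.append(text[last:])
    return "".join(out)

def reconstruct(revs, rev):
    """revs[i] is either ('full', text) or ('delta', edits)."""
    base = rev
    while revs[base][0] != 'full':
        base -= 1
    text = revs[base][1]
    for r in range(base + 1, rev + 1):
        text = apply_delta(text, revs[r][1])
    return text

revs = [
    ('full', "a\nb\nc\n"),
    ('delta', [(2, 4, "B\n")]),      # replace "b\n" with "B\n"
    ('delta', [(4, 6, "C\nd\n")]),   # replace "c\n", append "d\n"
]
print(reconstruct(revs, 2))
```

Because each revision is at most one snapshot plus a bounded run of deltas, any version comes back in time proportional to the file size.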
+
+Manifests:
+
+A manifest is simply a list of all the files in a given revision of a
+project, along with the nodeids of the corresponding file revisions. So
+grabbing a given version of the project means simply looking up its
+manifest and reconstructing all the file revisions it points to.
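Concretely, the manifest text is one `file\0hex-nodeid` line per file (see `manifest.read` in hg.py above). A sketch of parsing it back into a map, with made-up filenames and nodeids:

```python
# Parse the "<file>\0<40-char hex nodeid>" per-line manifest format.
manifest_text = (
    "README\x00" + "ab" * 20 + "\n"
    "mercurial/hg.py\x00" + "cd" * 20 + "\n"
)

def parse_manifest(text):
    mapping = {}
    for line in text.splitlines():
        fname, hexnode = line.split("\x00")
        mapping[fname] = hexnode[:40]  # 40 hex chars = 20-byte nodeid
    return mapping

print(sorted(parse_manifest(manifest_text)))
```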
+
+Changesets:
+
+A changeset is a list of all the files changed in a check-in, along
+with a change description and some metadata like user and date. It
+also contains the nodeid of the relevant manifest revision. Changesets
+and manifests are one-to-one, but contain different data for
+convenience.
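The changeset text layout matches `changelog.extract` in hg.py above: manifest hex on the first line, then user, date, and changed files, then a blank line and the description. A round-trip sketch with made-up values:

```python
# Build a changeset in the layout changelog.add uses, then re-parse it
# the way changelog.extract does.
manifest_hex = "ab" * 20  # made-up manifest nodeid
text = "\n".join([manifest_hex, "mpm@selenic.com", "1114735104 25200",
                  "README", "hg", "", "check in the changes"])

def extract(text):
    last = text.index("\n\n")           # blank line separates description
    desc = text[last + 2:]
    lines = text[:last].splitlines()
    return lines[0], lines[1], lines[2], lines[3:], desc

mhex, user, date, files, desc = extract(text)
print(files, desc)
```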
+
+Nodeids:
+
+Nodeids are unique ids that are used to represent both the contents of
+a file AND its position in the project history. That is, if you change
+a file and then change it back, the result will have a different
+nodeid because it has a different history. This is accomplished by
+including the nodeids of a given revision's parents along with the
+revision's text when calculating the hash.
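This is what the `hash()` helper in revlog.py above computes; a standalone sketch, using the modern hashlib module in place of the old sha module:

```python
import hashlib

NULLID = b"\0" * 20  # parent id for a revision with no parent

def nodeid(text, p1=NULLID, p2=NULLID):
    # Sort the parents so the hash doesn't depend on their order, then
    # hash parents + text: identical content with different history
    # yields a different nodeid.
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()

v1 = nodeid(b"hello\n")
v2 = nodeid(b"changed\n", p1=v1)
v3 = nodeid(b"hello\n", p1=v2)  # same text as v1, different history
assert v1 != v3
```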
+
+Graph merging:
+
+Nodeids are implemented as they are to simplify merging. Merging a
+pair of directed acyclic graphs (aka "the family tree" of the file
+history) requires some method of determining if nodes in different
+graphs correspond. Simply comparing the contents of a node (the text
+of a given revision, or its hash) can be confused by identical
+revisions in the tree.
+
+The nodeid approach makes this trivial: the hash uniquely describes a
+revision's contents and its graph position relative to the root, so
+merging is simply checking whether each nodeid in graph A is in the
+hash table of graph B. If not, we pull the missing nodes in, adding
+them sequentially to the revlog.
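A toy model of that hash-table merge, with each graph as a nodeid-to-parents map (the short names are made up; a real revlog stores 20-byte hashes and appends via addrevision):

```python
# Pull every node the other graph has that we don't, in the order the
# other revlog stores them (parents always precede children).
ours = {"n0": (), "n1": ("n0",)}
other = {"n0": (), "n1": ("n0",), "n2": ("n1",), "n3": ("n1",)}

pulled = []
for node, parents in other.items():  # dict preserves insertion order
    if node not in ours:
        ours[node] = parents
        pulled.append(node)
print(pulled)
```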
+
+Graph resolving:
+
+Mercurial does branching by copying (or COWing) a repository and thus
+keeps everything nice and linear within a repository. However, when a
+merge of repositories (a "pull") is done, we may often have two head
+revisions in a given graph. To keep things simple, Mercurial forces
+the head revisions to be merged.
+
+It first finds the closest common ancestor of the two heads. If one is
+a child of the other, it becomes the new head. Otherwise, we call out
+to a user-specified 3-way merge tool.
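Finding that common ancestor corresponds to `revlog.ancestor` above; a simplified breadth-first sketch over a made-up parent map (the real code expands both sides a generation at a time):

```python
# Walk the ancestry of both heads breadth-first, labeling each node with
# the side that reached it; the first node reached from both sides is
# the closest common ancestor.
parents = {"root": (), "a": ("root",), "b": ("a",),
           "c": ("a",), "d": ("c",)}

def ancestor(x, y):
    seen = {x: "x", y: "y"}
    frontier = [x, y]
    while frontier:
        node = frontier.pop(0)
        side = seen[node]
        for p in parents[node]:
            if seen.get(p) not in (None, side):
                return p  # already reached from the other side
            if p not in seen:
                seen[p] = side
                frontier.append(p)
    return None

print(ancestor("b", "d"))
```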
+
+Merging files, manifests, and changesets:
+
+We begin by comparing changeset DAGs, pulling all nodes we don't have
+in our DAG from the other repository. As we do so, we collect a list
+of changed files to merge.
+
+Then, for each file, we perform a graph merge and resolve as above.
+It's important to merge files using per-file DAGs rather than just
+changeset-level DAGs, as this diagram illustrates:
+
+M   M1   M2
+
+AB
+ |`-------v     M2 clones M
+aB       AB     file A is changed in mainline
+ |`---v  AB'    file B is changed in M2
+ |   aB / |     M1 clones M
+ |   ab/  |     M1 changes B
+ |   ab'  |     M1 merges from M2, changes to B conflict
+ |    |  A'B'   M2 changes A
+  `---+--.|
+      |  a'B'   M2 merges from mainline, changes to A conflict
+      `--.|
+         ???    depending on which ancestor we choose, we will have
+	        to redo A hand-merge, B hand-merge, or both
+                but if we look at the files independently, everything
+		is fine
+
+After we've merged files, we merge the manifest log DAG and resolve
+additions and deletions. Then we are ready to resolve the changeset
+DAG - if our merge required any changes (the new head is not a
+descendant of our tip), we must create a new changeset describing all
+of the changes needed to merge it into the tip.
+
+Merge performance:
+
+The I/O operations for performing a merge are O(changed files), not
+O(total changes), and in many cases we needn't even unpack the deltas
+to add them to our repository (though this optimization isn't
+necessary).
+
+Rollback:
+
+Rollback is not yet implemented, but will be easy to add. When
+performing a commit or a merge, we order things so that the changeset
+entry gets added last. We keep a transaction log of the name of each
+file and its length prior to the transaction. On abort, we simply
+truncate each file to its prior length. This is one of the nice
+properties of the append-only structure of the revlogs.
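The journal-and-truncate scheme described above is simple enough to sketch directly. This is an in-memory illustration under the stated assumptions (append-only files, changeset entry written last); the class and method names are hypothetical, not Mercurial's internals, and a real journal would itself be persisted to disk.

```python
# Minimal sketch of journal-based rollback for append-only files.
# Names are illustrative; a real journal would be written to disk so
# an interrupted transaction can be undone on the next run.
import os

class Transaction:
    def __init__(self):
        self.entries = []       # (path, length before first write)

    def add(self, path):
        """Record a file's pre-transaction length before appending to it."""
        size = os.path.getsize(path) if os.path.exists(path) else 0
        self.entries.append((path, size))

    def abort(self):
        """Undo the transaction: truncate each file to its prior length.

        A file created inside the transaction is truncated to zero here;
        real code would presumably remove it instead."""
        for path, size in self.entries:
            with open(path, "r+b") as f:
                f.truncate(size)
```

Truncation is sufficient precisely because revlogs are append-only: nothing before the recorded offset was modified, so cutting the file back restores the prior state exactly.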
+
+Remote access:
+
+Mercurial currently supports pulling from "serverless" repositories.
+Simply making the repo directory accessible via the web and pointing
+hg at it can accomplish a pull. This is relatively bandwidth-efficient,
+but no effort has been spent on pipelining, so it won't work
+especially well over the LAN yet.
+
+It's also quite amenable to rsync, if you don't mind keeping an intact
+copy of the master around locally.
+
+Also note that the append-only and ordering properties of the commit
+guarantee that readers will always see a repository in a consistent
+state, so no special locking is necessary. As there is generally only
+one writer to an hg repository, there is in fact no exclusion
+implemented yet.
+
+
+Some comparisons to git:
+
+Most notably, Mercurial uses delta compression and repositories
+created with it will grow much more slowly over time. This also allows
+it to be much more bandwidth-efficient. I expect repo sizes and sync
+speeds to be similar to or better than BK, given the use of binary diffs.
+
+Mercurial's performance is roughly the same as git's in most operations
+and faster in others, as it keeps around more metadata. One example is
+listing and
+retrieving past versions of a file, which it can do without reading
+all the changesets. This metadata will also allow it to perform better
+merges as described above.
+
+
+#!/usr/bin/env python
+
+# This is the mercurial setup script. 
+#
+# './setup.py install', or
+# './setup.py --help' for more options
+
+from distutils.core import setup
+
+setup(name='mercurial',
+      version='0.4c',
+      author='Matt Mackall',
+      author_email='mpm@selenic.com',
+      url='http://selenic.com/mercurial',
+      description='scalable distributed SCM',
+      license='GNU GPL',
+      packages=['mercurial'],
+      scripts=['hg'])
+#!/bin/sh
+merge $1 $3 $2 || tkdiff -conflict $1 -o $1
+