PEP: 347
Title: Migrating the Python CVS to Subversion
Version: $Revision$
Last-Modified: $Date$
Author: Martin von Löwis <martin@v.loewis.de>
Discussions-To: <python-dev@python.org>
Status: Final
Type: Process
Content-Type: text/x-rst
Created: 14-Jul-2004
Post-History: 14-Jul-2004


Abstract
========

The Python source code is currently managed in a CVS repository on
sourceforge.net.  This PEP proposes to move it to a Subversion
repository on svn.python.org.


Rationale
=========

This change has two aspects: moving from CVS to Subversion, and moving
from SourceForge to python.org.  For each, a rationale will be given.


Moving to Subversion
--------------------

CVS has a number of limitations that have been eliminated by
Subversion.  For the development of Python, the most notable
improvements are:

- the ability to rename files and directories, and to remove
  directories, while keeping the history of these files.

- support for change sets (sets of correlated changes to multiple
  files) through global revision numbers.  Change sets are
  transactional.

- atomic, fast tagging: a cvs tag might take many minutes; a
  Subversion tag (svn cp) will complete quickly, and atomically.
  Likewise, branches are very efficient.

- support for offline diffs, which is useful when creating patches.


Moving to python.org
--------------------

SourceForge has kindly provided important infrastructure over the
past years.  Unfortunately, the attention that SF received has also
caused repeated overload situations in the past, to which the SF
operators could not always respond in a timely manner.  In particular,
for CVS, they had to reduce the load on the primary CVS server by
introducing a second, read-only CVS server for anonymous access.  This
server is regularly synchronized, but lags behind the read-write CVS
repository between synchronizations.  As a result, users without
commit access can see recent changes to the repository only after a
delay.

On python.org, anonymous users could be given read access to the same
repository that committers write to, so they would see changes
without delay.


Migration Procedure
===================

To move the Python CVS repository, the following steps need to be
executed.  The steps are elaborated upon in the following sections.

1. Collect SSH keys for all current committers, along with usernames
   to appear in commit messages.

2. At the beginning of the migration, announce that the repository on
   SourceForge is closed.

3. 24 hours after the last commit, download the CVS repository.

4. Convert the CVS repository into a Subversion repository.

5. Publish the repository with write access for committers, and
   read-only anonymous access.

6. Disable CVS access on SF.


Collect SSH keys
----------------

After some discussion, svn+ssh was selected as the best method
for write access to the repository. Developers can continue to
use their SSH keys, but they must be installed on python.org.

In order to avoid having to create a new Unix user for each
developer, a single account (pythondev) should be used, with
command= attributes in its authorized_keys file.

The lines in the authorized_keys file should read like this
(wrapped for better readability)::

  command="/usr/bin/svnserve --root=/svnroot -t
  --tunnel-user='<username>'",no-port-forwarding,
  no-X11-forwarding,no-agent-forwarding,no-pty
  ssh-dss <key> <comment>

Real names, rather than SF account names, should be used as the
usernames, so that people can be identified more easily in log
messages.
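
As a minimal sketch of how such entries could be generated, the shell
function below assembles one unwrapped authorized_keys line from a
username and a public key.  The function name make_key_line and the
variables SVNSERVE and SVNROOT are illustrative assumptions; the paths
are copied from the example above.

```shell
#!/bin/sh
# Sketch only: emit one authorized_keys entry for a committer.
# SVNSERVE and SVNROOT repeat the paths from the example line above;
# make_key_line is a hypothetical helper name.
SVNSERVE=/usr/bin/svnserve
SVNROOT=/svnroot

make_key_line() {
    # $1 = name to show in log messages, $2 = key type,
    # $3 = public key, $4 = key comment.
    # Assumption: $1 contains no single or double quotes.
    printf 'command="%s --root=%s -t --tunnel-user='\''%s'\''",' \
        "$SVNSERVE" "$SVNROOT" "$1"
    printf 'no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty '
    printf '%s %s %s\n' "$2" "$3" "$4"
}
```

A call such as ``make_key_line 'Jane Doe' ssh-dss <key> <comment>``
prints a single line that could be appended to the authorized_keys
file; the name used here is made up.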

Administrator Access
--------------------

Administrator access to the pythondev account should be granted
to all current admins of the Python SF project. To distinguish
between shell login and svnserve login, admins need to maintain
two keys. Using OpenSSH, the following procedure can be
used to create a second key::

  cd .ssh
  ssh-keygen -t dsa -f pythondev -C <user>@pythondev
  vi config

In the config file, the following lines need to be added::

  Host pythondev
    Hostname dinsdale.python.org
    User pythondev
    IdentityFile ~/.ssh/pythondev

Then, shell login becomes possible through "ssh pythondev".

Downloading the CVS Repository
------------------------------

The CVS repository can be downloaded from

    http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.bz2

Since this tarball is generated only once a day, some time must pass
after the repository freeze before the tarball can be picked up.  It
should be verified that the last commit, as recorded on the
python-commits mailing list, is indeed included in the tarball.
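
One way to perform this check, sketched here under assumptions, is to
look for the revision number from the last commit message in the
corresponding RCS file of the unpacked tarball.  The helper name
has_revision is hypothetical.

```shell
#!/bin/sh
# Sketch only: check that an RCS ",v" file contains a given revision.
# In the RCS format each revision is introduced by a line holding only
# its number, so a whole-line, fixed-string match is used here.
# A false positive is possible if the file's own text happens to
# contain a line that consists of the same number.
has_revision() {
    # $1 = path to the ",v" file, $2 = revision number, e.g. 1.42
    grep -qxF "$2" "$1"
}
```

The function succeeds when the revision is present, so it can be used
as ``has_revision <path>,v <revision> || echo "tarball is stale"``,
with the path and revision taken from the commit message.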

After the conversion, the CVS tarball that was converted should be
kept permanently at
www.python.org/archive/python-cvsroot-<date>.tar.bz2


Converting the CVS Repository
-----------------------------

The Python CVS repository contains two modules: distutils and python.
The python module is further structured into dist and nondist,
where dist only contains src (the python code proper). nondist
contains various subdirectories.

These should be reorganized in the Subversion repository to get
shorter URLs, following the <project>/{trunk,tags,branches}
structure.  A project will be created for each nondist directory,
plus for src (called python), plus distutils.  Reorganizing the
repository is best done in the CVS tree, as shown below.

The fsfs backend should be used as the repository format (which
requires Subversion 1.1).  The fsfs backend has the advantage of being
more backup-friendly, as it allows incremental repository backups,
without requiring any dump commands to be run.

The conversion should be done using the cvs2svn utility, available
e.g. in the cvs2svn Debian package.  As cvs2svn does not currently
support the project/trunk structure, each project needs to be
converted separately.  To get each conversion result into a separate
directory in the target repository, svnadmin load must be used.

Subversion treats the distinction between binary and text files
differently from CVS.  To carry the CVS semantics forward correctly,
svn:eol-style should be set to native on all files that are not
marked binary in CVS.
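
The rule can be illustrated with a small sketch.  It assumes that the
binary flag is read from the header of the RCS file, where CVS records
the keyword expansion mode as ``expand @b@;`` for binary files, and
that the header ends at the first empty line.  The helper name
wants_native_eol is hypothetical.

```shell
#!/bin/sh
# Sketch only: decide whether a CVS file should get svn:eol-style=native.
# Assumption: the RCS header ends at the first empty line and holds
# "expand @b@;" when the file is marked binary (cvs -kb).
wants_native_eol() {
    # $1 = path to an RCS ",v" file.
    # Succeeds (status 0) when the file is NOT marked binary.
    if sed '/^$/q' "$1" | grep -q '^expand[[:space:]]*@b@;'; then
        return 1
    fi
    return 0
}
```

For a file that passes this test, the property could then be set in a
working copy with ``svn propset svn:eol-style native <file>``.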

In summary, the conversion script is::

  #!/bin/sh
  # Start from a clean state; -f keeps rm quiet if nothing is left over.
  rm -f cvs2svn-*
  rm -rf python py.new
  tar xjf python-cvsroot.tar.bz2
  rm -rf python/CVSROOT
  svnadmin create --fs-type fsfs py.new
  # Rearrange the CVS tree so that each project is a top-level directory.
  mv python/python python/orig
  mv python/orig/dist/src python/python
  mv python/orig/nondist/* python
  # nondist/nondist is empty
  rmdir python/nondist
  rm -rf python/orig
  # Convert each project separately and load it under its own directory.
  for a in python/*
  do
    b=`basename "$a"`
    cvs2svn -q --dump-only --encoding=latin1 --force-branch=cnri-16-start \
    --force-branch=descr-branch --force-branch=release152p1-patches \
    --force-tag=r16b1 "$a"
    # `pwd` already starts with a slash, so file:// yields file:///...
    svn mkdir -m"Conversion to SVN" "file://`pwd`/py.new/$b"
    svnadmin load -q --parent-dir "$b" py.new < cvs2svn-dump
    rm cvs2svn-dump
  done

Sample results of this conversion are available at

    http://www.dcl.hpi.uni-potsdam.de/pysvn/


Publish the Repository
------------------------

The repository should be published at http://svn.python.org/projects.
Read-write access should be granted to all current SF committers
through svn+ssh://pythondev@svn.python.org/;
read-only anonymous access through WebDAV should also be
granted.

As an option, websvn (available e.g. from the Debian websvn package)
could be provided. Unfortunately, in the test installation, websvn
breaks because it runs out of memory.

The current SF project admins should get write access to the
authorized_keys2 file of the pythondev account.


Disable CVS
-----------

It appears that CVS cannot be disabled entirely.  Only the user
interface can be removed from the project page; the repository itself
remains available.  If desired, write access to the python and
distutils modules can be disabled through a CVS commitinfo entry.
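
A minimal sketch of such an entry follows.  A commitinfo line pairs a
regular expression, matched against the directory being committed to,
with a program; the commit is aborted when the program exits with a
non-zero status.  The script path, the function name reject_commit and
the message text are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: a commitinfo filter that rejects every commit.
# Hypothetical CVSROOT/commitinfo entries that would invoke it:
#   ^python     /path/to/reject-commit.sh
#   ^distutils  /path/to/reject-commit.sh
reject_commit() {
    # CVS shows the filter's output to the committer.
    echo "This module is frozen; development has moved to Subversion." >&2
    # A non-zero status tells CVS to abort the commit.
    return 1
}
```

A hook script built on this sketch would end with a call to
``reject_commit``, so that its status becomes the exit status of the
script.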


Discussion
==========

Several alternatives to the procedure above were suggested.
The rejected alternatives are briefly discussed here:

- Create multiple repositories, one for python and one for
  distutils. This would have allowed even shorter URLs, but
  was rejected because a single repository supports moving code
  across projects.

- Several people suggested creating the project/trunk structure
  through standard cvs2svn, followed by renames. This would have
  the disadvantage that old revisions use different path names
  than recent revisions; the suggested approach through dump files
  works without renames.

- Several people also expressed concern about the administrative
  overhead that hosting the repository on python.org would cause
  to pydotorg admins.  As a specific alternative, BerliOS has been
  suggested.  The pydotorg admins themselves haven't objected
  to the additional workload; migrating the repository again if
  they get overworked is an option.

- Different authentication strategies were discussed. The following
  alternatives to svn+ssh were suggested:

  * Subversion over WebDAV, using SSL and basic authentication,
    with pydotorg-generated passwords mailed to the user. People
    did not like that approach, since they would need to store
    the password on disk (because they can't remember it); this
    is a security risk.

  * Subversion over WebDAV, using SSL client certificates. This would
    work, but would require us to administer a certificate authority.

- Instead of hosting this on python.org, people suggested hosting
  it elsewhere. One issue is whether this alternative should be
  free or commercial; several people suggested that it should be
  commercial, to reduce the load on the volunteers. In
  particular:

  * Greg Stein suggested http://www.wush.net/subversion.php. They
    offer 5 GB for $90/month, with 200 GB download/month.
    The data is on a RAID drive and fully backed up. Anonymous
    access and email commit notifications are supported. wush.net
    elaborated the following details:

    - The machine would be a Virtuozzo Virtual Private Server (VPS),
      hosted at PowerVPS.

    - The default repository URL would be
      http://python.wush.net/svn/projectname/, but anything else
      could be arranged.

    - We would get SSH login to the machine, with sudo capabilities.

    - They have a Web interface for management of the various SVN
      repositories that we want to host, and to manage user accounts.
      While svn+ssh would be supported, the user interface does not
      yet support it.

    - For offsite mirroring/backup, they suggest using rsync
      instead of downloading repository tarballs.

    Bob Ippolito reported that they had used wush.net for a
    commercial project for about 6 months, after which time they
    left wush.net, because the service was down for three days,
    with nobody reachable, and no explanation when it came back.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: