Source

peps / pep-0385.txt

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
PEP: 385
Title: Migrating from Subversion to Mercurial
Version: $Revision$
Last-Modified: $Date$
Author: Dirkjan Ochtman <dirkjan@ochtman.nl>,
        Antoine Pitrou <solipsis@pitrou.net>,
        Georg Brandl <georg@python.org>
Status: Final
Type: Process
Content-Type: text/x-rst
Created: 25-May-2009


Motivation
==========

After having decided to switch to the Mercurial DVCS, the actual
migration still has to be performed.  In the case of an important
piece of infrastructure like the version control system for a large,
distributed project like Python, this is a significant effort.  This
PEP is an attempt to describe the steps that must be taken for further
discussion.  It's somewhat similar to `PEP 347`_, which discussed the
migration to SVN.

To make the most of hg, we would like to make a high-fidelity
conversion, such that (a) as much of the svn metadata as possible is
retained, and (b) all metadata is converted to formats that are common
in Mercurial.  This way, tools written for Mercurial can be optimally
used.  In order to do this, we want to use the `hgsubversion`_
software to do an initial conversion.  This hg extension is focused on
providing high-quality conversion from Subversion to Mercurial for use
in two-way correspondence, meaning it doesn't throw away as much
available metadata as other solutions.

Such a conversion also seems like a good time to reconsider the
contents of the repository and determine if some things are still
valuable.  In this spirit, the following sections also propose
discarding some of the older metadata.

.. _PEP 347: http://www.python.org/dev/peps/pep-0347/
.. _hgsubversion: http://bitbucket.org/durin42/hgsubversion/


Timeline
========

The current schedule for conversion milestones:

- 2011-02-24: availability of a test repo at hg.python.org

  Test commits will be allowed (and encouraged) from all committers to
  the Subversion repository.  The test repository and all test commits
  will be removed once the final conversion is done.  The server-side
  hooks will be installed for the test repository, in order to test
  buildbot, diff-email and whitespace checking integration.

- 2011-03-05: final conversion (tentative)

  Commits to the Subversion branches now maintained in Mercurial will
  be blocked.  Developers should refrain from pushing to the Mercurial
  repositories until all infrastructure is ensured to work after their
  switch over to the new repository.


Transition plan
===============

Branch strategy
---------------

Mercurial has two basic ways of using branches: cloned branches, where
each branch is kept in a separate repository, and named branches,
where each revision keeps metadata to note on which branch it belongs.
The former makes it easier to distinguish branches, at the expense of
requiring more disk space on the client.  The latter makes it a little
easier to switch between branches, but all branch names are a
persistent part of history. [1]_

Differences between named branches and cloned branches:

* Tags in a different (maintenance) clone aren't available in the
  local clone
* Clones with named branches will be larger, since they contain more
  data

We propose to use named branches for release branches and adopt cloned
branches for feature branches.


History management
------------------

In order to minimize the loss of information due to the conversion, we
propose to provide several repositories as a conversion result:

* A repository trimmed to the mainline trunk (and py3k), as well as
  past and present maintenance branches -- this is called the
  "working" repo and is where development continues.  This repository has
  all the history needed for development work, including annotating
  source files with changes back up to 1990 and other common history-digging
  operations.

  The ``default`` branch in that repo is what is known as ``py3k`` in
  Subversion, while the Subversion trunk lives on with the branch name
  ``legacy-trunk``; however in Mercurial this branch will be closed.
  Release branches are named after their major.minor version, e.g. ``3.2``.

* A repository with the full, unedited conversion of the Subversion
  repository (actually, its /python subdirectory) -- this is called
  the "historic" or "archive" repo and will be offered as a read-only
  resource. [2]_

* One more repository per active feature branch; "active" means that
  at least one core developer asks for the branch to be provided.  Each
  such repository will contain both the feature branch and all ancestor
  changesets from mainline (coming from ``trunk`` and/or ``py3k`` in SVN).

Since all branches are present in the historic repo, they can later be
extracted as separate repositories at any time should it prove to be
necessary.

The final revision map between SVN revision numbers, Mercurial changesets
and SVN branch names will be made available in a file stored in the ``Misc``
directory.  Its format is as following::

    [...]
    88483 e65daae6cf4499a0863cb7645109a4798c28d83e issue10276-snowleopard
    88484 835cb57abffeceaff0d85c2a3aa0625458dd3e31 py3k
    88485 d880f9d8492f597a030772c7485a34aadb6c4ece release32-maint
    88486 0c431b8c22f5dbeb591414c154acb7890c1809df py3k
    88487 82cda1f21396bbd10db8083ea20146d296cb630b release32-maint
    88488 8174d00d07972d6f109ed57efca8273a4d59302c release27-maint
    [...]


Converting tags
---------------

The SVN tags directory contains a lot of old stuff.  Some of these are
not, in fact, full tags, but contain only a smaller subset of the
repository.  All release tags will be kept; other tags will be
included based on requests from the developer community.  We propose
to make the tag naming scheme consistent, in this style: ``v3.2.1a2``.


Author map
----------

In order to provide user names the way they are common in hg (in the
'First Last <user@example.org>' format), we need an author map to map
cvs and svn user names to real names and their email addresses.  We
have a complete version of such a map in the migration tools
repository (not publicly accessible to avoid leaking addresses to
harvesters).  The email addresses in it might be out of date; that's
bound to happen, although it would be nice to try and have as many
people as possible review it for addresses that are out of date.  The
current version also still seems to contain some encoding problems.


Generating .hgignore
--------------------

The .hgignore file can be used in Mercurial repositories to help
ignore files that are not eligible for version control.  It does this
by employing several possible forms of pattern matching.  The current
Python repository already includes a rudimentary .hgignore file to
help with using the hg mirrors.

Since the current Python repository already includes a .hgignore file
(for use with hg mirrors), we'll just use that.  Generating full
history of the file was debated but deemed impractical (because it's
relatively hard with fairly little gain, since ignoring is less
important for older revisions).


Repository size
---------------

A bare conversion result of the current Python repository weighs 1.9
GB; although this is smaller than the Subversion repository (2.7 GB)
it is not feasible.

The size becomes more manageable by the trimming applied to the
working repository, and by a process called "revlog reordering" that
optimizes the layout of internal Mercurial storage very efficiently.

After all optimizations done, the size of the working repository is
around 180 MB on disk.  The amount of data transferred over the
network when cloning is estimated to be around 80 MB.


Other repositories
------------------

There are a number of other projects hosted in svn.python.org's
"projects" repository.  The "peps" directory will be converted along
with the main Python one.  Richard Tew has indicated that he'd like the
Stackless repository to also be converted.  What other projects in the
svn.python.org repository should be converted?

There's now an initial stab at converting the Jython repository.  The
current tip of hgsubversion unfortunately fails at some point.
Pending investigation.

Other repositories that would like to converted to Mercurial can
announce themselves to me after the main Python migration is done, and
I'll take care of their needs.


Infrastructure
==============

hg-ssh
------

Developers should access the repositories through ssh, similar to the
current setup.  Public keys can be used to grant people access to a
shared hg@ account.  A hgwebdir instance also has been set up at
``hg.python.org`` for easy browsing and read-only access.  It is
configured so that developers can trivially start new clones (for
longer-term features that profit from development in a separate
repository).

Also, direct creation of public repositories is allowed for core developers,
although it is not yet decided which naming scheme will be enforced::

    $ hg init ssh://hg@hg.python.org/sandbox/mywork
    repo created, public URL is http://hg.python.org/sandbox/mywork


Hooks
-----

A number of hooks is currently in use.  The hg equivalents for these
should be developed and deployed.  The following hooks are being used:

* check whitespace: a hook to reject commits in case the whitespace
  doesn't match the rules for the Python codebase.  In a changegroup,
  only the tip is checked (this allows cleanup commits for changes
  pulled from third-party repos).  We can also offer a whitespace hook
  for use with client-side repositories that people can use; it could
  either warn about whitespace issues and/or truncate trailing
  whitespace from changed lines.

* push mails: Emails will include diffs for each changeset pushed
  to the public repository, including the username which pushed the
  changesets (this is not necessarily the same as the author recorded
  in the changesets).

* buildbots: the python.org build master will be notified of each changeset
  pushed to the ``cpython`` repository, and will trigger an appropriate build
  on every build slave for the branch in which the changeset occurs.

The `hooks repository`_ contains ports of these server-side hooks to
Mercurial, as well as a couple additional ones:

* check branch heads: a hook to reject pushes which create a new head on
  an existing branch.  The pusher then has to merge the excess heads
  and try pushing again.

* check branches: a hook to reject all changesets not on an allowed named
  branch.  This hook's whitelist will have to be updated when we want to
  create new maintenance branches.

* check line endings: a hook, based on the `eol extension`_, to reject all
  changesets committing files with the wrong line endings.  The commits then
  have to be stripped and redone, possibly with the `eol extension`_ enabled
  on the comitter's computer.

One additional hook could be beneficial:

* check contributors: in the current setup, all changesets bear the
  username of committers, who must have signed the contributor
  agreement.  We might want to use a hook to check if the committer is
  a contributor if we keep a list of registered contributors.  Then,
  the hook might warn users that push a group of revisions containing
  changesets from unknown contributors.

.. _hooks repository: http://hg.python.org/hooks/


End-of-line conversions
-----------------------

Discussion about the lack of end-of-line conversion support in
Mercurial, which was provided initially by the `win32text extension`_,
led to the development of the new `eol extension`_ that supports a
versioned management of line-ending conventions on a file-by-file
basis, akin to Subversion's ``svn:eol-style`` properties.  This
information is kept in a versioned file called ``.hgeol``, and such a
file has already been checked into the Subversion repository.

A hook also exists on the server side to reject any changeset
introducing inconsistent newline data (see above).

.. _eol extension: http://mercurial.selenic.com/wiki/EolExtension
.. _win32text extension: http://mercurial.selenic.com/wiki/Win32TextExtension


hgwebdir
--------

A more or less stock hgwebdir installation should be set up.  We might
want to come up with a style to match the Python website.

A small WSGI application has been written that can look up
Subversion revisions and redirect to the appropriate hgweb page for
the given changeset, regardless in which repository the converted
revision ended up (since one big Subversion repository is converted
into several Mercurial repositories).  It can also look up Mercurial
changesets by their hexadecimal ID.


roundup
-------

By pointing Roundup to the URL of the lookup script mentioned above,
links to SVN revisions will continue to work, and links to Mercurial
changesets can be created as well, without having to give repository
*and* changeset ID.


After migration
===============

Where to get code
-----------------

After migration, the hgwebdir will live at hg.python.org.  This is an
accepted standard for many organizations, and an easy parallel to
svn.python.org.  The working repo might live at
http://hg.python.org/cpython/, for example, with the archive repo at
http://hg.python.org/cpython-archive/.  For write access, developers
will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.

code.python.org was also proposed as the hostname.  We think that
using the VCS name in the hostname is good because it prevents
confusion: it should be clear that you can't use svn or bzr for
hg.python.org.

hgwebdir can already provide tarballs for every changeset.  This
obviates the need for daily snapshots; we can just point users to
tip.tar.gz instead, meaning they will get the latest.  If desired, we
could even use buildbot results to point to the last good changeset.


Python-specific documentation
-----------------------------

hg comes with good built-in documentation (available through hg help)
and a `wiki`_ that's full of useful information and recipes, not to
mention a popular `book`_ (readable online).

In addition to that, the recently overhauled `Python Developer's
Guide`_ already has a branch with instructions for Mercurial instead
of Subversion; an online `build of this branch`_ is also available.

.. _Python Developer's Guide: http://docs.python.org/devguide/
.. _build of this branch: http://potrou.net/hgdevguide/
.. _wiki: http://mercurial.selenic.com/wiki/
.. _book: http://hgbook.red-bean.com/

Proposed workflow
-----------------

We propose two workflows for the migration of patches between several
branches.

For migration within 2.x or 3.x branches, we propose a patch always
gets committed to the oldest branch where it applies first.  Then, the
resulting changeset can be merged using hg merge to all newer branches
within that series (2.x or 3.x).  If it does not apply as-is to the
newer branch, hg revert can be used to easily revert to the
new-branch-native head, patch in some alternative version of the patch
(or none, if it's not applicable), then commit the merge.  The premise
here is that all changesets from an older branch within the series are
eventually merged to all newer branches within the series.

The upshot is that this provides for the most painless merging
procedure.  This means that in the general case, people have to think
about the oldest branch to which the patch should be applied before
actually applying it.  Usually, that is one of only two branches: the
latest maintenance branch and the trunk, except for security fixes
applicable to older branches in security-fix-only mode.

For merging bug fixes from the 3.x to the 2.7 maintenance branch (2.6
and 2.5 are in security-fix-only mode and their maintenance will
continue in the Subversion repository), changesets should be
transplanted (not merged) in some other way.  The transplant
extension, import/export and bundle/unbundle work equally well here.

Choosing this approach allows 3.x not to carry all of the 2.x
history-since-it-was-branched, meaning the clone is not as big and the
merges not as complicated.


The future of Subversion
------------------------

What happens to the Subversion repositories after the migration?
Since the svn server contains a bunch of repositories, not just the
CPython one, it will probably live on for a bit as not every project
may want to migrate or it takes longer for other projects to migrate.
To prevent people from staying behind, we may want to move migrated
projects from the repository to a new, read-only repository with a new
name.


Build identification
--------------------

Python currently provides the sys.subversion tuple to allow Python
code to find out exactly what version of Python it's running against.
The current version looks something like this:

* ('CPython', 'tags/r262', '71600')
* ('CPython', 'trunk', '73128M')

Another value is returned from Py_GetBuildInfo() in the C API, and
available to Python code as part of sys.version:

* 'r262:71600, Jun  2 2009, 09:58:33'
* 'trunk:73128M, Jun  2 2009, 01:24:14'

I propose that the revision identifier will be the short version of
hg's revision hash, for example 'dd3ebf81af43', augmented with '+'
(instead of 'M') if the working directory from which it was built was
modified.  This mirrors the output of the hg id command, which is
intended for this kind of usage.  The sys.subversion value will also
be renamed to sys.mercurial to reflect the change in VCS.

For the tag/branch identifier, I propose that hg will check for tags
on the currently checked out revision, use the tag if there is one
('tip' doesn't count), and uses the branch name otherwise.
sys.subversion becomes

* ('CPython', 'v2.6.2', 'dd3ebf81af43')
* ('CPython', 'default', 'af694c6a888c+')

and the build info string becomes

* 'v2.6.2:dd3ebf81af43, Jun  2 2009, 09:58:33'
* 'default:af694c6a888c+, Jun  2 2009, 01:24:14'

This reflects that the default branch in hg is called 'default'
instead of Subversion's 'trunk', and reflects the proposed new tag
format.

Mercurial also allows to find out the latest tag and the number of
changesets separating the current changeset from that tag, allowing for
a descriptive version string::

    $ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
    v3.2+37-4b5d0d260e72
    $ hg up 2.7
    3316 files updated, 0 files merged, 379 files removed, 0 files unresolved
    $ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
    v2.7.1+216-9619d21d8198


Footnotes
=========

.. [1] The Mercurial book discourages the use of named branches, but
   it is, in this respect, somewhat outdated.  Named branches have
   gotten much easier to use since that comment was written, due to
   improvements in hg.

.. [2] Since the initial working repo is a subset of the archive repo,
   it would also be feasible to pull changes from the working repo
   into the archive repo periodically.


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: