Porting Mercurial to Python 3

Author: Renato Cunha

Abstract

Mercurial is growing in popularity among software developers. Since the future of Python development lies in the Python 3 branch, and the adoption of Python 3.x depends on tools and libraries being available for it, this proposal describes a port of mercurial that runs on Python 3.x.

Outline of this proposal

As the project described in this proposal is far from trivial, this section outlines what I consider the main parts of the work. The following sections then describe the listed activities and the time expected for each.

  1. Deliverables
  2. Studying relevant material for the porting work
  3. Diving into mercurial's code
  4. Python code porting approaches
  5. Porting the core python modules
  6. Porting the C extension modules
  7. Porting the mercurial extensions
  8. Conclusion

Deliverables

As mentioned above, the objective of this project is to implement a Python 3-compatible port of mercurial.

Ideally, the full range of mercurial commands/operations will be ported, along with the official extensions shipped with mercurial. This ideal might not be reachable within the three months dedicated to the project, even though it looks plausible today.

That being said, a project that doesn't fully port mercurial to python 3 shouldn't be considered a failure. It is my belief that the porting process can be gradual, and that once the most basic commands/operations have been ported, the others will become more viable. Unfortunately, I can't tell in advance which set of ported features would make the outcome of this project a success, but I believe that a few interactions with the mentoring organization can shed some light on this issue.

Studying relevant material for the porting work

The Python community has produced various resources for this kind of work: documents describing the changes between python 2.x and python 3.x [1], [2]; a tool, 2to3 [3], that helps with the "trivial" changes; documents outlining the porting process for general python projects [4] [5] and for extension modules [6]; reports describing the hard-earned lessons from porting specific projects [7] [8]; and projects demonstrating the incompatibilities between python 2 and 3 [9].

Given that so many resources are available to guide a porting effort, I would naturally begin by reviewing them so that I am fully aware of what to expect. The insights gained may even help me identify unforeseen problems in this proposal and fix them before the program actually starts.

The steps described here might take place in the interim period, between April 9 and April 21.

Diving into mercurial's code

Mercurial is composed of more than 200 python modules, some C extension modules, plus mercurial extensions, test suites, and documentation. This amounts to approximately 43000 source lines of code (according to cloc [10]).

Of the 66 python modules in the mercurial package directory, only 8 (two of which are package bookkeeping modules, __init__.py and __version__.py) are successfully parsed by python3. Even after running 2to3, only 10 modules parse successfully under python3, which reinforces the non-triviality of this project.
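A count like the one above can be reproduced with a short script. The following is a minimal sketch, not the procedure actually used to obtain the figures in this proposal: it walks a directory and records which .py files the running interpreter can compile. The name parse_report and the sample file names are made up for illustration.

```python
import os
import shutil
import tempfile


def parse_report(root):
    """Return (ok, failed): paths of .py files under root that do / do not compile.

    Files are only compiled, never executed.
    """
    ok, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                source = handle.read()
            try:
                compile(source, path, "exec")
            except (SyntaxError, ValueError):
                # ValueError covers sources the compiler rejects outright,
                # such as ones containing null bytes on some versions.
                failed.append(path)
            else:
                ok.append(path)
    return ok, failed


if __name__ == "__main__":
    # Throwaway directory with one file using print() and one using
    # the python 2 print statement (hypothetical sample files).
    root = tempfile.mkdtemp()
    try:
        with open(os.path.join(root, "new_style.py"), "w") as handle:
            handle.write('print("hello")\n')
        with open(os.path.join(root, "old_style.py"), "w") as handle:
            handle.write('print "hello"\n')
        ok, failed = parse_report(root)
        print("%d of %d modules parse" % (len(ok), len(ok) + len(failed)))
    finally:
        shutil.rmtree(root)
```

Pointed at the mercurial package directory and run under python3, a script like this would list the modules that still fail to parse, which gives a rough progress indicator for the port.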

Knowing that mercurial makes heavy use of strings both as byte sequences and as text, I'd use the knowledge gained in the previous task to identify, while studying mercurial's source code, the parts most likely to cause porting problems.

I'd probably need to talk to the developers or dig through the wiki to learn which parts of the program are the most important, so that I can have a basic working implementation as soon as possible.

Note

"Basic" is a kind of fuzzy definition. But I think this work should be done in parts. So, if I can come up with a port that initially only creates an empty repository, I'd be happy with it. Then I would incrementally improve it until it is able to work with the full range of mercurial operations.

Time estimates

This part will probably take place in the "Community Bonding Period" described in GSoC's timeline page [11], and will probably continue throughout the implementation period, since the nuances of each module's code may only be discovered while I am working on it.

Python code porting approaches

There are a few approaches that can be taken while working on this project. As described in the Python wiki, they are (all of them involve working in a separately cloned mercurial development repository):

  1. Implementing a Python 3-only version;
  2. Making a 2.6 (2.7) and 3.x compatible version;
  3. Implementing an abstraction layer that keeps mercurial compatible with all the currently supported versions, plus 3.x.

According to the information in the python wiki [12], since code written for python 2.6 (and 2.7) can be made forward-compatible with python 3, options one and two can be merged. Should that merged option be selected by the mentors, I'd probably use the method outlined in PEP 3000 [13], which is:

  1. Port mercurial to Python 2.6. (Already done)
  2. Turn on the Py3k warnings mode.
  3. Test and edit until no warnings remain.
  4. Use the 2to3 tool to convert this source code to 3.x syntax.
  5. Test the converted source code under 3.x.
  6. If problems are found, make corrections to the 2.6 version of the source code and go back to step 3.
  7. When it's time to release, release separate 2.6 and 3.x tarballs (or use distutils' 2to3 integration).

For the testing part, the mercurial test suite would be an invaluable tool to verify that there were no regressions. Logging all the output generated by mercurial would then surface the warnings that will most likely need to be fixed.
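Once such a log exists, the warnings can be grouped by file to see where most of the work lies. The sketch below is only an illustration: it assumes the log contains warnings in python's default "path:line: Category: message" format, and the function name summarize_warnings and the sample log lines are made up rather than taken from a real mercurial test run.

```python
import re
from collections import Counter

# Matches python's default warning format: "path:lineno: Category: message".
WARNING_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): (?P<category>\w+Warning): (?P<message>.*)$"
)


def summarize_warnings(log_text):
    """Return [(path, count), ...] sorted by descending warning count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_LINE.match(line)
        if match:
            counts[match.group("path")] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Made-up sample log; the paths and messages are illustrative only.
    sample_log = "\n".join([
        "example/a.py:10: DeprecationWarning: first sample message",
        "some unrelated test output",
        "example/a.py:42: DeprecationWarning: second sample message",
        "example/b.py:7: DeprecationWarning: third sample message",
    ])
    for path, count in summarize_warnings(sample_log):
        print("%s: %d warning(s)" % (path, count))
```

Sorting by count puts the modules with the most warnings first, which fits the incremental, most-important-first approach described in this proposal.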

As described in reports on porting to py3k, 2to3 makes some semantically incorrect changes; for each module, I'd try to isolate the affected code from 2to3.

From a user's point of view, option number three is the most attractive, since it would support the widest range of users. As one would expect, it is also the hardest to implement, and I'd prefer to discard it as an option. As an exercise, though, I'll try to describe the work involved.

Since python 2.5 and earlier have no support for most of the changes introduced in py3k (which 2.6 is, at least, aware of), some calls in the current mercurial code base would need to be translated to other calls (like converting uses of the print statement to calls to sys.stdout.write), or an abstraction layer would need to be implemented. In some cases both approaches would be needed, as when python 2.x expects str objects for an operation while python 3.x requires unicode objects.
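To make the idea of an abstraction layer more concrete, here is a minimal sketch of what a small compatibility module could look like. It is an assumption about one possible design, not mercurial's actual code: the names PY3, tobytes, totext, tonative and write are hypothetical, and the module uses only syntax accepted by both python 2.x and python 3.x.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    bytes_type = bytes
else:
    text_type = unicode  # only evaluated on python 2
    bytes_type = str


def tobytes(value, encoding="utf-8"):
    """Return value as a byte string on both python 2 and python 3."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def totext(value, encoding="utf-8"):
    """Return value as a text (unicode) string on both versions."""
    if isinstance(value, bytes_type):
        return value.decode(encoding)
    return value


def tonative(value, encoding="utf-8"):
    """Return value as the interpreter's native str type."""
    if PY3:
        return totext(value, encoding)
    return tobytes(value, encoding)


def write(message, stream=None):
    """Stand-in for the print statement built on stream.write."""
    if stream is None:
        stream = sys.stdout
    stream.write(tonative(message) + "\n")


if __name__ == "__main__":
    write("text input")
    write(b"bytes input")
```

Callers would then go through write() and the conversion helpers instead of using print or assuming a particular string type, so the version-specific logic stays in one place.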

Some sources of inspiration for this approach can be found in the repository of the django py3k port [14].

Porting the core python modules

Porting the core python modules will be the most work-intensive part of this project, since this is where most of the code lies. As described in the previous section, one porting approach needs to be chosen before work starts on this task. As already discussed, I'd prioritize working first on the most used modules to have a basic working port as soon as possible. For that, I'd need to analyze the code and get some insights from the core mercurial developers.

After finishing this basic port (like making a version capable of init'ing a repository), I'd start work on the other modules (in the same prioritized scheme) to bring the rest of the functionality to the py3k port.

Time estimates

Given the size of mercurial's code, this part can take an arbitrary amount of work, as discussed in the "Deliverables" section.

Given the limited time available in GSoC, I'd set an upper limit of two and a half months for this part. Should the port of the core be complete before that limit, I'd start working on the extensions shipped with mercurial.

Porting the C extension modules

Currently, mercurial uses a few (six, in the core, plus one extension, inotify) C extension modules, written mostly with performance in mind.

Considering that this code is small, two approaches can be taken:

  1. Porting the python interfaces to python 3, using conditional compilation to separate the API calls of incompatible versions;
  2. Implementing python-only modules to substitute the C versions in the py3k port.

Though option 2 is interesting from the point of view that it could be usable for eventual mercurial ports to other pythons [15] [16], I'd probably try to adapt the existing C code to py3k according to the python documentation [17], unless the mentoring organization prefers option 2.
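If option 2 were pursued, a common pattern is to try importing the compiled module and fall back to a python-only implementation when it is unavailable. The sketch below illustrates that pattern under stated assumptions: the module name _hypothetical_fastdiff and the function matching_blocks are invented for this example and do not refer to mercurial's real C modules, and the standard library's difflib stands in for whatever algorithm the C code implements.

```python
import difflib

try:
    # Hypothetical compiled accelerator; this module name is invented.
    from _hypothetical_fastdiff import matching_blocks
    USING_C_EXTENSION = True
except ImportError:
    USING_C_EXTENSION = False

    def matching_blocks(old_lines, new_lines):
        """Python-only fallback: return (old_index, new_index, size) tuples.

        The last tuple is difflib's sentinel (len(old), len(new), 0).
        """
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
        return [tuple(block) for block in matcher.get_matching_blocks()]


if __name__ == "__main__":
    old = [b"a\n", b"b\n", b"c\n"]
    new = [b"a\n", b"x\n", b"c\n"]
    print("C extension in use: %s" % USING_C_EXTENSION)
    print(matching_blocks(old, new))
```

The rest of the code base would import matching_blocks from this one module, so the same calling code works whether or not the compiled version has been ported yet.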

Time estimates

This task overlaps with the task of porting the python modules to py3k. Given that the C modules implement basic patching and diff operations, this task would have to be completed as soon as someone wants diff/patch support in the py3k port.

In an optimistic scenario, I believe one week should be enough to port and test this part on the Linux, Windows and Mac OS X operating systems.

Note

I'd probably have a bit of trouble working with Windows, since it has been quite some time since I last developed for it. Should I have problems with that platform, it is likely that this activity will take more than one week, possibly two.

Porting the mercurial extensions

The core mercurial extensions are written in pure python. Even though they extend mercurial's behavior in many interesting ways, they aren't required for it to work properly, and, thus, the porting of this part of mercurial would receive the least priority in my project.

Time allowing for work on the extensions, I'd use the very same approach used to port mercurial's core (the possible approaches were discussed in the "Python code porting approaches" section).

Time estimates

The time allocated to this task would be the total time available for the GSoC project minus the time needed to finish porting mercurial's core, provided that difference is not negative.

Conclusion

This proposal outlined the approach I'd take to port mercurial to run on py3k, presenting the major tasks involved, some of the possible approaches one could take, and a brief discussion of what should be delivered by the end of the project.

It is my belief that this project would be beneficial for both the mercurial and the python communities, since it would enable more users to try development with py3k and because it would also prepare mercurial for python's future.