--------------------------------------------------------------------------------
					Team Fiji Release Notes
--------------------------------------------------------------------------------

Our goal for this project was to port the Django source code from its native 
Python 2 to Python 3. This release does not achieve a complete port. Instead, 
we converted everything with 2to3, Python's 2-to-3 converter, to get a base of 
Python 3 code, and then to fix the test suite so that it would pass completely. 
The test cases we focused on were related to the SQLite3 backend and the web 
development server.

The following document outlines Team Fiji's progress, which includes the tasks 
taken on, the difficulties encountered, and the solutions implemented.

--------------------------------------------------------------------------------
			Conversion specific issues relating to doctests
--------------------------------------------------------------------------------

A good portion of Django's test suite consists of doctests. However, the initial 
conversion with Martin v. Löwis' django-3k yields a surprisingly bad result for 
a doctest test-suite run. We found a number of issues, specific to differences 
between Python 2 and Python 3, that contribute to a large number of failing 
doctest cases. They are detailed below.

First of all, please note that Django's _doctest.py module is responsible for 
implementing and running doctests. Take care to distinguish Django's 
_doctest.py from Python's doctest.py: the latter is a module that the Python 
community actively supports, while the former is the Django community's own 
adaptation of doctest.py, which lets us implement custom functionality within 
the doctest framework. The _doctest.py module can be found in 
test/_doctest.py.

The most immediate issue that we fixed was that every doctest case testing for 
an expected exception returned an error and aborted the test-suite run, which 
made it hard for developers to gauge how far the test cases work in Python 3. 
This issue originates from the fact that Python 3's exception stack trace 
mechanism has changed from its 2.x predecessor. The appropriate restructuring 
has been applied to Python's doctest.py, but since Django's _doctest.py differs 
from Python's, a custom fix had to be applied.

Another prevalent issue that affected the entire test suite was that existing 
doctest cases expect the old 2.x Unicode string output, whereas in Python 3.x 
all strings are already Unicode. A vast majority of the test cases that 
specifically test Unicode compatibility still expect string output that begins 
with a leading u (which denotes a Unicode string in Python 2.x) and 
consequently fail when Python 3.x returns a normal string without the leading 
u. To address this issue, we implemented a custom displayhook in _doctest.py 
that intercepts the output of doctest results and strips away the leading u'' 
and b'' prefixes (b'' denotes a byte string, a distinct type in Python 3.x with 
no separate counterpart in 2.x). We invested effort in making sure that the 
custom py3_displayhook can withstand the variety of test cases conceived by 
Django developers, including but not limited to: output of strings, lists, 
tuples, dictionaries, looped outputs, reprs of data types, single quotes vs. 
double quotes, and None.
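
The idea can be sketched as follows. This is a minimal illustration, not the 
actual py3_displayhook from _doctest.py: the regular expression and the way 
the hook would be installed are assumptions, and the pattern is deliberately 
naive (it would also strip a u or b that follows a space inside a quoted 
string).

```python
import builtins
import re
import sys

# Assumed pattern: a u or b prefix directly before an opening quote, but not
# preceded by a word character or another quote, so the contents of 'b' or
# "club's" are left alone.
_PREFIX_RE = re.compile(r"""(?<![\w'"])[ub](?=['"])""")


def py3_displayhook(value):
    """Write repr(value) to stdout with u''/b'' string prefixes removed."""
    if value is None:
        return
    builtins._ = None
    sys.stdout.write(_PREFIX_RE.sub("", repr(value)) + "\n")
    builtins._ = value
```

A hook like this would be assigned to sys.displayhook while the doctest 
examples run. How _doctest.py installs its hook is not described in these 
notes.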

Another issue related to the disparity between Python 2.x and 3.x had to do 
with changes the Python community made to error messages. For instance, the 
wording of a TypeError message changed slightly in Python 3.x, which caused 
doctests that expected the old wording to fail. We fixed this issue.
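
These notes do not say how the wording problem was fixed in _doctest.py. As an 
illustration of the problem only, the sketch below uses a standard option of 
Python's own doctest module, IGNORE_EXCEPTION_DETAIL, which compares just the 
exception type. The function add_one and its docstring are hypothetical.

```python
import doctest


def add_one(x):
    """Hypothetical function whose doctest was written for Python 2.

    >>> add_one("a")  # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
        ...
    TypeError: cannot concatenate 'str' and 'int' objects
    """
    return x + 1


# Parse the docstring and run its single example directly.
parser = doctest.DocTestParser()
test = parser.get_doctest(add_one.__doc__, {"add_one": add_one},
                          "add_one", "<notes>", 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # results.failed is 0 when the example passes
```

Without the option, the example fails on Python 3 because the message text 
after "TypeError:" no longer matches the Python 2 wording.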

In other cases, where the fix would be difficult to implement in _doctest.py, 
the change had to be made to the actual test case. For instance, if the doctest 
expects a bytestring representation of a Unicode value as its output, then the 
source has to be forced to display a bytestring representation as well, because 
Python 3.x strings are Unicode by default. Test cases that call the 'print' 
function also need to be modified. Calling 'print' on a variable bypasses our 
custom displayhook, so we removed the 'print' call to force the test case to go 
through our displayhook.
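
The bypass can be demonstrated with a short, self-contained sketch. It 
compiles each statement in 'single' mode, as Python's doctest does, and 
records what reaches the display hook. The names recording_hook and 
run_example are hypothetical.

```python
import contextlib
import io
import sys

seen = []  # values that reached the display hook


def recording_hook(value):
    """Stand-in for a custom displayhook: remember every non-None value."""
    if value is not None:
        seen.append(value)


def run_example(source):
    """Run one statement compiled in 'single' (interactive) mode."""
    exec(compile(source, "<example>", "single"), {})


captured = io.StringIO()
previous_hook = sys.displayhook
sys.displayhook = recording_hook
try:
    with contextlib.redirect_stdout(captured):
        run_example("'abc'")         # bare expression: goes through the hook
        run_example("print('abc')")  # print writes to stdout by itself
finally:
    sys.displayhook = previous_hook
```

After this runs, only the bare expression has been seen by the hook, while the 
text written by print went straight to stdout without any prefix stripping.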

Another issue related to the Python 2 to 3 conversion was improper list 
wrapping. Some calls, such as SQL queries, have their results converted to a 
list type, but 2to3 did not wrap the SQL query properly. A fix for this could 
not be implemented because the advice of Martin v. Löwis was needed, and he did 
not reply in time.

--------------------------------------------------------------------------------
							String Implementation
--------------------------------------------------------------------------------

One issue in moving from Python 2 to 3 was the difference between their string 
implementations. Python 2 used both Unicode and byte strings; in Python 3 the 
str type is always Unicode, with much better support for it, and bytes is a 
separate type. One of the problems was the encoding and decoding of byte 
strings that Python 2 code performs. These calls no longer work in Python 3, 
where str has no decode() method and bytes has no encode() method. We worked 
around this by simply removing the conversions. Django's internal documentation 
states that byte strings should not be in use, so the conversions were 
unnecessary anyway.
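
As a quick illustration of the Python 3 behaviour involved (standard language 
behaviour, not code from the port):

```python
text = "caf\u00e9"           # str: Unicode text in Python 3
data = text.encode("utf-8")  # bytes: a separate type

# Python 2 code that calls decode() on text, or encode() on byte strings,
# fails on Python 3 because these methods no longer exist on those types.
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(data, "encode")

# Converting in the supported direction still works.
round_trip = data.decode("utf-8")
```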

Another way we fixed common, recurring errors was to create and use a Python 3
module with some utility functions, mostly for encoding Python 2 strings to
UTF-8 strings and bytes when it detects that it is running on Python 3. This
fix removed many of the different TypeErrors and also fixed a fair number of
buffer API errors. Overall, it was instrumental in resolving many errors in the
regression tests.
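
A utility module of this kind might look like the sketch below. The names PY3, 
to_bytes, and to_text are hypothetical. The notes do not list the actual 
functions or their signatures.

```python
import sys

PY3 = sys.version_info[0] >= 3  # hypothetical flag name


def to_bytes(value, encoding="utf-8"):
    """On Python 3, encode text to bytes; pass everything else through."""
    if PY3 and isinstance(value, str):
        return value.encode(encoding)
    return value


def to_text(value, encoding="utf-8"):
    """On Python 3, decode bytes to text; pass everything else through."""
    if PY3 and isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Calling such helpers at the boundary where a bytes-only or text-only API is 
used keeps the version check in one place instead of at every call site.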

Unicode and bytestring issues were also common in the salted hashing process. 
The function sha_construct was given Bytes instead of Unicode. This was fixed 
in a similar fashion to the above, which greatly reduced the number of errors 
in the comment_test module of regressiontests.
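
The notes do not show the fix itself. As background, Python 3's hashlib 
accepts only bytes-like input and raises a TypeError for str, so text and 
bytes have to be reconciled before hashing. The helper below is a hypothetical 
sketch that assumes SHA-1 and UTF-8. It is not Django's sha_construct.

```python
import hashlib


def salted_hexdigest(salt, raw_value, encoding="utf-8"):
    """Hypothetical helper: SHA-1 over salt + value, encoding text first."""
    payload = salt + raw_value
    if isinstance(payload, str):
        payload = payload.encode(encoding)
    return hashlib.sha1(payload).hexdigest()
```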

--------------------------------------------------------------------------------
							Dictionary Methods
--------------------------------------------------------------------------------

In Python 3, the Python 2 method for sorting dictionary items was removed. A
simple workaround was to use the general sorted() function. 2to3 also wrapped
calls to dictionaries' keys(), values(), and items() methods, as well as
iterkeys(), etc. As a result, some methods in Django were converted into
infinitely recursive methods. This was fixed by rewriting the methods to be
functionally equivalent without matching 2to3's fixer pattern. This general
approach of rewriting code to avoid 2to3 conversion errors has been common to
many of our individual fixes.
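
The recursion arises because 2to3 turns a call such as self.iterkeys() into 
self.keys(). The sketch below shows the shape of the problem and one possible 
rewrite. Both classes are hypothetical and are not taken from Django.

```python
class ConvertedDict:
    """Shape of a dict-like class after 2to3 has run over it."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        # Python 2 original: return list(self.iterkeys())
        # 2to3 output: the method now calls itself without end.
        return list(self.keys())


class RewrittenDict:
    """Same behaviour, phrased so the dict fixer has nothing to match."""

    def __init__(self, data):
        self._data = dict(data)

    def _iter_keys(self):
        return iter(self._data)

    def keys(self):
        return list(self._iter_keys())
```

Calling keys() on ConvertedDict raises RecursionError, while RewrittenDict 
returns the keys as a list.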

2to3's overwriting of dictionary methods also caused some problems in the cache
code. 2to3 was overwriting all calls to .has_key(), which caused another maximum
recursion error when trying to return a key. Again, after finding the code
causing the problem, we fixed it by rephrasing the call to self.has_key as an
equivalent but non-overwritten call.
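
2to3 rewrites x.has_key(y) as y in x, which recurses when the call sits inside 
__contains__. The classes below are a hypothetical sketch of that situation, 
and the rephrasing shown is only one possible way to avoid the fixer pattern. 
The notes do not show the exact rewrite used.

```python
class ConvertedCache:
    """Shape of a cache class after 2to3 has rewritten has_key calls."""

    def __init__(self):
        self._store = {}

    def has_key(self, key):
        return key in self._store

    def __contains__(self, key):
        # Python 2 original: return self.has_key(key)
        # 2to3 output: "in" calls __contains__ again, without end.
        return key in self


class RewrittenCache(ConvertedCache):
    """Equivalent lookup that no longer has the shape x.has_key(y)."""

    def __contains__(self, key):
        check = self.has_key  # bind the method first, then call it
        return check(key)
```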

Related to the dictionary fixer problems, 2to3 was also overwriting certain
import calls that it should not have. This caused a myriad of errors in the
test suite as well as a significant number of failures. Finding and rewriting
the guilty line or two of code to avoid inadvertent overwriting was a recurring
theme. Originally, we intended to resolve many of these issues by excluding
fixers from the 2to3 packages, or by adding fixers to correct the errors of the
current ones. Rephrasing snippets of code to avoid the overwriting proved so
much faster and easier, and so many of the changes to the regressiontest code
were done in this fashion, that we dropped custom fixers altogether.