Issue #6 resolved

issue with unicode doctest - string repr

Johan Harjono
created an issue

Problem: under Python 3.x the doctests produce a plain string repr, because in 3.x all strings are unicode by default. However, the expected output in the tests is still a unicode string in 2.x format (with the u prefix), so the comparison fails.


File "/home/johan/workspace/fiji/build/tests/modeltests/basic/", line ?, in modeltests.basic.models.__test__.API_TESTS
Failed example:
    a6.headline
Expected:
    u'Default headline'
Got:
    'Default headline'
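A minimal sketch that reproduces this kind of failure outside the test suite, assuming a Python 3 interpreter. The names PY2_STYLE_TEST and run_py2_style_doctest are made up for illustration and are not part of the project.

```python
import doctest

# Doctest text in Python 2.x style: the expected output carries the u prefix.
# (Hypothetical sample; the plain literal stands in for a 2to3-converted line.)
PY2_STYLE_TEST = """
>>> headline = 'Default headline'
>>> headline
u'Default headline'
"""


def run_py2_style_doctest():
    """Run the sample above and return doctest's (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(PY2_STYLE_TEST, {}, "py2_style", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report so that only the counts are returned.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted
```

On Python 3 the second example should fail, because repr() of a string no longer includes the u prefix while the expected text still does.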

Comments (15)

  1. Johan Harjono reporter

    2to3 probably won't help. Running a Python program through 2to3 converts unicode string literals into plain string literals for Python 3.x, but a u prefix that sits inside a string (as it does in doctest output) is left untouched:

    # in Python 2.x
    s = u'I am the unicode! Fear Me!'


    # after running 2to3
    s = "I am the unicode! Fear Me!"


    # in Python 2.x
    s = "u'I become the unicode within a string, the destroyer of doctests"


    # after running 2to3
    s = "u'I become the unicode within a string, the destroyer of doctests"

    As of now, 2to3 has no support for this, see this open bug:

    It is therefore very likely that we will have to create some sort of "glue code", as bug 3020 calls it.

  2. Johan Harjono reporter

    Suppose we have this

    def blah():
        """
        >>> simple = 1 + 1  # normal doctest
        >>> print(simple)
        2
        >>> a = u'\u1234'  # a unicode character
        >>> a
        u'\u1234'
        """

    if __name__ == "__main__":
        import doctest
        doctest.testmod()

    After running 2to3 with the -d flag, which enables doctest conversion:

    def blah():
        """
        >>> simple = 1 + 1  # normal doctest
        >>> print(simple)
        2
        >>> a = '\u1234'  # a unicode character; this line gets converted
        >>> a
        u'\u1234'
        """
        # the expected output u'\u1234' above does not get converted

    if __name__ == "__main__":
        import doctest
        doctest.testmod()

    Hmm, could it be that I'm calling 2to3 with the wrong flag? Or maybe it's my 2to3 installation?

  3. Johan Harjono reporter

    So I asked Martin v. Löwis for his opinion on how we should handle this problem.

    It seems that doctest conversion doesn't attempt to convert the output
    at all - nor can it probably do so in any reasonable way. The output may
    not be Python code at all, so whether or not a "leading" u should be
    removed depends on how it gets produced.
    I (now) think that this should be resolved in doctest somehow, where I
    can see two options:
    a) the repr output is generated somehow such that the u"" prefix is
    uniformly used for both 2.x and 3.x (i.e. either it is generated in both
    versions, or in neither). This can be achieved, in principle, using
    sys.displayhook. In 2.x, it would be possible to remove the u"" prefix
    after calling repr(); alternatively adding it in 3.x would be more
    difficult (one would somehow have to convince str.__repr__ to work
    differently, or use pprint in the first place).
    b) the doctest result comparison method is told to ignore differences
    that only affect a leading u"" prefix in 3.x.
    I think this should also be discussed on python-dev; I'll do that.

    So this seems to be more of a design issue with 2to3 than a bug, and I suspect we'll hear back from Martin about how we should handle this case once he finishes discussing it on python-dev.

    I'll put this issue on hiatus for now. In the meantime, if you suspect your problem can be traced to this one, please find other issues to work on.
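For reference, option (b) from Martin's mail could look roughly like the sketch below: a doctest.OutputChecker subclass that drops a leading u prefix from both the expected and the actual output before comparing. This is only an illustration for Python 3, not what was implemented here; the class name, the helper function and the regular expression are assumptions, and the regex is a heuristic that can also strip a lone "u" that happens to sit right before a quote inside ordinary text.

```python
import doctest
import re

# A u/U that starts a token and is directly followed by a quote character.
# Heuristic only: it does not parse the output as Python code.
_U_PREFIX = re.compile(r"(?<!\w)[uU](?=['\"])")


class IgnoreUnicodePrefixChecker(doctest.OutputChecker):
    """Compare doctest output while ignoring u'' prefixes (hypothetical name)."""

    def check_output(self, want, got, optionflags):
        want = _U_PREFIX.sub("", want)
        got = _U_PREFIX.sub("", got)
        return super().check_output(want, got, optionflags)


def run_with_checker(source):
    """Run doctest text with the checker; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(source, {}, "sample", None, 0)
    runner = doctest.DocTestRunner(
        checker=IgnoreUnicodePrefixChecker(), verbose=False
    )
    # Discard the report text; only the counts are of interest here.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted
```

Because the prefix is removed on both sides, the same checker accepts u'Default headline' against 'Default headline' and the reverse.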

  4. Johan Harjono reporter

    Martin has got back to us:

    There are two factions: one proposes to rewrite the test cases to rely
    on print instead of repr where reasonable. E.g. if the current test is
    then replacing that with
    >>> print
    should still preserve the test purpose in most cases (it's an inherent
    flaw in doctesting that the test purpose is unclear in many cases).
    [Of course, you would then still need to run 2to3 on the test case,
    so that the print statement gets replaced with the print function]
    The other faction (including GvR) recommends approaches that require no
    changes to the test suite; the recommended approach would be to set
    sys.displayhook - e.g. in 3.x, to change the repr of strings to include
    a leading u"".

    I am personally inclined to apply the latter approach (the recommended one), since it requires no changes to the test suite. Expect a fix by Friday or Saturday.
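A rough sketch of what such a display hook might look like on Python 3, assuming it only needs to handle top-level str values. The function names are made up for illustration, and how the hook gets wired into the doctest run is left out.

```python
import builtins
import io
import sys
from contextlib import redirect_stdout


def unicode_prefix_displayhook(value):
    """Mimic the default display hook, but show str values with a leading u."""
    if value is None:
        return
    # Same bookkeeping as the default hook: reset _ first, set it afterwards.
    builtins._ = None
    text = repr(value)
    if isinstance(value, str):
        # Python 3 cannot tell former byte strings from former unicode
        # strings, so every str gets the prefix.
        text = "u" + text
    sys.stdout.write(text + "\n")
    builtins._ = value


def render(value):
    """Return what the hook would print for value (helper for illustration)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        unicode_prefix_displayhook(value)
    return buffer.getvalue()
```

Installing it would be a matter of assigning sys.displayhook = unicode_prefix_displayhook. Strings nested inside containers such as lists are not touched by this sketch, since repr() of the container is printed as-is.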

  5. Johan Harjono reporter

    Managed to figure out how to use sys.displayhook and implemented Martin's recommended approach. It makes the failing test cases pass, but it introduces false positives: examples that used to pass now fail.

    so now we have:

    File "/home/johan/workspace/fiji/build/tests/modeltests/basic/", line ?, in modeltests.basic.models.__test__.API_TESTS
    Failed example:
    Expected:
        'Article 6'
    Got:
        u'Article 6'

    This is the opposite of our original problem. There's a perverse balance in this somewhere.
