[regression] UnicodeEncodeError in AsStringVisitor.visit_functiondef

Issue #273 closed
notsqrt created an issue

Hi!

I just found a regression in astroid 1.4.

In https://bitbucket.org/logilab/astroid/commits/9367f99b6d66f63b482f32da5d944e56fb2fdb02#Lastroid/as_string.pyF271, the visit_functiondef method switched from % formatting to .format, which fails when a byte-string template is given non-ASCII unicode characters.

With Python 2.7:

>>> b"%s" % u"\u2019"
u'\u2019'
>>> b"{}".format(u"\u2019")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)

This means that docstrings in particular can't contain non-ASCII characters.
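For what it's worth, here is a rough sketch of the failure condition and of one possible alternative to going back to %: keep .format but make the template unicode. It is not the astroid code. The names ascii_encodable, DEF_FORMAT and render_functiondef are made up for illustration, and the template string is copied from as_string.py. It assumes the crash comes from Python 2 implicitly encoding the unicode argument as ASCII, which is what the traceback above suggests.

```python
# -*- coding: utf-8 -*-
# Sketch only; written to run on both Python 2.7 and Python 3.
from __future__ import print_function, unicode_literals


def ascii_encodable(text):
    """Hypothetical helper: True if `text` survives an ASCII encode.

    On Python 2, a byte-string template's .format() ends up encoding
    unicode arguments with the default (ASCII) codec, so text that
    fails this check is the kind that raises UnicodeEncodeError there.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Same template text as in as_string.py, but a unicode string on
# Python 2 because of unicode_literals, so no implicit encode happens.
DEF_FORMAT = "\n{decorators}def {name}({args}){trailer}{docs}\n{body}"


def render_functiondef(name, args, docs, body):
    """Hypothetical stand-in for the final formatting step only."""
    return DEF_FORMAT.format(decorators="", name=name, args=args,
                             trailer=":", docs=docs, body=body)


docstring = '\n    "With unicode : \u2019 "'
print(ascii_encodable("plain ASCII"))  # ASCII-only text is safe either way
print(ascii_encodable(docstring))      # this is the text that triggers the crash
rendered = render_functiondef("method", "self", docstring, "    pass")
```

Whether a unicode template is acceptable everywhere as_string() output is consumed is something I have not checked, so treat this as an option to evaluate rather than a drop-in fix.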

Comments (9)

  1. notsqrt reporter

    I can't get my head around Mercurial ...

    Here is a patch:

    diff -r 90a41f5f9e8f astroid/as_string.py
    --- a/astroid/as_string.py  Sun Nov 29 22:45:14 2015 +0200
    +++ b/astroid/as_string.py  Mon Nov 30 20:22:29 2015 +0100
    @@ -287,13 +287,13 @@
                 trailer = return_annotation + ":"
             else:
                 trailer = ":"
    -        def_format = "\n{decorators}def {name}({args}){trailer}{docs}\n{body}"
    -        return def_format.format(decorators=decorate,
    -                                 name=node.name,
    -                                 args=node.args.accept(self),
    -                                 trailer=trailer,
    -                                 docs=docs,
    -                                 body=self._stmt_list(node.body))
    +        def_format = "\n%sdef %s(%s)%s%s\n%s"
    +        return def_format % (decorate,
    +                             node.name,
    +                             node.args.accept(self),
    +                             trailer,
    +                             docs,
    +                             self._stmt_list(node.body))
    
         def visit_generatorexp(self, node):
             """return an astroid.GeneratorExp node as string"""
    diff -r 90a41f5f9e8f astroid/tests/unittest_regrtest.py
    --- a/astroid/tests/unittest_regrtest.py    Sun Nov 29 22:45:14 2015 +0200
    +++ b/astroid/tests/unittest_regrtest.py    Mon Nov 30 20:22:29 2015 +0100
    @@ -282,6 +282,23 @@
             ''')
             self.assertRaises(exceptions.InferenceError, next, node.infer())
    
    +    def test_unicode_in_docstring(self):
    +        # Crashed for astroid==1.4.1
    +        # Test for https://bitbucket.org/logilab/astroid/issues/273/
    +
    +        # In a regular file, "coding: utf-8" would have been used.
    +        node = extract_node(u'''
    +        from __future__ import unicode_literals
    +
    +        class MyClass(object):
    +            def method(self):
    +                "With unicode : %s "
    +
    +        instance = MyClass()
    +        ''' % u"\u2019")
    +
    +        next(node.value.infer()).as_string()
    +
    
     class Whatever(object):
         a = property(lambda x: x, lambda x: x)
    
  2. metaist

    Thank you! This was a major headache today when a dependency (path.py) used a Unicode character in its docs and suddenly pylint was dying.
