Issue #1166 resolved

gettext target can't handle non-ascii files

When attempting to use a template with non-ASCII characters (in my case, a language switcher in layout.html), the export blows up with the following error:

# Sphinx version: 1.2b1
# Python version: 2.7.3
# Docutils version: 0.10 release
# Jinja2 version: 2.6
Traceback (most recent call last):
  File "lib/python2.7/site-packages/sphinx/", line 247, in main, filenames)
  File "lib/python2.7/site-packages/sphinx/", line 211, in build
  File "lib/python2.7/site-packages/sphinx/builders/", line 211, in build_update
    'out of date' % len(to_build))
  File "lib/python2.7/site-packages/sphinx/builders/", line 150, in build
  File "lib/python2.7/site-packages/sphinx/builders/", line 145, in _extract_from_template
    for line, meth, msg in extract_translations(context):
  File "lib/python2.7/site-packages/jinja2/", line 209, in _extract
    source = self.environment.parse(source)
  File "lib/python2.7/site-packages/jinja2/", line 391, in parse
    return self._parse(source, name, filename)
  File "lib/python2.7/site-packages/jinja2/", line 398, in _parse
    return Parser(self, source, name, _encode_filename(filename)).parse()
  File "lib/python2.7/site-packages/jinja2/", line 32, in __init__ = environment._tokenize(source, name, filename, state)
  File "lib/python2.7/site-packages/jinja2/", line 429, in _tokenize
    source = self.preprocess(source, name, filename)
  File "lib/python2.7/site-packages/jinja2/", line 423, in preprocess
    self.iter_extensions(), unicode(source))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 292: ordinal not in range(128)

Looking at the code, a "TODO: encoding" comment has been left in the template extraction code. Jinja2 then blows up when coercing the template's source code to unicode.
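
The failure can be reproduced without Sphinx or Jinja2. A minimal sketch, assuming the template is stored on disk as UTF-8 bytes (the sample markup and the `try_decode` helper are made up for illustration): decoding such bytes with the ASCII codec, which is what Python 2's `unicode(source)` does implicitly for a byte string, fails on the first non-ASCII byte.

```python
# Hypothetical template fragment; the cedilla in "Français" is encoded
# in UTF-8 as the two bytes 0xc3 0xa7.
raw = u'<a href="/fr/">Fran\u00e7ais</a>'.encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = try_decode(raw, "ascii")  # a UnicodeDecodeError instance
utf8_result = try_decode(raw, "utf-8")   # the original text, intact
```

The byte that trips the ASCII codec here is 0xc3, the same lead byte reported in the traceback above.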

I would suggest that the encoding be assumed to be UTF-8 (rather than ASCII), even if it is not configurable, given that Jinja2's template loaders all default to UTF-8 and the Unicode section of its documentation specifically notes:

We recommend utf-8 as Encoding for Python modules and templates as it’s possible to represent every Unicode character in utf-8 and because it’s backwards compatible to ASCII. For Jinja2 the default encoding of templates is assumed to be utf-8.

(emphasis mine)
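
A sketch of the suggested behaviour, not the actual Sphinx patch: read each template file with an explicit UTF-8 decoder so that the extraction step receives text rather than bytes. The helper name `read_template_source` is hypothetical; `io.open` is used because it accepts an `encoding` argument on both Python 2 and Python 3.

```python
import io


def read_template_source(path, encoding="utf-8"):
    """Read a template file and return its contents as unicode text.

    The UTF-8 default is an assumption that mirrors Jinja2's documented
    default template encoding; it is not taken from Sphinx itself.
    """
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()
```

Text returned this way is already unicode, so the `unicode(source)` coercion seen in the traceback would presumably have nothing left to decode.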

Comments (6)

  1. pcav

    Thanks for this. Is there a roadmap or an estimated release date for it? BTW, any plans to add the language switcher mentioned above as a standard feature? Thanks.
