Issue #705 resolved

Encoding error when source file encoding and preferred encoding are different

Joongi Kim
created an issue

I'm using the latest trunk version of Sphinx on Windows 7 64-bit (Korean) with Python 3.2 64-bit. I found that the autodoc extension cannot process my source files, whose docstrings are UTF-8 encoded.

The following is the error trace. (Ignore "Python31"; it's just an old directory name which I overwrote with Python 3.2.)
{{{
G:\Projects\SOME_PROJECT\docs>python32 C:\Development\Python31\Scripts\ -P -b html . _build/html
Running Sphinx v1.1pre
loading pickled environment... not yet created
building [html]: targets for 3 source files that are out of date
updating environment: 3 added, 0 changed, 0 removed
reading sources... [ 33%] crawler
Exception occurred while building, starting debugger:
Traceback (most recent call last):
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 762, in read_doc
    pub.publish()
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\", line 203, in publish
    self.settings)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\", line 69, in read
    self.parse()
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\", line 75, in parse
    self.parser.parse(self.input, document)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\", line 157, in parse
    , document, inliner=self.inliner)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 170, in run
    input_source=document['source'])
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\", line 233, in run
    context, state, transitions)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\", line 453, in check_line
    return method(match, context, next_state)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 2706, in underline
    self.section(title, source, style, lineno - 1, messages)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 329, in section
    self.new_subsection(title, lineno, messages)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 398, in new_subsection
    node=section_node, match_titles=1)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 284, in nested_parse
    node=node, match_titles=match_titles)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 195, in run
    results =, input_lines, input_offset)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\", line 233, in run
    context, state, transitions)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\", line 453, in check_line
    return method(match, context, next_state)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 2281, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 2293, in explicit_construct
    return method(self, expmatch)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 2035, in directive
    directive_class, match, type_name, option_presets)
  File "C:\Development\Python31\lib\site-packages\docutils-0.7-py3.2.egg\docutils\parsers\rst\", line 2086, in run_directive
    result =
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\ext\", line 1298, in run
    documenter.generate(more_content=self.content)
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\ext\", line 693, in generate
    self.analyzer = ModuleAnalyzer.for_module(self.real_modname)
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 202, in for_module
    obj = cls.for_file(source, modname)
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 185, in for_file
    obj = cls(fileobj, modname, filename)
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 219, in __init__
    self.encoding = detect_encoding(self.source.readline)
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 254, in detect_encoding
    first = read_or_stop()
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 227, in read_or_stop
    return readline()
UnicodeDecodeError: 'cp949' codec can't decode bytes in position 674-675: illegal multibyte sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 188, in main
    , filenames)
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 204, in build
    self.builder.build_update()
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 196, in build_update
    'out of date' % len(to_build))
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 216, in build
    purple, length):
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 120, in status_iterator
    for item in iterable:
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 615, in update_generator
    self.read_doc(docname, app=app)
  File "C:\Development\Python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\", line 765, in read_doc
    raise SphinxError(str(err))
sphinx.errors.SphinxError: 'cp949' codec can't decode bytes in position 674-675: illegal multibyte sequence

c:\development\python31\lib\site-packages\sphinx-1.1predev_20110526-py3.2.egg\sphinx\
-> raise SphinxError(str(err))
}}}

To see what happened, I ran:
{{{
(Pdb) sys.getdefaultencoding()
'utf-8'
(Pdb) import locale
(Pdb) locale.getpreferredencoding()
'cp949'
(Pdb) f = open('some-existing-text-file', 'r')
(Pdb) f.encoding
'cp949'
}}}

At line 182 of pycode/, Sphinx calls open() without the encoding parameter. According to the Python Standard Library manual, open() then falls back to locale.getpreferredencoding(), which on my system is 'cp949' and does not match my source code encoding, 'utf-8'.

I have inserted -*- coding: utf-8 -*- into all my source files, so I think this error should not happen.
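The mismatch described above can be reproduced outside Sphinx. The following is only a minimal sketch, assuming a throwaway temporary file and a sample docstring (both made up for illustration, not taken from the project): it writes UTF-8 source, decodes the same bytes as cp949, and then reads the file with an explicit encoding.

```python
import locale
import os
import tempfile

# Hypothetical sample module; the Korean docstring stands in for the real sources.
text = "# -*- coding: utf-8 -*-\n'''한글 docstring'''\n"

# Write the sample as UTF-8 bytes, independent of the current locale.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(text.encode("utf-8"))

# This is the encoding open() falls back to when none is given.
print("preferred encoding:", locale.getpreferredencoding())

# Decoding the same bytes as cp949 never gives the original text back:
# it either raises UnicodeDecodeError or yields different characters.
with open(path, "rb") as f:
    raw = f.read()
try:
    as_cp949 = raw.decode("cp949")
except UnicodeDecodeError as err:
    as_cp949 = None
    print("cp949 decoding failed:", err)

# Passing the encoding explicitly makes the read locale-independent.
with open(path, "r", encoding="utf-8") as f:
    as_utf8 = f.read()

os.remove(path)
print("round trip ok:", as_utf8 == text)
```

Hard-coding `encoding="utf-8"` only helps when every source file really is UTF-8; a file that declares another encoding in its coding cookie would still be read incorrectly.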

Comments (10)

  1. Joongi Kim reporter

    I have found that the following monkey patch works for me:

    # HG changeset patch
    # User daybreaker
    # Date 1306421524 -32400
    # Branch py3k-workaround
    # Node ID 0067e50fad8a16e68ac20f594f1a7dbbe13e0cf1
    # Parent  47a94f723e803489e0608eeb39a89494b214c068
    Made it working with Python 3.2 on Windows 7 Korean version + UTF-8 encoded source files.
    Another workaround is required due to `next()` -> `__next__()`, see
    diff -r 47a94f723e80 -r 0067e50fad8a sphinx/pycode/
    --- a/sphinx/pycode/	Sun May 15 14:54:52 2011 +0200
    +++ b/sphinx/pycode/	Thu May 26 23:52:04 2011 +0900
    @@ -179,7 +179,7 @@
             if ('file', filename) in cls.cache:
                 return cls.cache['file', filename]
    -            fileobj = open(filename, 'r')
    +            fileobj = open(filename, 'r', encoding='utf8')
             except Exception, err:
                 raise PycodeError('error opening %r' % filename, err)
             obj = cls(fileobj, modname, filename)
    diff -r 47a94f723e80 -r 0067e50fad8a sphinx/util/
    --- a/sphinx/util/	Sun May 15 14:54:52 2011 +0200
    +++ b/sphinx/util/	Thu May 26 23:52:04 2011 +0900
    @@ -241,7 +241,8 @@
         def find_cookie(line):
    -            line_string = line.decode('ascii')
    +            #line_string = line.decode('ascii')
    +            line_string = line
             except UnicodeDecodeError:
                 return None
    @@ -252,9 +253,10 @@
         default = sys.getdefaultencoding()
         first = read_or_stop()
    -    if first and first.startswith(BOM_UTF8):
    -        first = first[3:]
    -        default = 'utf-8-sig'
    +    # The file is read using a specific encoding already.
    +    #if first and first.startswith(BOM_UTF8):
    +    #    first = first[3:]
    +    #    default = 'utf-8-sig'
         if not first:
             return default
         encoding = find_cookie(first)

    The problem lies in the encoding detection code: it mixes bytes with strings, and it must be modified to use bytes consistently.

    Finally, I also had to apply the workaround from #635 to get a working result.
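As an illustration of detection that stays on bytes, here is a small sketch using the standard library's `tokenize.detect_encoding()` (available in Python 3) rather than Sphinx's own `detect_encoding` helper. The function name `decode_source` and the sample sources are assumptions made for this example; this is not the fix that was applied to Sphinx.

```python
import io
import tokenize


def decode_source(raw):
    """Decode Python source given as bytes, honoring a BOM or coding cookie."""
    # detect_encoding() wants a readline callable that returns bytes, so the
    # cookie is found before any locale-dependent decoding can happen.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding, raw.decode(encoding)


# Hypothetical sample sources, one per declared encoding.
utf8_src = "# -*- coding: utf-8 -*-\ns = '한글'\n".encode("utf-8")
cp949_src = "# -*- coding: cp949 -*-\ns = '한글'\n".encode("cp949")

for raw in (utf8_src, cp949_src):
    encoding, text = decode_source(raw)
    print(encoding, repr(text.splitlines()[1]))
```

Because the file is opened in binary mode and decoded only after the cookie is read, the result does not depend on `locale.getpreferredencoding()`.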
