Issue #801 resolved

Python 2 fails to detect terminal encoding correctly

Reuben Thomas
created an issue

If I install pygments 1.5 with Python 2.7 and pipe some input into stdin from a UTF-8 terminal, pygments tries to decode the input as ASCII:

$ cat microtype.dtx | pygmentize -f 256 -O full,style=rrt -l latex

*** Error while highlighting: UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 17827: ordinal not in range(128) (file "/home/rrt/.local/lib/python2.7/site-packages/Pygments-1.5-py2.7.egg/pygments/lexer.py", line 165, in get_tokens)

The file in question is NOT UTF-8 encoded, but that is not the real problem here: the real problem is that pygments defaults to 'ascii' rather than 'UTF-8' for input read from stdin. If I run pygments on the file directly, I get the error I would expect:

$ pygmentize -f 256 -O full,style=rrt -l latex microtype.dtx

*** Error while highlighting: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 17827: invalid start byte (file "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode)

Note that here pygments correctly found the terminal encoding to be UTF-8.
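
For reference, the two messages above differ only in which codec was tried on the same byte. A minimal sketch that reproduces both failure reasons in isolation; the sample bytes and the helper name `decode_failure` are made up for illustration and are not taken from microtype.dtx or from pygments:

```python
# Hypothetical sample: plain ASCII text with a single 0xa0 byte, the byte
# reported in both error messages above.
SAMPLE = b"some text\xa0more text"


def decode_failure(data, codec):
    """Return the reason decoding fails with `codec`, or None if it succeeds."""
    try:
        data.decode(codec)
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


if __name__ == "__main__":
    for codec in ("ascii", "utf-8", "latin-1"):
        print(codec, "->", decode_failure(SAMPLE, codec))
```

With 'ascii' the reason is "ordinal not in range(128)", with 'utf-8' it is "invalid start byte", and a single-byte codec such as 'latin-1' decodes the sample without error. Whether 'latin-1' is the actual encoding of the file is an assumption, not something stated above.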

This problem occurs with Python 2.7, but goes away when using Python 3.2. This suggests that the problem is in the Python 2-specific encoding detection code. Maybe it would be sufficient to note in the documentation that encoding detection doesn't work so well in Python 2?
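
One plausible explanation, offered as an assumption rather than a reading of the pygments source: in Python 2, `sys.stdin.encoding` is `None` when stdin is a pipe rather than a terminal, so any code that relies on it alone ends up at its hard-coded default. The sketch below shows a fallback chain that consults the locale before giving up. `guess_stream_encoding` is a hypothetical helper, not a pygments function.

```python
import locale


def guess_stream_encoding(stream, default="ascii"):
    """Hypothetical helper: choose an encoding for bytes read from `stream`."""
    # 1. Trust the stream if it reports an encoding (a terminal usually does).
    encoding = getattr(stream, "encoding", None)
    if encoding:
        return encoding
    # 2. Otherwise ask the locale, which normally reflects the user's
    #    terminal settings even when input arrives through a pipe.
    encoding = locale.getpreferredencoding()
    # 3. Only fall back to the hard-coded default if the locale is silent.
    return encoding or default


if __name__ == "__main__":
    import sys
    print(guess_stream_encoding(sys.stdin))
```

This would not help with the file above, which is not UTF-8 in the first place, but it would make piped input and file input fail (or succeed) in the same way.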
