If I install Pygments 1.5 under Python 2.7 and pipe input into pygmentize on stdin from a UTF-8 terminal, Pygments decodes the input as ASCII:
$ cat microtype.dtx | pygmentize -f 256 -O full,style=rrt -l latex
*** Error while highlighting: UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 17827: ordinal not in range(128) (file "/home/rrt/.local/lib/python2.7/site-packages/Pygments-1.5-py2.7.egg/pygments/lexer.py", line 165, in get_tokens)
The file in question is NOT UTF-8 encoded, but that is not the real problem here. The real problem is that Pygments decodes piped input as 'ascii' rather than 'UTF-8'. If I pass the file to pygmentize directly, I get the expected result, a UTF-8 decoding error (expected because the file is not UTF-8):
$ pygmentize -f 256 -O full,style=rrt -l latex microtype.dtx
*** Error while highlighting: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 17827: invalid start byte (file "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode)
Note that here Pygments correctly detected the terminal encoding as UTF-8.
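Both tracebacks complain about the same byte, 0xa0, so the only difference between the two runs is which codec was chosen. The short Python 3 sketch below (not Pygments code; the sample bytes and the `try_decode` helper are made up for illustration) decodes a string containing that byte with ASCII, UTF-8 and Latin-1. It reproduces the "ordinal not in range(128)" and "invalid start byte" messages for the first two, while Latin-1 decodes 0xa0 as a no-break space.

```python
# Illustrative only: reproduce the two decode failures from the report.
# 0xa0 is the byte both tracebacks mention; it is not valid ASCII and is
# not a valid UTF-8 start byte, but Latin-1 maps it to U+00A0 (no-break space).
sample = b"text\xa0more"


def try_decode(data, codec):
    """Hypothetical helper: return decoded text, or the error message on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)


if __name__ == "__main__":
    for codec in ("ascii", "utf-8", "latin-1"):
        print(codec, "->", repr(try_decode(sample, codec)))
```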
This problem occurs with Python 2.7 but not with Python 3.2, so switching to Python 3.2 is an easy workaround. This suggests that the problem lies in the Python 2-specific encoding detection code. Perhaps it would be sufficient to note in the documentation that encoding detection is unreliable under Python 2?
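One plausible explanation, offered as an assumption rather than a diagnosis of Pygments' actual code: under Python 2, `sys.stdin.encoding` is `None` when stdin is a pipe, so any fallback that relies on the interpreter default ends up at 'ascii'. Python 3 instead uses the locale's preferred encoding for a piped stdin. The sketch below shows that kind of fallback. `pick_input_encoding` is a hypothetical name, not a Pygments function.

```python
import io
import locale


def pick_input_encoding(stream):
    """Hypothetical helper: choose an encoding for decoding piped input.

    Use the stream's own encoding when it has one. Otherwise (e.g. a
    Python 2 pipe, where .encoding is None, or a binary stream with no
    .encoding attribute), fall back to the locale's preferred encoding,
    as Python 3 does for piped stdin, and only then to 'utf-8'.
    """
    enc = getattr(stream, "encoding", None)
    if enc:
        return enc
    return locale.getpreferredencoding(False) or "utf-8"


if __name__ == "__main__":
    # A binary stream has no .encoding, standing in for a Python 2 pipe.
    print(pick_input_encoding(io.BytesIO()))
    # A text stream reports the encoding it was opened with.
    print(pick_input_encoding(io.TextIOWrapper(io.BytesIO(), encoding="latin-1")))
```

With a UTF-8 locale the first call would pick the locale's encoding (typically UTF-8) instead of 'ascii'. As shown above, though, microtype.dtx would still fail to decode as UTF-8, since it is not UTF-8 encoded.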