Japanese characters are mangled when guessing lexer type

Scott Patten created an issue

If I highlight a file containing Japanese characters, the output is correct when I tell Pygments which lexer to use, but the characters are mangled when I ask Pygments to guess the lexer.

Using the attached "test.rb" file, both

cat test.rb | pygmentize -l ruby -O encoding=utf-8

and

pygmentize -O encoding=utf-8 test.rb

work correctly, while

cat test.rb | pygmentize -g -O encoding=utf-8

results in mangled characters.

This happens with every formatter.

Tested on Pygments 1.5 and Pygments 1.6rc1. It also happens with multiple lexers.
