Anonymous committed 460ddf6

[svn] Elaborate the Unicode docs a bit.

docs/src/unicode.txt

-===============
-Unicode Support
-===============
+=====================
+Unicode and Encodings
+=====================
 
-Since Pygments 0.6, the lexers use unicode strings internally. Because of that
-you might discover the occasional `UnicodeDecodeError` if you pass strings with the
+Since Pygments 0.6, all lexers use unicode strings internally. Because of that
+you might encounter the occasional `UnicodeDecodeError` if you pass strings with the
 wrong encoding.
 
-Per default all lexers have `encoding` set to `latin1`. If you pass a lexer a
-string object (not unicode) it tries to decode the data using this encoding.
+By default all lexers have their input encoding set to `latin1`.
+If you pass a lexer a string object (not unicode), it tries to decode the data
+using this encoding.
 You can override the encoding using the `encoding` lexer option. If you have the
 `chardet`_ library installed and set the encoding to ``chardet``, it will analyse
 the text and fetch the best encoding automatically:
 The best way is to pass Pygments unicode objects. In that case you can't get
 unexpected output.
 
-The formatters now send unicode objects to the stream if you don't set the
-encoding. You can do so by passing the formatters an `encoding` option:
+The formatters now send Unicode objects to the stream if you don't set the
+output encoding. You can do so by passing the formatters an `encoding` option:
 
 .. sourcecode:: python
 
     from pygments.formatters import HtmlFormatter
     f = HtmlFormatter(encoding='utf-8')
 
+**You will have to set this option if you have non-ASCII characters in the
+source and the output stream does not accept Unicode written to it!**
+This is the case for all regular files and for terminals.
+
+Note: The Terminal formatter tries to be smart: if its output stream has an
+`encoding` attribute, it will encode any Unicode string with this encoding
+before writing it. This is the case for `sys.stdout`, for example. The other
+formatters don't have that behavior.
+
+*New in Pygments 0.7*: the formatters now also accept an `outencoding` option
+which will override the `encoding` option if given. This makes it possible to
+use a single options dict with lexers and formatters, and still have different
+input and output encodings.
+
 .. _chardet: http://chardet.feedparser.org/
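
The shared options dict described in the new documentation text can be sketched as follows. This is a minimal illustration, not part of the commit: it assumes Python 3 and a current Pygments release, where a "string object" corresponds to `bytes` and a "unicode object" to `str`. The sample source and the variable names are made up for the example, and the input encoding is passed explicitly rather than relying on the default.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

# One options dict shared by the lexer and the formatter.
# The lexer uses `encoding` to decode its input; the formatter
# prefers `outencoding` over `encoding` for its output.
options = {"encoding": "latin1", "outencoding": "utf-8"}

# Hypothetical input: Python source stored as Latin-1 encoded bytes.
source_bytes = 's = "café"\n'.encode("latin1")

lexer = PythonLexer(**options)
formatter = HtmlFormatter(**options)

# Because an output encoding is set on the formatter, highlight()
# returns encoded bytes instead of a text string.
html_bytes = highlight(source_bytes, lexer, formatter)
```

The resulting `html_bytes` can be written directly to a file opened in binary mode. Without an output encoding on the formatter, `highlight()` would return text, and encoding it would be left to the caller or to the output stream.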