will print the help for the Python lexer, etc.
+Pygments tries to be smart regarding encodings in the formatting process:
+* If you give an ``encoding`` option, it will be used as the input and
+ output encoding.
+* If you give an ``outencoding`` option, it will override ``encoding``
+ as the output encoding.
+* If you don't give an encoding and have given an output file, the default
+ encoding for lexer and formatter is ``latin1`` (which will pass through
+ all non-ASCII characters).
+* If you don't give an encoding and haven't given an output file (that means
+ output is written to the console), the default encoding for lexer and
+ formatter is the terminal encoding (`sys.stdout.encoding`).
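The rules in the list above can be sketched as a small helper. This is only an illustration, not the code `pygmentize` actually runs: `resolve_encodings`, its parameters, and the ``'ascii'`` fallback are hypothetical names and choices made for this example. It also assumes that the input side uses the encoding of `sys.stdin` when output goes to the console, and that the input falls back to ``latin1`` when only ``outencoding`` is given.

```python
import sys


def resolve_encodings(opts, outfile_given, stdin_enc=None, stdout_enc=None):
    """Return (input_encoding, output_encoding) for an option dict.

    Hypothetical helper; `opts` maps option names to values.
    """
    enc = opts.get("encoding")
    outenc = opts.get("outencoding")
    if enc is None and outenc is None:
        if outfile_given:
            # Pass-through: latin1 maps every byte to a code point and back.
            return "latin1", "latin1"
        # Console output: use the terminal encodings.
        # The 'ascii' fallback is an assumption of this sketch.
        stdin_enc = stdin_enc or getattr(sys.stdin, "encoding", None) or "ascii"
        stdout_enc = stdout_enc or getattr(sys.stdout, "encoding", None) or "ascii"
        return stdin_enc, stdout_enc
    # Assumption: with only `outencoding` set, the input falls back to latin1.
    in_enc = enc or "latin1"
    # `outencoding` overrides `encoding` on the output side only.
    return in_enc, outenc or in_enc
```

For example, `resolve_encodings({"encoding": "utf-8", "outencoding": "latin1"}, False)` yields UTF-8 input and Latin-1 output, while an empty option dict with an output file yields ``latin1`` for both.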
.. _a particular formatter: formatters.txt
This is the case for all regular files and for terminals.
Note: The Terminal formatter tries to be smart: if its output stream has an
-`encoding` attribute, and you haven't set the option,
-it will encode any Unicode string with this encoding
-before writing it. This is the case for `sys.stdout`, for example. The other
-formatters don't have that behavior.
+`encoding` attribute, and you haven't set the option, it will encode any
+Unicode string with this encoding before writing it. This is the case for
+`sys.stdout`, for example. The other formatters don't have that behavior.
+Another note: If you call Pygments via the command line (`pygmentize`),
+encoding is handled differently; see `the command line docs <cmdline.txt>`_.
*New in Pygments 0.7*: the formatters now also accept an `outencoding` option
which will override the `encoding` option if given. This makes it possible to
+ # No encoding given? Use latin1 if output file given,
+ # stdin/stdout encoding otherwise.
+ # (This is a compromise, I'm not too happy with it...)
+ if 'encoding' not in O_opts and 'outencoding' not in O_opts:
+ # encoding pass-through
+ fmter.encoding = 'latin1'
+ # use terminal encoding
+ lexer.encoding = sys.stdin.encoding
+ fmter.encoding = sys.stdout.encoding
if not enc and hasattr(outfile, "encoding") and \
hasattr(outfile, "isatty") and outfile.isatty():
- encode = lambda value: value.encode(enc)
- encode = lambda value: value
for ttype, value in tokensource:
+ value = value.encode(enc)
while ttype and not_found:
If given, must be an encoding name. This encoding will be used to
convert the input string to Unicode, if it is not already a Unicode
- string. The default is to use latin1 (default: 'latin1').
- Can also be 'guess' to use a simple UTF-8 / Latin1 detection, or
- 'chardet' to use the chardet library, if it is installed.
+ string (default: ``'latin1'``).
+ Can also be ``'guess'`` to use a simple UTF-8 / Latin1 detection, or
+ ``'chardet'`` to use the chardet library, if it is installed.
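A "simple UTF-8 / Latin1 detection" could look roughly like the sketch below. It assumes the detection amounts to trying UTF-8 first and falling back to Latin-1. The function name `guess_decode` is chosen for this example, and the real lexer code may differ in details.

```python
def guess_decode(data):
    """Decode bytes as UTF-8 if they are valid UTF-8, otherwise as Latin-1.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 assigns a character to every byte, so this cannot fail.
        return data.decode("latin1"), "latin1"
```

Because every byte sequence is valid Latin-1, the fallback always succeeds, although it may produce the wrong characters for input that is in neither encoding.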