Commits

Anonymous committed 00bdb3e

[svn] Smartify pygmentize encoding handling.

Files changed (5)

docs/src/cmdline.txt

 will print the help for the Python lexer, etc.
 
 
+A note on encodings
+-------------------
+
+Pygments tries to be smart regarding encodings in the formatting process:
+
+* If you give an ``encoding`` option, it will be used as the input and
+  output encoding.
+
+* If you give an ``outencoding`` option, it will override ``encoding``
+  as the output encoding.
+
+* If you don't give an encoding and have given an output file, the default
+  encoding for lexer and formatter is ``latin1`` (which will pass through
+  all non-ASCII characters).
+
+* If you don't give an encoding and haven't given an output file (that means
+  output is written to the console), the default encoding for the lexer is
+  the terminal input encoding (``sys.stdin.encoding``) and for the formatter
+  the terminal output encoding (``sys.stdout.encoding``).
+
+
 .. _a particular formatter: formatters.txt
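
The four rules in the note above can be read as a small decision function. The following is only a sketch of that reading, not the code from this commit: the name `resolve_encodings` and its parameters are hypothetical, and the fallback to `latin1` when a stream reports no encoding is an assumption that the documentation does not state.

```python
import sys


def resolve_encodings(opts, outfn=None, stdin=None, stdout=None):
    """Return (input_encoding, output_encoding) for the given options.

    `opts` stands in for the parsed -O options; `outfn` is the output
    file name, or None when writing to the console.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout

    if "encoding" in opts or "outencoding" in opts:
        # Rule 1: ``encoding`` covers input and output.
        # With only ``outencoding`` given, the lexer keeps its latin1 default.
        in_enc = opts.get("encoding", "latin1")
        # Rule 2: ``outencoding`` overrides ``encoding`` for the output.
        out_enc = opts.get("outencoding", in_enc)
        return in_enc, out_enc

    if outfn:
        # Rule 3: output file given, so pass bytes through as latin1.
        return "latin1", "latin1"

    # Rule 4: console output, so use the terminal encodings.
    # Assumption: fall back to latin1 if a stream has no encoding.
    in_enc = getattr(stdin, "encoding", None) or "latin1"
    out_enc = getattr(stdout, "encoding", None) or "latin1"
    return in_enc, out_enc
```

For example, `resolve_encodings({"encoding": "utf-8", "outencoding": "latin1"})` yields `("utf-8", "latin1")`, while `resolve_encodings({}, outfn="out.html")` yields `("latin1", "latin1")`.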

docs/src/unicode.txt

 This is the case for all regular files and for terminals.
 
 Note: The Terminal formatter tries to be smart: if its output stream has an
-`encoding` attribute, and you haven't set the option,
-it will encode any Unicode string with this encoding
-before writing it. This is the case for `sys.stdout`, for example. The other
-formatters don't have that behavior.
+`encoding` attribute, and you haven't set the option, it will encode any
+Unicode string with this encoding before writing it. This is the case for
+`sys.stdout`, for example. The other formatters don't have that behavior.
+
+Another note: If you call Pygments via the command line (`pygmentize`),
+encoding is handled differently; see `the command line docs <cmdline.txt>`_.
 
 *New in Pygments 0.7*: the formatters now also accept an `outencoding` option
 which will override the `encoding` option if given. This makes it possible to

pygments/cmdline.py

             return 2
         code = sys.stdin.read()
 
+    # No encoding given? Use latin1 if output file given,
+    # stdin/stdout encoding otherwise.
+    # (This is a compromise, I'm not too happy with it...)
+    if 'encoding' not in O_opts and 'outencoding' not in O_opts:
+        if outfn:
+            # encoding pass-through
+            fmter.encoding = 'latin1'
+        else:
+            # use terminal encoding
+            lexer.encoding = sys.stdin.encoding
+            fmter.encoding = sys.stdout.encoding
+
     # ... and do it!
     try:
         # process filters
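
The "encoding pass-through" comment in this hunk relies on a property of latin1: every byte value maps to exactly one code point, so decoding and re-encoding returns the original bytes whatever encoding the input really had. A minimal sketch of that property, written for Python 3 (the commit itself targets Python 2); `latin1_roundtrip` is a hypothetical helper name:

```python
def latin1_roundtrip(data: bytes) -> bytes:
    """Decode bytes as latin1 and encode them again unchanged."""
    return data.decode("latin1").encode("latin1")


# Every possible byte value survives the round trip.
all_bytes = bytes(range(256))
assert latin1_roundtrip(all_bytes) == all_bytes

# UTF-8 input also comes out byte-for-byte identical, although the
# intermediate string holds two characters instead of one.
utf8_bytes = "é".encode("utf-8")
assert latin1_roundtrip(utf8_bytes) == utf8_bytes
assert len(utf8_bytes.decode("latin1")) == 2
```

The trade-off is that the lexer sees mis-decoded characters for multi-byte input, which may be part of why the comment calls this a compromise.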

pygments/formatters/terminal256.py

         if not enc and hasattr(outfile, "encoding") and \
            hasattr(outfile, "isatty") and outfile.isatty():
             enc = outfile.encoding
-        if enc:
-            encode = lambda value: value.encode(enc)
-        else:
-            encode = lambda value: value
 
         for ttype, value in tokensource:
-            value = encode(value)
+            if enc:
+                value = value.encode(enc)
 
             not_found = True
             while ttype and not_found:
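
The encoding check in this hunk can be isolated as follows. This is a sketch under assumptions rather than the formatter's real code: `effective_encoding` and `encode_values` are hypothetical names, and the token values are yielded instead of being written to a stream.

```python
def effective_encoding(outfile, enc=None):
    """Pick the stream's own encoding only for a terminal with no explicit option."""
    if not enc and hasattr(outfile, "encoding") and \
       hasattr(outfile, "isatty") and outfile.isatty():
        enc = outfile.encoding
    return enc


def encode_values(tokensource, enc):
    """Yield (ttype, value) pairs, encoding each value only if enc is set."""
    for ttype, value in tokensource:
        if enc:
            value = value.encode(enc)
        yield ttype, value
```

With this shape, a regular file (where `isatty()` is false) and no explicit option leaves the values as Unicode strings, which matches the inline `if enc:` check that replaces the two lambdas.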

pygments/lexer.py

     ``encoding``
         If given, must be an encoding name. This encoding will be used to
         convert the input string to Unicode, if it is not already a Unicode
-        string. The default is to use latin1 (default: 'latin1').
-        Can also be 'guess' to use a simple UTF-8 / Latin1 detection, or
-        'chardet' to use the chardet library, if it is installed.
+        string (default: ``'latin1'``).
+        Can also be ``'guess'`` to use a simple UTF-8 / Latin1 detection, or
+        ``'chardet'`` to use the chardet library, if it is installed.
     """
 
     #: Name of the lexer
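
The docstring describes ``'guess'`` only as "a simple UTF-8 / Latin1 detection". One plausible reading, shown here as a sketch and not as the lexer's actual implementation, is to try UTF-8 first and fall back to latin1, which can decode any byte sequence. The function name `guess_decode` is an assumption made for this example.

```python
def guess_decode(data: bytes):
    """Return (text, encoding_used), preferring UTF-8 over latin1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin1"), "latin1"
```

Valid UTF-8 input is decoded as UTF-8; a lone byte such as `b"\xe9"` is not valid UTF-8 and therefore falls through to latin1.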