Pygments sets wrong inputenc if utf8 is used

Issue #512 on hold
Former user created an issue

If you pygmentize a UTF-8 encoded file to LaTeX, pygmentize will create the line

{{{ \usepackage[utf-8]{inputenc} }}}

which is wrong. Instead, there should be a line similar to:

{{{ \usepackage[utf8]{inputenc} }}}

Reported by fuzxxl

Comments (3)

  1. Former user Account Deleted

    I can confirm this bug. At first I thought it was a bug in LaTeX as packaged for Debian, but simply changing the \usepackage line as above results in a TeX file that compiles without error.

    This should be a one-line (one-byte, technically) fix. Getting it into Pygments would be very much appreciated.

  2. Eirik Schwenke

    In HEAD, the default encoding chosen is 'latin1' unless "self.encoding" is set. This is in pygments/formatters/. In older versions this was set to 'utf-8'. A simple way of patching this to work with utf8 is appended below. I am not sure what the best way to do this is; a more comprehensive mapping of Python encoding names to (La)TeX encoding names might be needed.

    Unfortunately, some more work is needed, as apparently self.encoding is *still* set to 'latin1' for documents that are encoded in UTF-8. I tested with a JS file in UTF-8 containing Norwegian characters. Note that these characters exist in latin1 as well, but the resulting LaTeX file does *not* work, because the characters are embedded as UTF-8, not latin1.

    Hence the patch below "defaults" to utf-8 (again). However, as stated, self.encoding is still incorrectly set to latin1 in some other part of the library.

  3. Tim Hatch

    I believe I see the bug. Could you provide a short test file that demonstrates the bug, so we can include it in the distribution?
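
The mapping of Python encoding names to (La)TeX encoding names suggested in comment 2 could look roughly like the sketch below. This is not the patch referred to in that comment and not Pygments code: the names `_INPUTENC_NAMES`, `latex_inputenc_name`, and `usepackage_line` are hypothetical, the table covers only a handful of encodings, and the 'utf8' default is an assumption. It relies only on the standard-library `codecs.lookup` to resolve aliases such as 'utf-8', 'UTF8', and 'latin1'.

```python
import codecs

# Hypothetical, deliberately incomplete table: normalized Python codec name
# (lowercase, no '-' or '_') -> option name understood by LaTeX's inputenc.
_INPUTENC_NAMES = {
    "utf8": "utf8",
    "iso88591": "latin1",
    "iso885915": "latin9",
    "cp1252": "cp1252",
    "ascii": "ascii",
}


def latex_inputenc_name(python_encoding, default="utf8"):
    """Translate a Python encoding name into an inputenc option name."""
    if not python_encoding:
        # Assumption: fall back to UTF-8 when no encoding was given.
        return default
    # codecs.lookup resolves aliases; it raises LookupError for unknown names.
    canonical = codecs.lookup(python_encoding).name
    key = canonical.replace("-", "").replace("_", "").lower()
    if key not in _INPUTENC_NAMES:
        raise ValueError("no known inputenc name for %r" % python_encoding)
    return _INPUTENC_NAMES[key]


def usepackage_line(python_encoding):
    """Build the \\usepackage line for the given Python encoding name."""
    return "\\usepackage[%s]{inputenc}" % latex_inputenc_name(python_encoding)
```

With a mapping of this kind, both 'utf-8' and 'utf8' would produce the `utf8` option that inputenc accepts, and an encoding without a known LaTeX equivalent would fail loudly instead of yielding a preamble that does not compile.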
