No arbitrary unicode in HTML reports in Windows

Issue #124 resolved
Devin Jeanpierre created an issue

On Python 3.x, running "coverage run $" and then "coverage html" raises an exception on Windows. The file that coverage opens to write the HTML into is not opened with an explicit encoding, so it defaults to the locale's default encoding. On Windows there is no way to change the locale encoding to one that can handle the code point used in that file: it is strictly limited to one of 14 code pages with a hundred non-ASCII characters each, none of which can encode U+FFFF (or a number of other characters). On other OSes the encoding is a legitimate part of the locale and can be set to pretty much anything, so this is a Windows-only issue.

A mostly unrelated issue is that the encoded data is written to the file without telling the browser which encoding was used. This could cause mojibake. It's possible to give the encoding to the browser by embedding the value of locale.getpreferredencoding(), but I think it'd be simpler to solve both of these issues together, by turning

{{{ fhtml = open(html_path, 'w') }}} into {{{ fhtml = open(html_path, 'w', encoding='ascii', errors='xmlcharrefreplace') }}}

although the particulars of doing this without breaking Python 2.x support are probably a bit more complex.
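A minimal sketch of one way this could work on both 2.x and 3.x, assuming Python 2.6 or later so that io.open is available. The helper name write_html is hypothetical and is not coverage's actual code; the point is only that every non-ASCII character ends up as a numeric character reference, so the file is pure ASCII whatever the locale is.

```python
import io


def write_html(html_path, html):
    """Write unicode text `html` to `html_path` as pure ASCII.

    Hypothetical helper, not part of coverage. io.open behaves like the
    Python 3 built-in open() and also exists on Python 2.6+, so the same
    call works on both lines.
    """
    # Characters that ASCII cannot represent are written as decimal
    # character references such as &#65535; instead of raising
    # UnicodeEncodeError.
    fhtml = io.open(html_path, 'w', encoding='ascii',
                    errors='xmlcharrefreplace')
    try:
        fhtml.write(html)
    finally:
        fhtml.close()
```

Because the output is plain ASCII, the browser renders it the same way under any ASCII-compatible encoding it might guess, which is why this would also sidestep the mojibake problem.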

I'd also appreciate advice on how to add this to the test suite (for reproducibility concerns etc.)
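One possible approach to reproducibility, sketched here as an assumption rather than as how coverage's test suite actually works: avoid depending on the machine's locale and pass a limited code page such as cp1252 explicitly, which should fail the same way on any OS. The function names below are hypothetical.

```python
import io


def plain_write_fails(path, text, encoding='cp1252'):
    """Return True if writing `text` with a limited code page raises.

    Passing the encoding explicitly stands in for a Windows locale, so
    the failure does not depend on the OS running the tests.
    """
    try:
        with io.open(path, 'w', encoding=encoding) as f:
            f.write(text)
    except UnicodeEncodeError:
        return True
    return False


def charref_write(path, text):
    """Write `text` as ASCII with character references; return the bytes."""
    with io.open(path, 'w', encoding='ascii',
                 errors='xmlcharrefreplace') as f:
        f.write(text)
    with io.open(path, 'rb') as f:
        return f.read()
```

A test could then assert that plain_write_fails(...) is true for a string containing U+FFFF, and that charref_write(...) returns only ASCII bytes for the same string.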

Comments (6)

  1. Devin Jeanpierre reporter

    I'm under the impression that code in is run in both 2.x and 3.x unaltered, so this should work (although it can't run in 1.6).
