HTMLReporter fails when source file is encoded in UTF-8 with BOM signature

Issue #179 resolved
pablodcar created an issue

Hi, I'm thankful for this wonderful tool. We use it extensively, and I hope to contribute new APIs and features in the future.

When a source file is encoded in UTF-8 with a BOM signature, //coverage.phystokens.source_encoding// returns the correct encoding: //"utf-8-sig"//. But when the file is rendered inside the HTML template and that same encoding is used to write the report to disk, a //UnicodeDecodeError// is raised, because the BOM cannot be in the middle of the final output:

{{{
File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/", line 603, in html_report
File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/", line 87, in report
  self.report_files(self.html_file, morfs, self.config.html_dir)
File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/", line 83, in report_files
  report_fn(cu, self.coverage._analyze(cu))
File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/", line 222, in html_file
  html = html.encode(encoding)
File "/home/pablo/baco-dyn/lib/python2.6/encodings/", line 15, in encode
  return (codecs.BOM_UTF8 + codecs.utf_8_encode(input, errors)[0], len(input))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 18296: ordinal not in range(128)
}}}

I'm attaching a patch to decode and encode the source file in advance, using UTF-8 when utf-8-sig is detected. I hope you can review it and consider adding this change.
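To illustrate the idea, here is a minimal sketch. It is not the attached patch and not coverage's actual code. It assumes the source is decoded with the detected encoding (so "utf-8-sig" strips a leading BOM) and that plain "utf-8" is used when writing the output. The helper names `output_encoding` and `render_html` are hypothetical, and the sketch is written for Python 3 rather than the Python 2.6 shown in the traceback.

```python
import codecs


def output_encoding(detected):
    """Hypothetical helper: pick the encoding used to write the report.

    'utf-8-sig' is mapped to plain 'utf-8' so that no BOM is emitted
    in the generated output; any other encoding is returned unchanged.
    """
    if codecs.lookup(detected).name == "utf-8-sig":
        return "utf-8"
    return detected


def render_html(raw_source, detected):
    """Hypothetical stand-in for the report step: decode, wrap, encode."""
    # Decoding with 'utf-8-sig' drops a leading BOM if one is present.
    text = raw_source.decode(detected)
    html = "<pre>" + text + "</pre>"
    return html.encode(output_encoding(detected))


# A source file saved as UTF-8 with a BOM signature.
raw = codecs.BOM_UTF8 + "name = 'ñandú'\n".encode("utf-8")
report = render_html(raw, "utf-8-sig")
```

With this approach the BOM is consumed once on input and never written back, so `report` holds plain UTF-8 bytes with no signature.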

Thanks in advance,

Pablo Carballo

