rogerjhu committed f48fd44

Make UTF-8 detection more robust.

If the first line of the Python source is blank/empty, the function assumes that the encoding is
'ascii' and never checks the second line for a coding declaration.
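
To illustrate the distinction the fix relies on, here is a minimal, self-contained sketch. It is not the actual coverage.phystokens code: the function name `sketch_source_encoding`, the simplified cookie regex (modeled on the PEP 263 pattern), and the `next(lines, None)` reader are assumptions made for illustration. The point is that a blank line reads as `''` (falsy but not `None`), while running out of lines reads as `None`, so testing `first is None` lets a blank first line fall through to the second-line check.

```python
import re

# Simplified coding-cookie pattern, modeled on PEP 263 (an assumption for
# this sketch, not the regex used by coverage.py itself).
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def sketch_source_encoding(source, default="ascii"):
    """Hypothetical helper: look for a coding cookie on the first two lines."""
    lines = iter(source.splitlines())

    def read_or_stop():
        # None means "no more lines"; "" means "a blank line".
        return next(lines, None)

    def find_cookie(line):
        match = COOKIE_RE.match(line)
        return match.group(1) if match else None

    first = read_or_stop()
    if first is None:  # `if not first:` would also bail out on a blank line
        return default
    encoding = find_cookie(first)
    if encoding:
        return encoding

    second = read_or_stop()
    if second is None:
        return default
    return find_cookie(second) or default
```

With the `is None` test, `sketch_source_encoding("\n# coding=utf-8\n")` reaches the second line and finds the cookie; with `if not first:` it would have returned the default as soon as it saw the blank line.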

Files changed (2)


         bom_found = True
         first = first[3:]
         default = 'utf-8-sig'
-    if not first:
+    if first is None:
         return default
     encoding = find_cookie(first)


 import os, re
 from tests.coveragetest import CoverageTest
-from coverage.phystokens import source_token_lines
+from coverage.phystokens import source_token_lines, source_encoding
 SIMPLE = """\
         stress = os.path.join(HERE, "stress_phystoken_dos.tok")
+    def test_source_encoding_detect_utf8(self):
+        source = """\
+# coding=utf-8
+        self.assertEqual(source_encoding(source), 'utf-8')
+    def test_source_encoding_second_line_detect_utf8(self):
+        """Verifies that UTF-8 encoding is still detected when the first line is blank."""
+        source = """\
+# coding=utf-8
+        self.assertEqual(source_encoding(source), 'utf-8')