Commits

Andy Li committed dc6d25c

Remove BOM when the input is unicode.

  • Parent commits: 4a27030
  • Branches: BOM

Files changed (2)

pygments/lexer.py

                 text = decoded
             else:
                 text = text.decode(self.encoding)
+        else:
+            if text.startswith(u'\ufeff'): 
+                text = text[len(u'\ufeff'):]
+        
         # text now *is* a unicode string
-        text = text.lstrip(u'\xef\xbb\xbf\ufeff') # remove BOM
         text = text.replace('\r\n', '\n')
         text = text.replace('\r', '\n')
         if self.stripall:
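
Note on the change above: `lstrip` treats its argument as a set of characters rather than a prefix, so the removed call could also strip legitimate leading characters such as `ï`, `»`, or `¿` from decoded text. A minimal sketch of the difference follows; `strip_bom` is a hypothetical helper used for illustration only, not a Pygments function.

```python
BOM = u'\ufeff'

def strip_bom(text):
    """Remove one leading BOM from an already-decoded string.

    Hypothetical helper mirroring the startswith/slice logic in the diff.
    """
    if text.startswith(BOM):
        return text[len(BOM):]
    return text

# A BOM followed by the text "¿Qué tal?"
sample = u'\ufeff\xbfQu\xe9 tal?'

# Old approach: every leading character found in the set is stripped,
# so the inverted question mark (u'\xbf') is lost along with the BOM.
old_result = sample.lstrip(u'\xef\xbb\xbf\ufeff')

# New approach: only the BOM prefix itself is removed.
new_result = strip_bom(sample)
```

With this input, the old call drops the leading `¿` as well as the BOM, while the prefix check keeps it.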

tests/test_examplefiles.py

         text = fp.read()
     finally:
         fp.close()
-    text = text.lstrip(u'\xef\xbb\xbf\ufeff') #remove BOM
     text = text.replace(b('\r\n'), b('\n'))
     text = text.strip(b('\n')) + b('\n')
     try:
         text = text.decode('utf-8')
+        if text.startswith(u'\ufeff'):
+            text = text[len(u'\ufeff'):]
     except UnicodeError:
         text = text.decode('latin1')
     ntext = []
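
As an aside, the decode-then-strip step in this test could probably also be written with Python's standard `utf-8-sig` codec, which drops a leading UTF-8 BOM while decoding. This is an alternative sketch, not what the commit does; `decode_example` is a hypothetical name.

```python
def decode_example(raw):
    """Decode bytes as UTF-8, dropping a leading BOM if present.

    Falls back to Latin-1 when the bytes are not valid UTF-8,
    in the same way as the test code in the diff.
    """
    try:
        return raw.decode('utf-8-sig')
    except UnicodeError:
        return raw.decode('latin1')
```

For example, `decode_example(b'\xef\xbb\xbfabc')` gives `'abc'`, and input without a BOM is decoded unchanged.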