Encoding marker may only appear on the first two lines, per PEP 263

Issue #529 resolved
Gregory P. Smith created an issue

The file text encoding marker must be honored only if it appears on one of the first two lines of the file.

(Confirmed by testing: a coding marker in a comment on line 3 is ignored by CPython 2.7 and 3.4.)
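
One way to repeat that check on Python 3 is with the standard library's tokenize.detect_encoding, which applies the same first-two-lines rule. This is only a sketch: the helper name detected_encoding and the sample sources are made up for illustration, and it does not cover Python 2.7.

```python
import io
import tokenize


def detected_encoding(source_bytes):
    """Return the source encoding that Python 3's tokenize module detects."""
    # detect_encoding reads at most two lines through the readline callable.
    encoding, _lines_read = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


# Illustrative inputs: the marker on line 2 versus the marker on line 3.
on_line_2 = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = 1\n"
on_line_3 = b"#!/usr/bin/env python\n# a comment\n# -*- coding: latin-1 -*-\nx = 1\n"

print(detected_encoding(on_line_2))  # marker honored (normalized name)
print(detected_encoding(on_line_3))  # marker ignored, default encoding used
```

The marker on line 2 is picked up (tokenize reports the normalized name iso-8859-1), while the one on line 3 is ignored and the default utf-8 is returned.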

The bug appears to be in phystokens.neuter_encoding_declaration (https://bitbucket.org/ned/coveragepy/src/eb9f4a94096cca9863352f600994340caf4f6927/coverage/phystokens.py?at=default&fileviewer=file-view-default#phystokens.py-291), which neuters the first two encoding declarations seen anywhere in a file instead of only those within the first two lines.
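
A minimal sketch of the intended behavior, assuming the declaration pattern given in PEP 263. The function name neuter_first_two_lines is hypothetical and this is not coverage.py's actual code; it also ignores the finer rule that line 2 only counts when line 1 is blank or a comment.

```python
import re

# Encoding-declaration pattern as given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def neuter_first_two_lines(source):
    """Hypothetical variant: neuter declarations only on lines 1 and 2."""
    lines = source.splitlines(True)  # keep line endings intact
    for i in range(min(2, len(lines))):
        # Replace the comment up to and including the encoding name.
        lines[i] = CODING_RE.sub("# (deleted declaration)", lines[i], count=1)
    return "".join(lines)


src = (
    "# -*- coding: utf-16 -*-\n"
    "# Just a comment.\n"
    "# -*- coding: ascii -*-\n"
)
print(neuter_first_two_lines(src))
```

With this input only the first line is rewritten; the declaration-shaped comment on line 3 is left alone.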

Comments (7)

  1. Gregory P. Smith reporter

    This may be related to issue #443, whose example shows an invalid coding specifier outside the first two lines. I'm not sure what that issue was actually filed for.

  2. Maurice Pud

    I think this is easiest to illustrate with some test cases.

      def testTwoEncodingDeclaration(self):
        """Two encoding declarations in the first two lines.
    
        This works with the original upstream version of the function.
        """
        input_src = textwrap.dedent(
            u"""\
            # -*- coding: ascii -*-
            # -*- coding: utf-8 -*-
            # -*- coding: utf-16 -*-
            """)
        expected_src = textwrap.dedent(
            u"""\
            # (deleted declaration) -*-
            # (deleted declaration) -*-
            # -*- coding: utf-16 -*-
            """)
        output_src = phystokens.neuter_encoding_declaration(input_src)
        self.assertEqual(expected_src, output_src)
    
      def testOneEncodingDeclaration(self):
        """One encoding declaration in the first two lines.
    
        This does NOT work with the original upstream version of the function.
        """
        input_src = textwrap.dedent(
            u"""\
            # -*- coding: utf-16 -*-
            # Just a comment.
            # -*- coding: ascii -*-
            """)
        expected_src = textwrap.dedent(
            u"""\
            # (deleted declaration) -*-
            # Just a comment.
            # -*- coding: ascii -*-
            """)
        output_src = phystokens.neuter_encoding_declaration(input_src)
        self.assertEqual(expected_src, output_src)
    
  3. Ned Batchelder repo owner

    I can see that neuter_encoding_declaration is changing more declarations than it needs to. But is there an actual product-level problem? What source files is coverage.py treating incorrectly? I'd like to understand the whole problem.

  4. Maurice Pud

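    Here's a simple test file that passes when run normally but fails when run under coverage. It fails because neuter_encoding_declaration rewrites the first two declarations it finds: the real one on line 1 and the # -*- coding: utf-8 -*- line inside src1, which becomes # (deleted declaration) -*-. src2 is left untouched, so the two strings no longer compare equal.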

    # -*- coding: utf-8 -*-
    
    import unittest
    
    
    class FailsUnderCoverageTest(unittest.TestCase):
    
      def test_fails_under_coverage(self):
        src1 = u"""\
            # -*- coding: utf-8 -*-
            # Just a comment.
            """
        src2 = u"""\
            # -*- coding: utf-8 -*-
            # Just a comment.
            """
        self.assertEqual(src1, src2)
    
    
    if __name__ == "__main__":
      unittest.main()
    