io.IncrementalNewlineDecoder.newlines behaves differently than CPython's

Issue #3012 new
Joshua Oreman created an issue

When io.IncrementalNewlineDecoder in translate mode is fed input one character at a time, it correctly translates “\r\n” sequences to “\n”, but it does not record “\r\n” in its newlines attribute (which I understand is intended to list the newline sequences encountered in the input). Reproducer:

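import io
import codecs

# Wrap a strict UTF-8 incremental decoder in a translating newline decoder.
info = codecs.lookup("utf-8")
inner_decoder = info.incrementaldecoder("strict")
outer_decoder = io.IncrementalNewlineDecoder(inner_decoder, True)
msg = b"The quick brown fox jumps over the lazy dog.\r\n\n\r\r\n\n"
decoded = ""
# Feed the input one byte at a time, so each "\r\n" is split across two calls.
for ch in msg:
    decoded += outer_decoder.decode(bytes([ch]))
decoded += outer_decoder.decode(b"", final=True)
# Translation itself is correct: every newline sequence becomes "\n".
assert decoded == "The quick brown fox jumps over the lazy dog.\n\n\n\n\n"
# This is the assertion that fails on PyPy: "\r\n" is missing from newlines.
assert set(outer_decoder.newlines) == {"\r", "\n", "\r\n"}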

This passes on CPython but fails on PyPy-3.6 v7.1.1: on the last line, outer_decoder.newlines contains “\r” and “\n” but not “\r\n”.

There is no issue in non-translating mode; the newlines property does wind up containing all three sequences in that case.
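For comparison, here is a minimal sketch that runs the same byte-at-a-time feed in both modes. The helper name decode_bytewise and the shortened message are illustrative assumptions, not part of the original reproducer. On CPython both modes should report all three newline sequences; on the affected PyPy versions only the translating call would be expected to miss “\r\n”.

```python
import codecs
import io


def decode_bytewise(data, translate):
    """Feed data one byte at a time; return (decoded text, newlines seen).

    Hypothetical helper for illustration only.
    """
    inner = codecs.getincrementaldecoder("utf-8")("strict")
    outer = io.IncrementalNewlineDecoder(inner, translate)
    pieces = [outer.decode(bytes([b])) for b in data]
    # Flush any pending "\r" held back by the decoder.
    pieces.append(outer.decode(b"", final=True))
    return "".join(pieces), outer.newlines


# Shortened input with the same newline pattern as the reproducer.
msg = b"fox\r\n\n\r\r\n\n"

raw_text, raw_newlines = decode_bytewise(msg, translate=False)
translated_text, translated_newlines = decode_bytewise(msg, translate=True)

print(repr(raw_text), raw_newlines)
print(repr(translated_text), translated_newlines)
```

Comparing the two newlines values side by side makes it easy to see whether only the translating path is affected.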

Comments (1)

  1. mattip

    Thanks for the report. This happens on both 2.7 and 3.6. Maybe fixed in 436eebe7adb1 by extending a fast path; hopefully it didn’t break anything else.
