Continuing on from 3f96afe7cdc2, I looked for every other place where data could be lost when an EAGAIN is raised.
The common scenario is that many layers of the stream stack temporarily cache the data they have read while calling read() in a loop, and then return it all concatenated. This breaks when an EAGAIN arrives partway through the loop: the data cached so far is lost.
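To make the failure mode concrete, here is a minimal sketch of the pattern. It is not code from the patch: `BurstySource`, `lossy_read_exactly` and `SafeReader` are hypothetical names invented for illustration, and keeping the partial data on the instance is only one possible way to avoid the loss.

```python
import errno


class BurstySource:
    """Hypothetical raw stream: hands out chunks, raising EAGAIN for each None."""

    def __init__(self, events):
        self.events = list(events)

    def read(self, n):
        # For brevity this fake ignores n; the demo keeps chunks small.
        if not self.events:
            return b""  # EOF
        event = self.events.pop(0)
        if event is None:
            raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
        return event


def lossy_read_exactly(source, n):
    """Buggy pattern: partial data lives in a local that dies with the exception."""
    chunks = []
    remaining = n
    while remaining > 0:
        data = source.read(remaining)  # an EAGAIN here throws away `chunks`
        if not data:
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


class SafeReader:
    """One possible fix: keep partial data on the instance across calls."""

    def __init__(self, source):
        self.source = source
        self.pending = []

    def read_exactly(self, n):
        remaining = n - sum(len(c) for c in self.pending)
        while remaining > 0:
            data = self.source.read(remaining)  # may raise; pending survives
            if not data:
                break
            self.pending.append(data)
            remaining -= len(data)
        result = b"".join(self.pending)
        self.pending = []
        return result


def demo():
    events = [b"abc", None, b"def"]

    src = BurstySource(events)
    try:
        lossy_read_exactly(src, 6)
    except OSError:
        pass  # caller retries after EAGAIN
    lost = lossy_read_exactly(src, 6)

    safe = SafeReader(BurstySource(events))
    try:
        safe.read_exactly(6)
    except OSError:
        pass  # caller retries after EAGAIN
    kept = safe.read_exactly(6)
    return lost, kept
```

Calling `demo()` shows the difference: after the retry, the lossy version only sees the bytes that arrived after the EAGAIN, while the version that keeps its buffer returns everything.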
I started by auditing the code to find possible problems, then wrote tests that could expose them, then fixed the bugs. There isn't a strict one-to-one relationship between tests and patches; rather, the tests are vaguely comprehensive.
The tests are a bit ugly; I can't see any way to avoid massive code duplication or multiple asserts per test. Suggestions welcome :)
Yes, but where can I define that function? One can't call neighboring methods from the test methods. (I've never met pytest before, and it seems rather magicky, especially with the multiple object spaces in PyPy.)
I did my best to copy CPython (2.7) and general C behavior. Basically, any read() can throw EAGAIN if there's nothing to be read, and a readline() can return a partial line if it runs out of data. Of course, this isn't quite as easy to test in CPython, because we can't play with internals, but I get the same results when running bursty jobs in subprocess.
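As a rough illustration of that readline() behavior, the sketch below returns whatever has been read so far when EAGAIN interrupts a line, and only propagates the error when nothing has been read yet. `make_source` and `readline_partial` are hypothetical helpers written for this example; they are not the stream classes touched by the patch.

```python
import errno


def make_source(events):
    """Hypothetical byte source: each call returns one byte, or raises EAGAIN for None."""
    queue = list(events)

    def read_byte():
        if not queue:
            return b""  # EOF
        event = queue.pop(0)
        if event is None:
            raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
        return event

    return read_byte


def readline_partial(read_byte):
    """Read up to a newline; on EAGAIN, return the partial line instead of dropping it."""
    buf = []
    while True:
        try:
            c = read_byte()
        except OSError as e:
            if e.errno == errno.EAGAIN and buf:
                break  # data ran out mid-line: hand back what we have
            raise  # nothing read yet, so the caller sees the EAGAIN
        if not c:
            break  # EOF
        buf.append(c)
        if c == b"\n":
            break
    return b"".join(buf)
```

With a source that produces `a`, `b`, an EAGAIN, then `c` and a newline, the first call returns the partial line `ab` and the second returns `c\n`; no byte is dropped.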
In fact, the reason I did any of this at all was so that we could get pygame's test suite to run under PyPy. Without this patch, it randomly loses data that subprocesses output. Here's a slightly modified example of that test suite's problem (the sleep may need tweaking between CPython and PyPy to get nice bursty reads): http://paste.pocoo.org/show/497069/
Could you also add tests that run on a translated pypy-c?
Don't think so. I could create one or two tests showing the type of problem, but all the stream classes need to be tested, and one can't create them all from app-space (or simulate this problem with every type of stream that one *can* create from app-space).