hang with redirection

Issue #12 resolved
Former user created an issue

When using redirection that leads to an ioerror, sarge can hang waiting for spawned processes that will never complete due to waiting on a full pipe that no one is reading.

A simple test case:

import sarge
pipeline = sarge.run('head -c 65K /dev/zero | cat > /does/not/exist')

The first command hangs on a write to the pipe, which is never closed and sarge hangs on a wait() for this process that will never finish.

Comments (10)

  1. Vinay Sajip repo owner

    A wait() on a subprocess can't be timed out, unfortunately. If you pass async=True to the run() call, you get an IOError raised from the run().

  2. Former user Account Deleted

    Right, but even with async I still seem to leak a pipe file descriptor. I can also open the file up-front and pass this to stdout (at least in this example).

    It seems like there should be some way to cleanup before sarge blocks on wait(). For instance, this initial approach seems dirty, but seems to resolve the immediate issue (and passes the test suite in a linux environment):

    https://bitbucket.org/abg/sarge/commits/3a639f410e1cfd99aa9ad399985629537cb6c67c

  3. Former user Account Deleted

    On further inspection, this is not simply an artifact of failed redirection. Even if redirecting is successful a hang can be observed with this test case:

    import logging
    import sarge
    
    logging.basicConfig(level=logging.DEBUG)
    
    with open('issue12.input', 'wb') as fileobj:
        fileobj.write(b'\0'*256*1024)
    cmd = 'cat issue12.input | head -c 128K | cat > output'
    sarge.run(cmd)
    

    This hangs regardless of whether opening the output succeeds or fails. The underlying cause in both cases is the same - dangling pipe descriptors held open by the parent process.

    In the successful redirect case, the hang here is caused by cat blocking on write(stdout, ...) - head closes its copy and exits, but the parent still has an open descriptor so the first cat never sees EPIPE and blocks waiting on someone to read from the other end of the pipe.

    In the failed redirect case, the hang is on head waiting on write(stdout, ...), and again the parent has a stale pipe descriptor that no one reads from.

    In light of this, my previous fix is obviously insufficient. The closing of descriptors in the parent needs to happen before Command.wait(). This is very similar to the cleanup logic in Pipeline.close(), but I think this needs to happen before waiting on child processes.

  4. Vinay Sajip repo owner

    This change seems more reasonable. However, with the

    cat issue12.input | head -c 128K 
    

    command above, I get a cat: write error: Broken pipe error, though there's no hang. I believe it's the first cat which produces the message.

  5. Former user Account Deleted

    cat obviously hits a sigpipe - head only reads the first 128KB and closes it's stdin, so this error is expected and stracing the same commandline in bash shows the same behavior.

    The spurious output is from http://bugs.python.org/issue1652, I think. Restoring the sigpipe with preexec_fn avoids this message. Python3.2+'s subprocess also fixes this. I've had to integrate the preexec_fn workaround in some projects in the past.

    I fixed the sloppy exception handling in my fork so this works on py3 here:

    https://bitbucket.org/abg/sarge/commits/4ae0f1a466f2043778f1ea0958113171bcaa603a

    Perhaps it make sense to integrate restore_signals support from python3.2 in the sarge Popen subclass for python2.

  6. Vinay Sajip repo owner
    • changed status to open

    I just noticed that this change doesn't work on PyPy - signal must be called from the main thread. Strictly it shouldn't work on the other implementations either (though the tests pass), as the signal module docs do warn to call signal only from the main thread.

  7. Log in to comment