Issue #12 resolved
Heungsub Lee
created an issue

Hi jgehrcke,

gipc-0.4.0 resets SIGPIPE handler to SIG_DFL so a subprocess crashes if it writes to a closed socket. Python ignores SIGPIPE by default. Is it expected behavior? There's no workaround to avoid subprocess crashing?

Regards, Heungsub

Comments (11)

  1. Jan-Philip Gehrcke repo owner

    It is 'expected' behavior, according to this warning in the docs:

    Please note that in order to provide reliable signal handling in the
    context of libev, the default disposition (action) is restored for all
    signals in the child before executing the user-given target function.
    You can (re)install any signal handler within target.
    

    I believe it is good to have this simple rule applied.

    There is a simple solution for you, although I really would not call this workaround. Just import signal and call

    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    

    in your child's target function. This approach has clear semantics. That code explicitly states how it behaves.

  2. Heungsub Lee reporter

    Thanks to your answer and this awesome project.

    By default, Python runtime sets SIG_IGN as the behavior of SIGPIPE and SIGXFSZ. See initsigs() at pythonrun.c. I think that it is better behavior instead of SIG_DFL because Python can handle these by IOError without interruption.

    So I suggest that gipc caches the default signal handlers using signal.getsignal(...) at the load time and each sub process refers them when they reset the signal handlers.

    For example,

    _signals_to_reset = [getattr(signal, s) for s in
        set([s for s in dir(signal) if s.startswith("SIG")]) -
        set(['SIG_DFL', 'SIGSTOP', 'SIGKILL'])]
    _signal_and_handlers_to_reset = {s: signal.getsignal(s) for s in _signals_to_reset}
    
    def _reset_signal_handlers():
        for s, h in _signal_and_handlers_to_reset:
            signal.signal(s, h)
    

    How you think?

  3. Jan-Philip Gehrcke repo owner

    Thanks for your input. I think the approach you suggested contains a bit too much magic, for my taste. But let's take this discussion step by step.

    First of all, I need to know whether a call to signal.signal(signal.SIGPIPE, signal.SIG_IGN) in your child fixes the issue that you are observing.

    Then, some more background: gipc should be as painless to use as possible for the casual user. The casual user should be able to use gevent.signal in the parent without worrying about a libev signal watcher to do unexpected things in the child. That is why we have to "normalize" signals in the child. This came up in issue 7 (https://bitbucket.org/jgehrcke/gipc/issue/7/) and we decided -- as a general purpose solution -- to reset all signals.

    Now, when we reset signals, we could implement special rules for special signals or treat all signals the same. The decision was to treat all signals the same, i.e. reset all of them, to SIG_DFL. The clear advantage is that this is well-defined behavior and a simple rule to keep in mind.

    Now the important thing for me is to find out what disadvantages this solution brings along in your case. If the "disadvantage" just is that you need to put signal.signal(signal.SIGPIPE, signal.SIG_IGN) into your early child code, I would say that it is not really a disadvantage, but rather a helpful clarification of what your code really does. If there are other disadvantages, please tell me.

    Looking forward to hearing your feedback!

    Edit: Just to clarify, are you basically saying that people expect from Python to always ignore SIGPIPE and so we should just stick with that paradigm, yes? The corresponding documentation quote would be "SIGPIPE is ignored (so write errors on pipes and sockets can be reported as ordinary Python exceptions) and SIGINT is translated into a KeyboardInterrupt exception." -- what would you suggest to do with SIGINT in a gipc child (that's in fact a difficult question)?

  4. Heungsub Lee reporter

    Yes, signal.signal(signal.SIGPIPE, signal.SIG_IGN) you suggested fixes the issue well. You are right, that code just has a cheap disadvantage. Now I agree to your decision.

    Thanks for the explanation!

  5. Miguel Turner

    For the record, I was having a problem with gipc child processes dying for no apparent reason, and came across this issue while trying to discover why. This issue turned out to be the cause of my problem, but it took a while to realize it because I couldn't see why my processes were dying (I wasn't checking the exit code).

    While I agree that it's less magical to treat all signals the same, the effect of exiting the process on SIGPIPE will not be what most Python users expect. If it doesn't make sense to change behavior for this special case (do other Python implementations behave differently?), I would recommend adding documentation to clarify how gipc deviates from CPython behavior.

  6. Alex Besogonov

    Please, add SIGPIPE to the ignored list. The current behavior causes very hard-to-find bugs the first time one encounters the broken connections.

    And since in most cases SIGPIPE is a rare event (requiring a client to break a connection), it's extremely hard to replicate it.

  7. Jan-Philip Gehrcke repo owner

    I appreciate your input, Miguel and Alex. Sorry for taking so long to get back to you. I have reconsidered ignoring SIGPIPE in child processes and decided to do it. That is, the next release will contain this breaking change.

    On the one hand, I still think that calling code should explicitly state how it desires to treat incoming signals. On the other hand, you are right that people in the (C)Python world tend to assume that SIGPIPE is ignored and that the underlying error is detected and reported by Python via exception (and traceback, if not handled).

    While I initially found the first argument to have larger weight, I got convinced by your feedback as well as further resources such as http://stackoverflow.com/a/108192/145400 and http://www.microhowto.info/howto/ignore_sigpipe_without_affecting_other_threads_in_a_process.html. The latter summarized the issue quite nicely:

    If you attempt to write to a broken pipe or a disconnected socket then the signal SIGPIPE will be raised, and its default behaviour is to terminate the process responsible. This behaviour is appropriate for programs that are intended for use in a pipeline, but in most other circumstances it is undesirable and should be prevented.

    However, what I find most convincing in the context of gipc is that code should not behave differently depending on whether it is called from the initial process or from child processes. Currently, there is a deviation. This minimal example demonstrates the case where we write from the initial process to a broken pipe:

    import gipc
    
    def child(r):
        r.get()
        # Here the child process exits and the read pipe end
        # becomes closed implicitly.
    
    with gipc.pipe() as (r, w):
        p = gipc.start_process(child, (r,))
        while True:
            w.put(0)
    

    Its output:

    $ python sigpipetest.py
    Traceback (most recent call last):
      File "sigpipetest.py", line 11, in <module>
        w.put(0)
      File "/bla/gipc/gipc/gipc.py", line 758, in put
        self._write(struct.pack("!i", len(bindata)) + bindata)
      File "/home/jang/hobbyproggorn/gipc/gipc/gipc.py", line 726, in _write
        bytes_written = _write_nonblocking(self._fd, bindata)
      File "/bla/gipc/pyenv/local/lib/python2.7/site-packages/gevent/os.py", line 70, in nb_write
        return _write(fd, buf)
    OSError: [Errno 32] Broken pipe
    

    This code demonstrates the case where we write from a child process to a broken pipe:

    import gipc
    
    def child_read(r):
        r.get()
        # Here the child process exits and the read pipe end
        # becomes closed implicitly.
    
    def child_write(w):
        while True:
            w.put(0)
    
    with gipc.pipe() as (r, w):
        pr = gipc.start_process(child_read, (r,))
        pw = gipc.start_process(child_write, (w,))
        pr.join()
        pw.join()
        print "write process exit code: %s" % pw.exitcode
    

    Its output:

    $ python sigpipetestchild.py
    write process exit code: -13
    

    That is, no exception is raised and the writing child process exits silently. Its exit code indicates that it exited as of an unhandled SIGPIPE signal.

    Now I am of the opinion that this difference in behavior is not tolerable and I will commit corresponding changes to gipc and add unit tests accordingly.

  8. Log in to comment