Issues

Issue #7 resolved

signal handlers need to be deactivated by gipc

Jonathan Kamens
created an issue

This example script:

import gevent.monkey
gevent.monkey.patch_all()

import gevent
from gipc import start_process
import os
import signal
import time

def child_handler():
    print "Child handler called in PID %d." % os.getpid()

def parent_handler():
    print "Parent handler called in PID %d." % os.getpid()

print "Parent is PID %d." % os.getpid()

signal_obj = gevent.signal(signal.SIGTERM, parent_handler)

def child():
    print "Child is PID %d." % os.getpid()
    gevent.signal(signal.SIGTERM, child_handler)
    time.sleep(2)

proc = start_process(child)
time.sleep(1)
os.kill(proc.pid, signal.SIGTERM)
proc.join()

Generates output that proves that both signal handlers, the one added in the parent and the one added in the child, are called when the child is sent a SIGTERM signal.

I'd venture to say this is not what is intended.

Note that you can't call signal_obj.cancel() in child(); if you do that the child process will crash.

I think gipc's start process needs to find and cancel all signal handlers right after the fork, before replacing the gevent hub and libevent loop.

In my code, what I'm doing now to work around this is using gc.get_objects() to find all objects that are instances of gevent.hub.signal, and setting object.watcher.callback to a no-op function.

Comments (10)

  1. Jan-Philip Gehrcke repo owner

    Sorry for the late reply Jonathan. I implemented a minimum working example in C using libev, reproducing the situation where we install a signal handler for SIGTERM in the parent process, fork, destroy the default event loop in the child, create a new one, and then send SIGTERM to the child. The signal handler fires off -- the behavior you observed is entirely reproducible.

    I consulted the author of libev who stated that what we are observing is undefined behavior / outside the scope of guaranteed libev behavior. What happens is that the old signal watcher is still active within the new event loop. libev is definitely not to blame, since it only makes guarantees about watchers that are properly stopped.

    In gipc, we're relying on 'orphaned' watchers not to bother us anymore after event loop destruction -- quite an efficient approach, which is also kind of supported by the libev docs (http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod):

    "When this is not possible, or you want to use the default loop for
    other reasons, then in the process that wants to start "fresh", call
    ev_loop_destroy (EV_DEFAULT) followed by ev_default_loop (...).
    Destroying the default loop will "orphan" (not stop) all registered
    watchers, so you have to be careful not to execute code that modifies
    those watchers.
    

    Obviously, with signal watchers this approach does not work as intended and we need to correct for that.

    I want to avoid using gc.get_objects() by default in the child wrapper as long as I can, since we cannot make any statement about how complex this call might become in the context of larger applications and frequent forks (do you support this argument or does it sound exaggerative to you?). I think reliable deactivation of orphaned signal watchers can most efficiently be achieved by resetting the signal action for all signals to signal.SIG_DFL via signal.signal([allsignals], signal.SIG_DFL). This we can do i) by default in the child wrapper or ii) conditionally in the child wrapper, controlled via something in the lines of another argument to start_process.

    What do you think?

  2. Jonathan Kamens reporter

    Hi Jan-Philip,

    What happens if you use signal.signal to reset to default signal handling after the fork, but then the user's code subsequently calls gevent.signal to set up a new signal handler that s/he wants to run in the child? Will that "awaken" the old libev signal handler from before the fork?

  3. Jan-Philip Gehrcke repo owner

    I do not see how, that would surprise me. I will implement the SIG_DFL solution and corresponding unit tests and see how it goes. Should we reset the signal action for all signals in the child wrapper by default? Currently this seems to be a reasonable solution, but I might oversee something.

  4. Jan-Philip Gehrcke repo owner

    A minor follow up on this: it seems that the complications you have pointed out initially arise from the fact that libev by default uses the signalfd mechanism for catching signals on systems supporting it (as opposed to 'classical' async handlers). In the context of forking, this mechanism is challenging to properly control. A few words on this topic by LWN (http://lwn.net/Articles/415684/):

    "" Unfortunately, signalfd has a very irritating practical issue. To use it, you need to block the signal you're interested in (using e.g. sigsetmask). However, the set of blocked signals is not reset by exec (blocked signals and signals set to SIG_IGN are preserved, but other signal actions are reset to default). So, if you use signalfd, whenever you spawn a process, it will not receive that signal. And processes tend to misbehave when not receiving signals they expect to. You can, of course, fix that. You simply need to unblock the signal after forking ""

    So far, so good, but what is problematic in this situation is that we cannot directly modify the signal mask in Python 2.x and that is why currently we probably have some kind of race condition in the child, even if we reset the signal actions to default one by one. This race condition might never be of relevance in sane usage scenarios, but still this seems to be ugly.

    I will implement what we have agreed on, but I think my recommendation would be to not use gevent.signal (which uses libev signal watchers which might use signalfd) in the context of forking. We can, instead, just use the classical signal module (if not monkey patched). The libev docs encourage to do so:

    " If you want signals to be delivered truly asynchronously, just use sigaction as you would do without libev and forget about sharing the signal. You can even use ev_async from a signal handler to synchronously wake up an event loop. "

    I took your inital repro code from above and modified it to not use gevent.signal. The parent signal handler becomes inherited (as expected), and can reliably be overwritten in the child:

    import gevent
    from gipc import start_process
    import os
    import signal
    
    def child_handler(a, b):
        print "Child handler called in PID %d." % os.getpid()
    
    def parent_handler(a, b):
        print "Parent handler called in PID %d." % os.getpid()
    
    print "Parent is PID %d." % os.getpid()
    signal.signal(signal.SIGTERM, parent_handler)
    
    def child():
        print "Child is PID %d." % os.getpid()
        signal.signal(signal.SIGTERM, child_handler)
        gevent.sleep(2)
    
    p = start_process(child)
    gevent.sleep(1)
    os.kill(p.pid, signal.SIGTERM)
    p.join()
    

    The output is for me:

    Parent is PID 28292.
    Child is PID 28293.
    Child handler called in PID 28293.
    
  5. Jan-Philip Gehrcke repo owner

    Another thing to add in this topic is that for reliable IPC we should stick with gipc's pipes, even for simple tasks. Getting signals right in all conditions, especially involving libev and forks, is a challenge. And we still did not talk about threads -- a funny quote from a user over at StackOverflow (http://stackoverflow.com/a/1134566/145400): Mixing any two of fork/exec, threads, and signals is difficult to get right. Mixing all three is a recipe for disaster.

  6. Jan-Philip Gehrcke repo owner

    For the matter of correctness, I need to take back my earlier statement "libev by default uses the signalfd mechanism for catching signals on systems supporting it" -- the developer must explicitly request its usage, signalfd is never used by default by libev.

  7. Log in to comment