Handling errors from signal handlers in callbacks

Issue #152 resolved
Denis Bilenko
created an issue

Consider the following script:

pypycore ~/work/gevent $ cat cffi_signal_problem.py 
import signal
from cffi import FFI

ffi = FFI()
void call_callback(void*);

C = ffi.verify('''
#include <unistd.h>
void call_callback(void(*cb)(void));
void call_callback(void(*cb)(void)) {
''', libraries=[])

def mycallback():
        print 'hello world'
    except Exception, ex:
        print 'caught error: %s' % ex

cb = ffi.callback("void(*)()", mycallback)

def alarm_handler(*args):
    1 // 0

signal.signal(signal.SIGALRM, alarm_handler)

It will never print "caught error", instead it'll print this:

pypycore ~/work/gevent $ pypy cffi_signal_problem.py 
From callback <function mycallback at 0x00007f4475e15100>:
Traceback (most recent call last):
  File "cffi_signal_problem.py", line 20, in mycallback
  File "cffi_signal_problem.py", line 38, in alarm_handler
    1 // 0
ZeroDivisionError: integer division by zero

I would like in gevent to be able to handle such errors (have my handle_error() function called) as I'd like those to be re-raised in main greenlet, as it happens with builtin blocking functions:

pypycore ~/work/gevent $ cat time_signal.py 
import signal
import time

def alarm_handler(*args):
    1 // 0

signal.signal(signal.SIGALRM, alarm_handler)
except Exception, ex:
    print 'caught error: %s' % ex

pypycore ~/work/gevent $ pypy time_signal.py 
caught error: integer division by zero

However, I cannot currently find a way to make it work.

In CPython version of gevent I do

    if (PyErr_Occurred()) gevent_handle_error(loop, Py_None);

Would be nice to have special support for this in CFFI / PyPy, in a form of a special callback that will be called with exception converted to value for those exception that were set before entering the callback.

Comments (18)

  1. Armin Rigo

    So, something like callback(..., onerror=print_error), where print_error is a callable invoked like an exit handler with three arguments, exc_type, exc_value, traceback?

  2. Armin Rigo

    Would it be enough if CFFI would notice the Python exception before invoking the callback, but save it away around the callback and restore it afterwards?

    ...no, nevermind...

  3. Armin Rigo

    The onerror solution above is not perfect, because if the signal occurs while we're in the onerror handler, then we're lost again. How about callback(..., append_errors_to=mylist) which guarantees that if there was an exception, it is not printed to stderr but appended to the given list as a 3-tuple? It would work in any case except out-of-memory errors... Another alternative would be to save away the error internally, and recreate it around the call to C.call_callback(cb), but this wouldn't work in case we're not in a CFFI-called C function in the first place...

  4. Armin Rigo

    Summary of the IRC discussion: we're going for onerror(). It is not 100% perfect, but it should only be an issue in case two signal handlers raise an exception in an extremely short amount of time.

    This is a case that we cannot correctly handle in Python anyway, even without CFFI. If we write try: ... except Foo: then the first handler can raise Foo, sending us inside the except block. Then, the second handler can raise Foo again. If we're very unlucky, we didn't close the loop and re-enter the try block yet. So nobody will catch that exception. Similarly, with onerror(), if we're very unlucky the second handler can raise an exception while we're still inside onerror().

    In PyPy this could be resolved by disabling signals when running the onerror() handler. I'm not sure it can be done cleanly on CPython. But it looks strange, because the only way we'd have to be 100% safe in the other case (a top-level loop without CFFI) would then be to rewrite that loop in C and use a ffi.callback() with an onerror() handler...

  5. Armin Rigo

    Talking about the cffi version of gevent: I think this project would benefit from containing a little bit more C code instead of being in pure Python. It's one thing that cffi allows, and I think that in this case it could give great results, both in terms of performance and in terms of reliability. It would for example get rid of hacks like keyboard_interrupt_allowed. For all the parts where you don't want a signal handler to interrupt you, you write them in C. See this for a 1000-feet overview of what I mean: http://paste.pound-python.org/show/anWUmVxCtVc267j4UOHj/ . This would avoid the problem cited here and improve performance a lot, notably by not creating many ffi.callback()s.

    (This doesn't help resolve the present issue, though.)

  6. Jason Madden

    I have a branch in the gevent repository that uses the ideas outlined by @Armin Rigo to replace the individual watcher callbacks with a shared callback, and to remove keyboard_interrupt_allowed. The shared callbacks seem like a clear win, though as mentioned that doesn't solve this present issue. Removing keyboard_interrut_allowed is less clearly a win; the code is mostly simplified by letting Python's default interrupt handler do its job and detecting the error case in C code, but the side effect is that you get an extra traceback printed:

    From cffi callback <function _python_loop_check_callback at 0x00000001066ad6a0>:
    Traceback (most recent call last):
      File "//site-packages/gevent/corecffi.py", line 363, in _python_loop_check_callback
        return 42

    It's entirely possible I misunderstood what the suggestion there was, though. It does seem like that problem would be solvable with the onerror construct too.

  7. Armin Rigo

    Indeed, sorry about that. It seems that what I pasted is not correct, as exceptions from each individual Python callback will be printed and discarded. Maybe onerror is the least bad solution...

  8. Jason Madden

    I didn't run much formal benchmarking, but it does appear so, yes. bench_sendall.py appears to gain 5-10 MB/s with these changes, although there is a lot of variability.

  9. Omer Katz

    Do you guys want me to try the new branch with the pypy build that has onerror for ffi callbacks? Do you think it will help us resolve the test failure in gevent?

  10. Jason Madden

    It might be worth a shot. Or, if there's on OS X build I can take a look too. (I'm currently moving house and my linux boxen are unaccessible.) I merged most but not all of that branch with master/1.1a1.

    I recently discovered another test failure, this time in the bundled standard library test suite. When gevent's subprocess module is used, cross-process signal handling is broken, apparently by exactly this issue.

  11. Armin Rigo

    Added onerror in CFFI. It is called with the three classical arguments exception, exc_value, traceback.

    A blocker was that it seemed that in gevent we'd need occasionally to learn the arguments originally passed to the callback. This is actually possible, even if slightly hackish: traceback.tb_frame.f_locals['arg'] would get the value of arg in the outermost frame, which is the frame of the callback function itself. The arguments are always present, even if the exception was raised very early because of a pending signal.

    So I'll document this hack and keep this three-argument version of onerror.

    The final version now in CFFI has the behavior that onerror can itself return the result value to give to the callback in C, but doesn't have to: if it simply returns None---or if onerror itself fails---then the value of error will be used. If onerror returns normally, then it is assumed that it handled the exception on its own and nothing is printed to stderr. If onerror raises, then we print both tracebacks.

  12. Jason Madden

    Looks great! I'd love to give it a try. Unfortunately, the nightly OS X builds have been failing since the 2nd. When I try to translate the most recent changeset (7b0ceb5a2ce0) on OS X locally, it fails with:

    [translation:ERROR] TyperError: the RPython-level __del__() method in <FunctionGraph of (pypy.module._cffi_backend.cdataobj:500)W_CDataGCP.__del__ at 0x150535b78> calls:
    [translation:ERROR]     (pypy.module._cffi_backend.cdataobj:500)W_CDataGCP.__del__
    [translation:ERROR]     (pypy.interpreter.baseobjspace:1069)call_function__star_1
    [translation:ERROR]     (pypy.interpreter.function:76)funccall__star_2
    [translation:ERROR]     (pypy.interpreter.gateway:799)BuiltinCode2.fastcall_2
    [translation:ERROR]     (?:2)fastfunc_truediv_2
    [translation:ERROR]     (pypy.module.__builtin__.interp_classobj:254)truediv
    [translation:ERROR]     (pypy.objspace.descroperation:677)binop_truediv_impl
    [translation:ERROR]     (pypy.module.__builtin__.abstractinst:126)abstract_issubclass_w
    [translation:ERROR]     (pypy.objspace.std.objspace:393)fixedview__False
    [translation:ERROR]     (pypy.interpreter.baseobjspace:838)unpackiterable
    [translation:ERROR]     (pypy.interpreter.baseobjspace:860)_unpackiterable_unknown_length
    [translation:ERROR] .. (?:2)fastfunc_gcp_2
    [translation:ERROR] .. block@9 with 1 exits
    [translation:ERROR] .. v896 = simple_call((function gcp), space_100, v895, w1_0)

    Hmm, I see a log of changes merging today. I guess I'll try to work backwards to find a working commit.

  13. Log in to comment