Crash on thread destruction in PyThreadState_Delete

Issue #362 resolved
Former user created an issue

Hi,

My program crashes on termination in PyThreadState_Delete. Here is an overview of what happens:

  • I have the main thread (which is the Python process) and a second thread that is not related to Python
  • At some point in the program, the second thread calls a Python callback
  • cffi initializes some per-thread Python state in misc_thread_*.h
  • the Python code finishes running (the second thread is no longer running Python code)
  • Python cleans up its state (I assumed this would happen in Py_Finalize, but gdb doesn't break on it)
  • static destruction runs, which joins the second thread
  • the second thread terminates, and pthread runs the TLS destructors, which calls cffi_thread_shutdown
  • cffi_thread_shutdown calls PyThreadState_Delete, which crashes because Python has already cleaned everything up in the main thread

I don't understand why cffi calls PyThreadState_Delete. The ThreadState seems to come from gil_ensure's call to PyGILState_GetThisThreadState, but it doesn't seem to me that cffi is responsible for freeing this pointer.

Here is a small program to reproduce the issue:

# build_foo.py
from cffi import FFI
ffibuilder = FFI()

ffibuilder.set_source(
    "_foo",
    r"""
    #include <thread>
    #include <cstdlib>
    #include <unistd.h>

    struct T
    {
        ~T() {
            t.join();
        }

        std::thread t;
    };

    static T g_thread;

    typedef void(*foo_t)(void);
    static foo_t g_foo;

    void set_foo(foo_t new_foo) {
        g_foo = new_foo;
    }

    void call_foo(void) {
        g_thread.t = std::thread([] { g_foo(); sleep(3); });
        sleep(1);
    }
    """,
    libraries=[],
    source_extension=".cpp",
)


ffibuilder.cdef("""
extern "Python" void my_foo(void);
typedef void(*foo_t)(void);
void set_foo(foo_t new_foo);
void call_foo(void);
""")

if __name__ == "__main__":
    ffibuilder.compile(verbose=True)

# foo.py
from _foo import ffi
from _foo import lib as libfoo


@ffi.def_extern()
def my_foo():
    print("my foo called")


libfoo.set_foo(libfoo.my_foo)
libfoo.call_foo()

I run it with:

CFLAGS=-std=c++11 CXXFLAGS= python3 setup.py clean develop --user && gdb -ex run --args python3 foo.py

The output:

my foo called
Fatal Python error: PyThreadState_Delete: NULL interp

The stacks at the moment of the crash:

(gdb) thread apply all bt

Thread 2 (Thread 0x7ffff61a2700 (LWP 19256)):
#0  0x0000000000428c10 in PyThreadState_Delete ()
#1  0x00007ffff699bf51 in cffi_thread_shutdown (p=0x7ffff00011d0) at c/misc_thread_common.h:32
#2  0x00007ffff7bc25e9 in __nptl_deallocate_tsd () at pthread_create.c:175
#3  0x00007ffff7bc3648 in __nptl_deallocate_tsd () at pthread_create.c:326
#4  start_thread (arg=0x7ffff61a2700) at pthread_create.c:346
#5  0x00007ffff6db0abf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 1 (Thread 0x7ffff7fbe700 (LWP 19255)):
#0  0x00007ffff7bc46cd in pthread_join (threadid=140737322297088, thread_return=0x0) at pthread_join.c:90
#1  0x00007ffff66b2af7 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff7fc548c in T::~T (this=0x7ffff7fc7120 <g_thread>) at build/temp.linux-x86_64-3.6/_foo.cpp:500
#3  0x00007ffff6cfd940 in __run_exit_handlers (status=0, listp=0x7ffff705f5d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:83
#4  0x00007ffff6cfd99a in __GI_exit (status=<optimized out>) at exit.c:105
#5  0x00007ffff6ce82e8 in __libc_start_main (main=0x421dc0 <main>, argc=2, argv=0x7fffffffdf78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdf68) at ../csu/libc-start.c:325
#6  0x0000000000421ffa in _start ()

I'm using python 3.6.3 and cffi 1.11.5

Comments (11)

  1. Armin Rigo

    That's yet another issue with the threading API in CPython. CFFI needs to call PyThreadState_Delete() because I am attempting to cache the per-thread PyThreadState. The more official approach would be horribly inefficient in one important case: if the C/C++ code is running its own thread and makes a lot of small calls to Python via CFFI.

    I think there is no reliable non-hackish solution to the current problem, so I'll need some time to figure out a hackish one...

  2. Dimitri Merejkowsky

    Hello and sorry for the late reply.

    Your fix does indeed fix the example, but our real program still crashes.

    We'll try to give you a more realistic example soonish.

  3. BlastRock

    Hi,

    I'm the original poster (and a coworker of Dimitri). I came back to this bug to try to find out why our crash is still present with the official version.

    I just tested my initial example from the first post with e5f8ac3b8e6b, 1.11.5 and current master. The bug is still present for me with the exact same error. I'm attaching the project that I used for my tests (from the original post).

  4. Armin Rigo

    Okay... Took me a while to figure it out, but it's because the implementation relies on sys.exitfunc being called when the interpreter exits. But sys.exitfunc is not recognized any more in CPython 3.x, and so it's silently ignored. Why it would be a good idea to make a change that subtly breaks things is beyond me. I'll fix cffi...
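
The silent-ignore behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming CPython 3.7 or later. It is not the cffi code, and the names CHILD, via_exitfunc, via_atexit and run_child are made up for illustration. It starts a child interpreter that installs one handler through sys.exitfunc and another through atexit.register, then collects what the child prints at exit.

```python
import subprocess
import sys
import textwrap

# Source of a throwaway child script (hypothetical names, for illustration only).
CHILD = textwrap.dedent("""
    import atexit
    import sys

    def via_exitfunc():
        print("sys.exitfunc handler ran")

    def via_atexit():
        print("atexit handler ran")

    # On Python 3 this is an ordinary attribute assignment on the sys module;
    # the interpreter never looks at it during shutdown.
    sys.exitfunc = via_exitfunc

    # The supported way to run code when the interpreter exits.
    atexit.register(via_atexit)
""")


def run_child():
    """Run CHILD in a fresh interpreter and return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


lines = run_child()
print(lines)
```

On Python 3 only the atexit handler's line appears, with no warning about the ignored sys.exitfunc assignment, which is why code relying on it fails quietly.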

  5. Armin Rigo

    It's even more of a mess. I found out that if you make many threads from C which call some Python functions, and if the Python functions use thread._local() attributes, then you get memory leaks (reference counters from the PyThreadState structure are never decremented). It's unclear how to fix it so far; more thinking needed.

  6. BlastRock

    Hi,

    I tested that commit with our code, and it works! :)

    I reviewed the code, which seems fine to me, except for the comment I left on the commit.

    Thank you!

  7. Armin Rigo

    Yay! It only took adding a custom type and almost no (cough) hacking around. The CPython C API is not easy to work with if you try to develop a cleaner interface on top...
