Segfaults in testing/embedding/test_thread.py with recent 2.7 snapshots

Issue #358 resolved
Stefano Rivera
created an issue

Recently cffi CI tests have been failing in Debian: https://ci.debian.net/packages/p/python-cffi/unstable/amd64/ https://bugs.debian.org/889813

This presumably was caused by a new Python 2.7 snapshot being uploaded: https://tracker.debian.org/news/930217

Yes, I know that's an ancient cffi version and I'm working on updating it, but the same thing happens on the hg head.

Is this a CPython bug, or did a change in CPython trigger a latent cffi bug? I can't see anything significant between the current snapshot and the previous one (20171205).

It doesn't segfault for me under gdb, but this is what the core file shows:

Core was generated by `./thread3-test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  new_threadstate (init=1, interp=0x0) at ../Python/pystate.c:202
202 ../Python/pystate.c: No such file or directory.
[Current thread is 1 (Thread 0x7f77884ed700 (LWP 24815))]
(gdb) bt
#0  new_threadstate (init=1, interp=0x0) at ../Python/pystate.c:202
#1  PyThreadState_New (interp=0x0) at ../Python/pystate.c:213
#2  0x00007f77897a4f6c in PyGILState_Ensure () at ../Python/pystate.c:604
#3  0x00007f778a1ad5a4 in _cffi_initialize_python () at _add3_cffi.c:841
#4  _cffi_start_python () at _add3_cffi.c:1124
#5  0x00007f778a1ad900 in _cffi_start_and_call_python (
    externpy=0x7f778a3ae1a0 <_cffi_externpy__add3>, args=0x7f77884ece90 "\350\003")
    at _add3_cffi.c:1164
#6  0x00007f778a1ad3b3 in add3 (a0=<optimized out>, a1=<optimized out>, a2=<optimized out>, 
    a3=<optimized out>) at _add3_cffi.c:1232
#7  0x000055c4b99f3d5d in start_routine_3 (arg=0x0) at thread3-test.c:26
#8  0x00007f7789f9551a in start_thread (arg=0x7f77884ed700) at pthread_create.c:465
#9  0x00007f7789ccd3ef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f778a795b80 (LWP 24813))]
#0  0x00007f7789f9b7fd in futex_wait_cancelable (private=<optimized out>, expected=0, 
    futex_word=0x55c4b9bf5110 <done+80>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88  ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0  0x00007f7789f9b7fd in futex_wait_cancelable (private=<optimized out>, expected=0, 
    futex_word=0x55c4b9bf5110 <done+80>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x55c4b9bf50c0 <done>, 
    cond=0x55c4b9bf50e8 <done+40>) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x55c4b9bf50e8 <done+40>, mutex=0x55c4b9bf50c0 <done>)
    at pthread_cond_wait.c:655
#3  0x000055c4b99f3c84 in sem_wait (sem=0x55c4b9bf50c0 <done>) at thread-test.h:37
#4  0x000055c4b99f3edc in main () at thread3-test.c:50
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f7788cee700 (LWP 24814))]
#0  0x00007f7789cbe105 in __GI___xstat (vers=<optimized out>, 
    name=0x7f7789bd6b60 <progpath> "/home/stefanor/bin/python", buf=0x7f7788cecc70)
    at ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c:35
35  ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c: No such file or directory.
(gdb) bt
#0  0x00007f7789cbe105 in __GI___xstat (vers=<optimized out>, 
    name=0x7f7789bd6b60 <progpath> "/home/stefanor/bin/python", buf=0x7f7788cecc70)
    at ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c:35
#1  0x00007f778981f87c in stat64 () at /usr/include/x86_64-linux-gnu/sys/stat.h:451
#2  isxfile (filename=0x7f7789bd6b60 <progpath> "/home/stefanor/bin/python")
    at ../Modules/getpath.c:155
#3  calculate_path () at ../Modules/getpath.c:430
#4  0x00007f7789821359 in Py_GetProgramFullPath () at ../Modules/getpath.c:693
#5  0x00007f7789812dfa in _PySys_Init () at ../Python/sysmodule.c:1440
#6  0x00007f77897ae492 in Py_InitializeEx (install_sigs=<optimized out>, 
    install_sigs=<optimized out>) at ../Python/pythonrun.c:248
#7  0x00007f778a3b0593 in _cffi_py_initialize () at _add2_cffi.c:814
#8  _cffi_initialize_python () at _add2_cffi.c:837
#9  _cffi_start_python () at _add2_cffi.c:1135
#10 0x00007f778a3b0900 in _cffi_start_and_call_python (
    externpy=0x7f778a5b21a0 <_cffi_externpy__add2>, args=0x7f7788cedea0 "(")
    at _add2_cffi.c:1175
#11 0x00007f778a3b03af in add2 (a0=<optimized out>, a1=<optimized out>, a2=<optimized out>)
    at _add2_cffi.c:1241
#12 0x000055c4b99f3cd3 in start_routine_2 (arg=0x0) at thread3-test.c:14
#13 0x00007f7789f9551a in start_thread (arg=0x7f7788cee700) at pthread_create.c:465
#14 0x00007f7789ccd3ef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f7787cec700 (LWP 24816))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.

Comments (7)

  1. Stefano Rivera reporter

    My best explanation is that this change resulted in the GIL being released within Py_InitializeEx (while setting up the stdio file objects). That allows the two interpreters in the test to race during startup, although I'm not entirely sure how.

    Thread 3 "thread3-test" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7ffff608f700 (LWP 4561)]
    new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
    202         tstate->next = interp->tstate_head;
    (gdb) bt
    #0  new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
    #1  PyThreadState_New (interp=0x0) at Python/pystate.c:213
    #2  0x00007ffff71140fc in PyGILState_Ensure () at Python/pystate.c:604
    #3  0x00007ffff79d34b4 in _cffi_initialize_python () at _add3_cffi.c:841
    #4  _cffi_start_python () at _add3_cffi.c:1124
    #5  0x00007ffff79d3810 in _cffi_start_and_call_python (
        externpy=0x7ffff7bd41a0 <_cffi_externpy__add3>, args=0x7ffff608eea0 "\350\003")
        at _add3_cffi.c:1164
    #6  0x00007ffff79d3313 in add3 (a0=<optimized out>, a1=<optimized out>, a2=<optimized out>, 
        a3=<optimized out>) at _add3_cffi.c:1232
    #7  0x0000555555554d3d in start_routine_3 (arg=0x0) at thread3-test.c:26
    #8  0x00007ffff77bb51a in start_thread (arg=0x7ffff608f700) at pthread_create.c:465
    #9  0x00007ffff74f33ef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    

    That interp=0x0 looks wrong.

  2. Armin Rigo

    Yes, it's likely something like that. Not too surprising, given that what I'm trying to do with cffi embedding is not really officially supported in CPython: initializing it "just in time".

  3. Armin Rigo

    It seems that one thread acquiring the GIL while another is in the middle of Py_InitializeEx() now causes this kind of crash. I guess we need to apply the Python 3 logic in Python 2 as well, and accept the longer spin-locking (see the comments in cffi/_embedding.h, _cffi_carefully_make_gil()).

  4. Armin Rigo

    Can you try replacing the two #if PY_MAJOR_VERSION >= 3 with #if 1? It seems to work for Python 2 too, but I'd like to know whether it fixes the problem.
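
For readers following the thread, here is a toy model of the race discussed in the comments above, written in Python. It is only a sketch under stated assumptions: ToyRuntime, ensure_unsafe, ensure_safe and the demo functions are hypothetical names invented for illustration, and none of this is CPython's or cffi's actual code. The assumption being modeled is that an "initialized" flag becomes true before the interpreter state is usable, so a second thread that trusts the flag can see a null interpreter, much like interp=0x0 in the backtrace. The safe variant holds a lock for the whole of initialization, which roughly mirrors the idea of making other threads wait until Py_InitializeEx() has finished.

```python
import threading


class ToyRuntime:
    """Toy stand-in for an embedded interpreter (not CPython's real code)."""

    def __init__(self):
        self.initialized = False  # set at the START of initialize()
        self.interp = None        # only valid once initialize() has FINISHED
        self._init_lock = threading.Lock()

    def initialize(self, mid_init_hook=None):
        """One-time setup; the optional hook runs while setup is half done."""
        self.initialized = True
        if mid_init_hook is not None:
            mid_init_hook()
        self.interp = object()

    def ensure_unsafe(self):
        """Trusts the early flag, so it can observe interp=None mid-setup."""
        if not self.initialized:
            self.initialize()
        return self.interp

    def ensure_safe(self, mid_init_hook=None):
        """Holds a lock for the whole setup, so callers wait until it is done."""
        with self._init_lock:
            if self.interp is None:
                self.initialize(mid_init_hook)
            return self.interp


def demo_unsafe():
    """Return what a second caller sees in the middle of initialization."""
    rt = ToyRuntime()
    seen = []
    rt.initialize(mid_init_hook=lambda: seen.append(rt.ensure_unsafe()))
    return seen[0]


def demo_safe():
    """Start a second thread mid-initialization; return what both callers get."""
    rt = ToyRuntime()
    results = []
    worker = threading.Thread(target=lambda: results.append(rt.ensure_safe()))
    first = rt.ensure_safe(mid_init_hook=worker.start)
    worker.join()
    return first, results[0]
```

In this model demo_unsafe() hands back None to the caller that arrives mid-initialization, while demo_safe() gives both callers the same fully built object. The cost is the one mentioned in comment 3: every other caller is blocked for as long as initialization takes.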
