another segfault with gevent

Issue #2210 resolved
Armin Rigo created an issue

Comments (14)

  1. Jay Oster

    We are still seeing semi-frequent crashes in production with the latest gevent (the one which switched to a pure-Python semaphore to work around the CPyExt/GC crashes discussed in #2209) and PyPy 4.0.1.

    Here's an anonymized snippet of kern.log. It's not very informative, but it shows that the crash occurs in libpypy-c.so, and it includes at least one case where the process aborts instead of segfaulting.

    Jan  3 07:43:39 ip-XXX-XXX-XXX-XXX kernel: [6509709.402702] pypy[6896]: segfault at 7f6c0fd1ce00 ip 00007f6c095a6147 sp 00007fff1cde6060 error 4 in libpypy-c.so[7f6c06f83000+2c3e000]
    Jan  3 07:43:39 ip-XXX-XXX-XXX-XXX kernel: [6509709.647262] init: XXX-worker (3) main process (6896) killed by SEGV signal
    Jan  3 07:43:39 ip-XXX-XXX-XXX-XXX kernel: [6509709.647305] init: XXX-worker (3) main process ended, respawning
    Jan  3 08:10:48 ip-XXX-XXX-XXX-XXX kernel: [6511338.316319] pypy[6950]: segfault at 7f00b41ddd80 ip 00007f00af262147 sp 00007fff54b26220 error 4 in libpypy-c.so[7f00acc3f000+2c3e000]
    Jan  3 08:10:48 ip-XXX-XXX-XXX-XXX kernel: [6511338.472527] init: XXX-worker (4) main process (6950) killed by SEGV signal
    Jan  3 08:10:48 ip-XXX-XXX-XXX-XXX kernel: [6511338.472569] init: XXX-worker (4) main process ended, respawning
    Jan  3 09:37:56 ip-XXX-XXX-XXX-XXX kernel: [6516566.698836] pypy[7067]: segfault at 7fa6baf37080 ip 00007fa6b4c65147 sp 00007fff4a58a080 error 4 in libpypy-c.so[7fa6b2642000+2c3e000]
    Jan  3 09:37:56 ip-XXX-XXX-XXX-XXX kernel: [6516566.846604] init: XXX-worker (7) main process (7067) killed by SEGV signal
    Jan  3 09:37:56 ip-XXX-XXX-XXX-XXX kernel: [6516566.846644] init: XXX-worker (7) main process ended, respawning
    Jan  3 11:55:56 ip-XXX-XXX-XXX-XXX kernel: [6524846.663201] pypy[4932]: segfault at 7f3d7e0cf680 ip 00007f3d76011147 sp 00007fff6ae0c2a0 error 4 in libpypy-c.so[7f3d739ee000+2c3e000]
    Jan  3 11:55:56 ip-XXX-XXX-XXX-XXX kernel: [6524846.916886] init: XXX-worker (6) main process (4932) killed by SEGV signal
    Jan  3 11:55:56 ip-XXX-XXX-XXX-XXX kernel: [6524846.916926] init: XXX-worker (6) main process ended, respawning
    Jan  3 14:10:56 ip-XXX-XXX-XXX-XXX kernel: [6532946.582829] pypy[21286]: segfault at 7f9b1fae6300 ip 00007f9b1ab9d147 sp 00007fff892ad4d0 error 4 in libpypy-c.so[7f9b1857a000+2c3e000]
    Jan  3 14:10:56 ip-XXX-XXX-XXX-XXX kernel: [6532946.712408] init: XXX-worker (7) main process (21286) killed by SEGV signal
    Jan  3 14:10:56 ip-XXX-XXX-XXX-XXX kernel: [6532946.712448] init: XXX-worker (7) main process ended, respawning
    Jan  3 15:01:40 ip-XXX-XXX-XXX-XXX kernel: [6535990.483430] pypy[32699]: segfault at 7f2e3ccdc100 ip 00007f2e359ad147 sp 00007fff6fd59fe0 error 4 in libpypy-c.so[7f2e3338a000+2c3e000]
    Jan  3 15:01:40 ip-XXX-XXX-XXX-XXX kernel: [6535990.618635] init: XXX-worker (6) main process (32699) killed by SEGV signal
    Jan  3 15:01:40 ip-XXX-XXX-XXX-XXX kernel: [6535990.618673] init: XXX-worker (6) main process ended, respawning
    Jan  3 17:03:43 ip-XXX-XXX-XXX-XXX kernel: [6543313.474792] pypy[15744]: segfault at 7f10108f4c00 ip 00007f100a928147 sp 00007fff1de0bb70 error 4 in libpypy-c.so[7f1008305000+2c3e000]
    Jan  3 17:03:43 ip-XXX-XXX-XXX-XXX kernel: [6543313.607193] init: XXX-worker (6) main process (15744) killed by SEGV signal
    Jan  3 17:03:43 ip-XXX-XXX-XXX-XXX kernel: [6543313.607233] init: XXX-worker (6) main process ended, respawning
    Jan  3 17:20:49 ip-XXX-XXX-XXX-XXX kernel: [6544339.717729] pypy[25874]: segfault at 7fe216927280 ip 00007fe20f0e5147 sp 00007fff576c9ef0 error 4 in libpypy-c.so[7fe20cac2000+2c3e000]
    Jan  3 17:20:49 ip-XXX-XXX-XXX-XXX kernel: [6544339.851949] init: XXX-worker (6) main process (25874) killed by SEGV signal
    Jan  3 17:20:49 ip-XXX-XXX-XXX-XXX kernel: [6544339.851991] init: XXX-worker (6) main process ended, respawning
    Jan  3 18:08:59 ip-XXX-XXX-XXX-XXX kernel: [6547229.659331] pypy[11489]: segfault at 7fd5ce2f4300 ip 00007fd5c84ad147 sp 00007fffc3778800 error 4 in libpypy-c.so[7fd5c5e8a000+2c3e000]
    Jan  3 18:08:59 ip-XXX-XXX-XXX-XXX kernel: [6547229.834471] init: XXX-worker (7) main process (11489) killed by SEGV signal
    Jan  3 18:08:59 ip-XXX-XXX-XXX-XXX kernel: [6547229.834531] init: XXX-worker (7) main process ended, respawning
    Jan  3 18:11:17 ip-XXX-XXX-XXX-XXX kernel: [6547367.493225] init: XXX-worker (2) main process (11814) killed by ABRT signal
    Jan  3 18:11:17 ip-XXX-XXX-XXX-XXX kernel: [6547367.493283] init: XXX-worker (2) main process ended, respawning
    Jan  4 04:03:11 ip-XXX-XXX-XXX-XXX kernel: [6582881.341900] pypy[31350]: segfault at 7ff1c181bc00 ip 00007ff1ba5e6147 sp 00007fff7e56d2b0 error 4 in libpypy-c.so[7ff1b7fc3000+2c3e000]
    Jan  4 04:03:11 ip-XXX-XXX-XXX-XXX kernel: [6582881.859591] init: XXX-worker (7) main process (31350) killed by SEGV signal
    Jan  4 04:03:11 ip-XXX-XXX-XXX-XXX kernel: [6582881.859635] init: XXX-worker (7) main process ended, respawning
    Jan  4 05:07:36 ip-XXX-XXX-XXX-XXX kernel: [6586746.271081] pypy[11801]: segfault at 7fe01e7c6300 ip 00007fe017a27147 sp 00007fffb54774a0 error 4 in libpypy-c.so[7fe015404000+2c3e000]
    Jan  4 05:07:36 ip-XXX-XXX-XXX-XXX kernel: [6586746.512602] init: XXX-worker (3) main process (11801) killed by SEGV signal
    Jan  4 05:07:36 ip-XXX-XXX-XXX-XXX kernel: [6586746.512646] init: XXX-worker (3) main process ended, respawning
    Jan  4 06:54:41 ip-XXX-XXX-XXX-XXX kernel: [6593171.647144] pypy[14078]: segfault at 7f3095ecb100 ip 00007f308fbc9147 sp 00007fffa5ab1d80 error 4 in libpypy-c.so[7f308d5a6000+2c3e000]
    Jan  4 06:54:41 ip-XXX-XXX-XXX-XXX kernel: [6593171.888769] init: XXX-worker (4) main process (14078) killed by SEGV signal
    Jan  4 06:54:41 ip-XXX-XXX-XXX-XXX kernel: [6593171.888808] init: XXX-worker (4) main process ended, respawning
    Jan  4 08:25:58 ip-XXX-XXX-XXX-XXX kernel: [6598648.709747] pypy[27285]: segfault at 7f1581dd7900 ip 00007f157ecee147 sp 00007fff456a9c00 error 4 in libpypy-c.so[7f157c6cb000+2c3e000]
    Jan  4 08:25:59 ip-XXX-XXX-XXX-XXX kernel: [6598648.963024] init: XXX-worker (6) main process (27285) killed by SEGV signal
    Jan  4 08:25:59 ip-XXX-XXX-XXX-XXX kernel: [6598648.963063] init: XXX-worker (6) main process ended, respawning
    Jan  4 20:55:58 ip-XXX-XXX-XXX-XXX kernel: [6643648.781510] pypy[30099]: segfault at 7fc4b4971400 ip 00007fc4ae963147 sp 00007fff09899390 error 4 in libpypy-c.so[7fc4ac340000+2c3e000]
    Jan  4 20:55:59 ip-XXX-XXX-XXX-XXX kernel: [6643649.229890] init: XXX-worker (4) main process (30099) killed by SEGV signal
    Jan  4 20:55:59 ip-XXX-XXX-XXX-XXX kernel: [6643649.229955] init: XXX-worker (4) main process ended, respawning
    Jan  5 01:37:40 ip-XXX-XXX-XXX-XXX kernel: [6660550.189110] pypy[2586]: segfault at 7fec51b75780 ip 00007fec49161147 sp 00007fff4917d3a0 error 4 in libpypy-c.so[7fec46b3e000+2c3e000]
    Jan  5 01:37:40 ip-XXX-XXX-XXX-XXX kernel: [6660550.587627] init: XXX-worker (4) main process (2586) killed by SEGV signal
    Jan  5 01:37:40 ip-XXX-XXX-XXX-XXX kernel: [6660550.587671] init: XXX-worker (4) main process ended, respawning
    Jan  5 15:23:24 ip-XXX-XXX-XXX-XXX kernel: [6710094.780643] pypy[15718]: segfault at 7f5509c83080 ip 00007f5502d36147 sp 00007fff7c955530 error 4 in libpypy-c.so[7f5500713000+2c3e000]
    Jan  5 15:23:25 ip-XXX-XXX-XXX-XXX kernel: [6710095.254614] init: XXX-worker (7) main process (15718) killed by SEGV signal
    Jan  5 15:23:25 ip-XXX-XXX-XXX-XXX kernel: [6710095.254655] init: XXX-worker (7) main process ended, respawning
    Jan  5 18:38:52 ip-XXX-XXX-XXX-XXX kernel: [6721822.450968] pypy[6913]: segfault at 7f93fbdd0c00 ip 00007f93f5ae4147 sp 00007fff8a846d50 error 4 in libpypy-c.so[7f93f34c1000+2c3e000]
    Jan  5 18:38:52 ip-XXX-XXX-XXX-XXX kernel: [6721822.798449] init: XXX-worker (0) main process (6913) killed by SEGV signal
    Jan  5 18:38:52 ip-XXX-XXX-XXX-XXX kernel: [6721822.798489] init: XXX-worker (0) main process ended, respawning
    

    See also the stack traces from a 4.0.0 debug build, including the abort in pypy_g_BlackholeInterpreter_handle_exception_in_frame: https://gist.github.com/parasyte/c0bc45e9500e9033e569
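
    The kern.log lines above carry enough to check whether the crashes share one location: subtracting the library's load base (the first number in the brackets) from the `ip` value gives an offset into libpypy-c.so that does not depend on address randomization. A minimal sketch of that arithmetic follows; the function name `fault_offsets` and the regular expression are illustrative assumptions, not part of any PyPy or gevent tooling, and the two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches kernel segfault lines of the form seen in the log above, e.g.
#   pypy[6896]: segfault at ... ip 00007f6c095a6147 sp ... error 4 in libpypy-c.so[7f6c06f83000+2c3e000]
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offsets(lines):
    """Yield (library, offset) pairs, where offset = ip - library load base."""
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            ip = int(match.group("ip"), 16)
            base = int(match.group("base"), 16)
            yield match.group("lib"), ip - base


# Two lines copied from the kern.log snippet above.
sample = [
    "Jan  3 07:43:39 ip-XXX-XXX-XXX-XXX kernel: [6509709.402702] pypy[6896]: "
    "segfault at 7f6c0fd1ce00 ip 00007f6c095a6147 sp 00007fff1cde6060 "
    "error 4 in libpypy-c.so[7f6c06f83000+2c3e000]",
    "Jan  3 08:10:48 ip-XXX-XXX-XXX-XXX kernel: [6511338.316319] pypy[6950]: "
    "segfault at 7f00b41ddd80 ip 00007f00af262147 sp 00007fff54b26220 "
    "error 4 in libpypy-c.so[7f00acc3f000+2c3e000]",
]

# Group identical (library, offset) pairs to see whether crashes cluster.
for (lib, offset), count in Counter(fault_offsets(sample)).items():
    print("{}+{:#x}: {} crash(es)".format(lib, offset, count))
```

    Both sample lines resolve to the same offset, 0x2623147. With the exact libpypy-c.so build that produced the crash (ideally with debug symbols), an offset like this can be handed to a symbolizer such as addr2line to name the faulting function.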

  2. Jason Madden

    Do you perhaps have any cpyext modules loaded? It starts to seem increasingly clear that cpyext is a minefield with delayed effects (I think the pypy devs are aware of this, hence the branches and calls for support).

    Also, thanks for all your testing and help :)

  3. Jay Oster

    I don't think so... here are all of the modules used:

    Cython==0.23.4
    PyYAML==3.11
    bottle==0.12.8
    gevent==1.1rc3
    pubnub==3.3.2
    sortedcontainers==0.9.5
    statsd==3.1
    

    PyYAML is using the pure-Python parser, and is only used on startup (!). The processes crash after several hours or days of running.

    The only dylibs in the virtualenv are from gevent:

    jay@XXX:/env/site-packages$ find . -name '*.so'
    ./gevent/_corecffi.pypy-26.so
    ./gevent/ares.pypy-26.so
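
    The `find` output shows which shared objects are installed, not which ones a worker has actually imported. A small sketch for checking the latter from inside a running process is below; `loaded_extension_modules` and `EXT_SUFFIXES` are hypothetical names, and the suffix list is an assumption about how compiled modules are named on common platforms.

```python
import sys

# Assumed file suffixes for compiled extension modules (Linux, Windows, macOS).
EXT_SUFFIXES = (".so", ".pyd", ".dylib")


def loaded_extension_modules(modules=None):
    """Return {module name: file path} for imported modules backed by a shared object."""
    modules = sys.modules if modules is None else modules
    found = {}
    # Copy the items first: sys.modules can change while we iterate.
    for name, module in list(modules.items()):
        path = getattr(module, "__file__", None)
        if isinstance(path, str) and path.endswith(EXT_SUFFIXES):
            found[name] = path
    return found


if __name__ == "__main__":
    for name, path in sorted(loaded_extension_modules().items()):
        print(name, path)
```

    Note that a `.so` suffix alone does not say whether a module goes through cpyext or cffi, so anything this reports still has to be checked by hand.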
    
  4. Armin Rigo reporter

    Also, try with a trunk version. There is at least one very rare bug that has been fixed since 4.0.1...

  5. Jay Oster

    @arigo The only module which depends on Cython is gevent. It requires that Cython is installed, even with PyPy. @jamadden should be able to explain the reason for this.

    Can you elaborate on the rare bug that's fixed on trunk? Or even better, provide links to the commit that fixes it...

    It's unfortunately very risky to roll out changes to this service, due to its heavy usage. Updating the global cluster takes a few days, on top of a 24-hour minimum soaking period.

  6. Jason Madden

    @parasyte @arigo Cython is not required to be installed by gevent unless you're running from a source checkout. Installing a package from PyPI does not require Cython to be installed. (Some versions in the lead-up to the current 1.1rc3 may have required Cython to be installed even when installing from PyPI, but I don't remember if that's the case.)

    Under PyPy, there is one extension module (gevent.ares) that is compiled by Cython; packages on PyPI already have run Cython and only ship the resulting .c source (or .so in the case of OS X wheels). This cpyext extension module is not used or even imported by default; users have to opt in to using it. It has a very different use case than the Semaphore module (it's effectively used as a singleton bridge to C functions), so for those reasons it is still shipped, although its use is not encouraged under PyPy.

  7. Jay Oster

    Thanks @jamadden, you're right! I was installing gevent with a git commit id. I've since migrated to 1.1rc3, and no more Cython. Here is the complete list of dependencies, including transitive ones:

    $ pip freeze
    bottle==0.12.8
    gevent==1.1rc3
    greenlet==0.4.9
    pubnub==3.3.2
    PyYAML==3.11
    sortedcontainers==0.9.5
    statsd==3.1
    wheel==0.24.0
    
  8. Jay Oster

    Unclear if this is still an issue with the latest PyPy (and gevent), as we haven't had a chance to upgrade yet.

  9. Jay Oster

    @fijal This is still an issue with PyPy 5.1 (also using gevent 1.1.1).

    The logs are not much use, but are included for the record. We are still seeing segfaults and aborts, plus a new one, "general protection". PyPy is really going off into the weeds here.

    May 16 14:03:54 ip-XXX-XXX-XXX-XXX kernel: [17945080.231059] init: XXX-worker (3) main process (4042) killed by ABRT signal
    May 16 14:03:54 ip-XXX-XXX-XXX-XXX kernel: [17945080.231089] init: XXX-worker (3) main process ended, respawning
    May 16 14:53:56 ip-XXX-XXX-XXX-XXX kernel: [17948081.974617] pypy[14094]: segfault at 7f7981f40020 ip 00007f797da37835 sp 00007fff85862430 error 4 in libpypy-c.so[7f797b269000+2e0e000]
    May 16 14:53:56 ip-XXX-XXX-XXX-XXX kernel: [17948082.071414] init: XXX-worker (3) main process (14094) killed by SEGV signal
    May 16 14:53:56 ip-XXX-XXX-XXX-XXX kernel: [17948082.071444] init: XXX-worker (3) main process ended, respawning
    May 16 15:22:51 ip-XXX-XXX-XXX-XXX kernel: [17949817.118919] pypy[17812] general protection ip:7f45c465d6d0 sp:7fff14344e50 error:0 in libpypy-c.so[7f45c2190000+2e0e000]
    May 16 15:22:51 ip-XXX-XXX-XXX-XXX kernel: [17949817.224425] init: XXX-worker (3) main process (17812) killed by SEGV signal
    May 16 15:22:51 ip-XXX-XXX-XXX-XXX kernel: [17949817.224453] init: XXX-worker (3) main process ended, respawning
    May 16 15:38:00 ip-XXX-XXX-XXX-XXX kernel: [17950726.481220] pypy[20015] general protection ip:7fe634a9d6d0 sp:7fffd31bdb00 error:0 in libpypy-c.so[7fe6325d0000+2e0e000]
    May 16 15:38:00 ip-XXX-XXX-XXX-XXX kernel: [17950726.570504] init: XXX-worker (3) main process (20015) killed by SEGV signal
    May 16 15:38:00 ip-XXX-XXX-XXX-XXX kernel: [17950726.570532] init: XXX-worker (3) main process ended, respawning
    May 16 15:38:01 ip-XXX-XXX-XXX-XXX kernel: [17950726.859219] init: XXX-worker (3) main process (21144) killed by TERM signal
    May 16 15:56:20 ip-XXX-XXX-XXX-XXX kernel: [17951826.126002] pypy[21161] general protection ip:7fb1b5b786d0 sp:7fff30d6c380 error:0 in libpypy-c.so[7fb1b36ab000+2e0e000]
    May 16 15:56:20 ip-XXX-XXX-XXX-XXX kernel: [17951826.217818] init: XXX-worker (3) main process (21161) killed by SEGV signal
    May 16 15:56:20 ip-XXX-XXX-XXX-XXX kernel: [17951826.217845] init: XXX-worker (3) main process ended, respawning
    May 16 16:15:36 ip-XXX-XXX-XXX-XXX kernel: [17952982.250509] pypy[22505] general protection ip:7fe907e626d0 sp:7fffa7ebd190 error:0 in libpypy-c.so[7fe905995000+2e0e000]
    May 16 16:15:36 ip-XXX-XXX-XXX-XXX kernel: [17952982.339923] init: XXX-worker (3) main process (22505) killed by SEGV signal
    May 16 16:15:36 ip-XXX-XXX-XXX-XXX kernel: [17952982.339947] init: XXX-worker (3) main process ended, respawning
    May 16 16:58:44 ip-XXX-XXX-XXX-XXX kernel: [17955570.426501] pypy[24032] general protection ip:7f8e282a563b sp:7fff09d5b4f0 error:0 in libpypy-c.so[7f8e25d13000+2e0e000]
    May 16 16:58:44 ip-XXX-XXX-XXX-XXX kernel: [17955570.556733] init: XXX-worker (3) main process (24032) killed by SEGV signal
    May 16 16:58:44 ip-XXX-XXX-XXX-XXX kernel: [17955570.556759] init: XXX-worker (3) main process ended, respawning
    May 16 17:24:33 ip-XXX-XXX-XXX-XXX kernel: [17957118.694035] pypy[27251]: segfault at 8 ip 00007f7c6f3e3a18 sp 00007fff9400aa90 error 4 in libpypy-c.so[7f7c6ce00000+2e0e000]
    May 16 17:24:33 ip-XXX-XXX-XXX-XXX kernel: [17957118.798798] init: XXX-worker (3) main process (27251) killed by SEGV signal
    May 16 17:24:33 ip-XXX-XXX-XXX-XXX kernel: [17957118.798826] init: XXX-worker (3) main process ended, respawning
    May 16 17:37:09 ip-XXX-XXX-XXX-XXX kernel: [17957875.284843] pypy[29235]: segfault at 10 ip 00007f694cc076d8 sp 00007fffaa012f90 error 4 in libpypy-c.so[7f694a73a000+2e0e000]
    May 16 17:37:09 ip-XXX-XXX-XXX-XXX kernel: [17957875.398241] init: XXX-worker (3) main process (29235) killed by SEGV signal
    May 16 17:37:09 ip-XXX-XXX-XXX-XXX kernel: [17957875.398268] init: XXX-worker (3) main process ended, respawning
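
    To put numbers on the mix of aborts and segfaults, the `init:` lines in a log like the one above can be tallied by signal. This is only a sketch: `count_signals` is a hypothetical helper, the pattern assumes the exact "killed by ... signal" wording shown above, and the three sample lines are copied from this log.

```python
import re
from collections import Counter

# Matches init lines such as:
#   init: XXX-worker (3) main process (4042) killed by ABRT signal
KILLED_RE = re.compile(
    r"main process \((?P<pid>\d+)\) killed by (?P<sig>[A-Z]+) signal"
)


def count_signals(lines):
    """Count how many worker processes were killed by each signal."""
    counts = Counter()
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            counts[match.group("sig")] += 1
    return counts


# Three lines copied from the log above, one per signal that appears in it.
sample = [
    "May 16 14:03:54 ip-XXX-XXX-XXX-XXX kernel: [17945080.231059] init: "
    "XXX-worker (3) main process (4042) killed by ABRT signal",
    "May 16 14:53:56 ip-XXX-XXX-XXX-XXX kernel: [17948082.071414] init: "
    "XXX-worker (3) main process (14094) killed by SEGV signal",
    "May 16 15:38:01 ip-XXX-XXX-XXX-XXX kernel: [17950726.859219] init: "
    "XXX-worker (3) main process (21144) killed by TERM signal",
]

for sig, count in sorted(count_signals(sample).items()):
    print(sig, count)
```

    Running the same function over the full kern.log would show how often each failure mode occurs between restarts.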
    