Stack smashing (and crash with SIGSEGV) with rpy2, glibc 2.20 and Bioconductor OrgDb packages

Issue #251 resolved
Luca Beltrame
created an issue

The following minimal case reproduces it reliably on my system (100%):

#!/usr/bin/env python3

from rpy2.robjects.packages import importr
importr("org.Hs.eg.db")

which yields

python3 crasher.py 
Error: org.Hs.egPFAM is defunct. Please use select() if you need access to
  PFAM or PROSITE accessions.

*** stack smashing detected ***: python3 terminated
Segmentation fault
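Because the crash takes down the whole interpreter, one way to confirm that it reproduces every time, without losing an interactive session, is to run the reproducer in child processes and record how each one ended. This is a minimal sketch that assumes a current Python 3 (3.5+ for `subprocess.run` and `signal.Signals`). `run_isolated` and `CRASHER` are hypothetical names, and the child's output is discarded.

```python
import signal
import subprocess
import sys

# Minimal reproducer from the report (needs rpy2 and org.Hs.eg.db installed).
CRASHER = 'from rpy2.robjects.packages import importr\nimportr("org.Hs.eg.db")\n'


def run_isolated(source, runs=1):
    """Run `source` in `runs` fresh interpreters and describe how each ended.

    On POSIX, subprocess reports a return code of -N when the child was
    killed by signal N, so a crash shows up as e.g. "SIGSEGV" instead of
    terminating the calling process.
    """
    outcomes = []
    for _ in range(runs):
        proc = subprocess.run([sys.executable, "-c", source],
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        if proc.returncode < 0:
            # Killed by a signal: report its symbolic name.
            outcomes.append(signal.Signals(-proc.returncode).name)
        else:
            # Normal exit (including Python exceptions, which exit with 1).
            outcomes.append("exit %d" % proc.returncode)
    return outcomes
```

For example, `print(run_isolated(CRASHER, runs=10))` should list one signal name per run if the crash really reproduces 100% of the time.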

More details:

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"

Python 3.4.1


The call that causes the actual crash is the following (reformatted; it is at line 403 of the relevant file, and the call at line 410 is affected as well):

    pack = InstalledSTPackage(env, name, translation=robject_translations,
                              exported_names=exported_names, on_conflict=on_conflict, version=version)

CFLAGS (just in case):

-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables

And here's the backtrace:

#0  0x00007ffff3c33568 in ?? () from /lib64/libgcc_s.so.1
#1  0x00007ffff3c34469 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x00007ffff74b4716 in backtrace () from /lib64/libc.so.6
#3  0x00007ffff73e2d92 in backtrace_and_maps () from /lib64/libc.so.6
#4  0x00007ffff743583f in __libc_message () from /lib64/libc.so.6
#5  0x00007ffff74b7c37 in __fortify_fail () from /lib64/libc.so.6
#6  0x00007ffff74b7c00 in __stack_chk_fail () from /lib64/libc.so.6
#7  0x00007ffff55120f1 in setup_Rmainloop () from /usr/lib64/R/lib/libR.so
#8  0x00007fff00000000 in ?? ()
#9  0x00000000ff030000 in ?? ()
#10 0x00007fffffffa8d0 in ?? ()
#11 0x00007fffffffabb0 in ?? ()
#12 0x00007ffff555fc6b in InInteger () from /usr/lib64/R/lib/libR.so
#13 0x00007ffff555febb in ReadBC1 () from /usr/lib64/R/lib/libR.so
#14 0x00007ffff556039c in ReadItem () from /usr/lib64/R/lib/libR.so
#15 0x00007ffff556078f in ReadItem () from /usr/lib64/R/lib/libR.so
#16 0x00007ffff5563900 in R_Unserialize () from /usr/lib64/R/lib/libR.so
#17 0x00007ffff556445a in R_unserialize () from /usr/lib64/R/lib/libR.so
#18 0x00007ffff5564734 in do_lazyLoadDBfetch () from /usr/lib64/R/lib/libR.so
#19 0x00007ffff54ea7e7 in Rf_eval () from /usr/lib64/R/lib/libR.so
#20 0x00007fff00000002 in ?? ()
#21 0x00000000051d2ee8 in ?? ()
#22 0x00000000051d1690 in ?? ()
#23 0x00000000051d1690 in ?? ()
#24 0x00000000051d51d8 in ?? ()
#25 0x00000000051d24a8 in ?? ()
#26 0x00007fffffffae50 in ?? ()
#27 0x00007ffff58f3194 in R_BrowseLines () from /usr/lib64/R/lib/libR.so
#28 0x00000000051d5130 in ?? ()
#29 0x0000000100000003 in ?? ()
#30 0x00007ffff551afd1 in Rf_NewEnvironment () from /usr/lib64/R/lib/libR.so
#31 0x00007ffff54eba85 in Rf_applyClosure () from /usr/lib64/R/lib/libR.so
#32 0x00007ffff54e326c in bcEval () from /usr/lib64/R/lib/libR.so
#33 0x00007ffff54ea590 in Rf_eval () from /usr/lib64/R/lib/libR.so
#34 0x00007ffff54eba85 in Rf_applyClosure () from /usr/lib64/R/lib/libR.so
#35 0x00007ffff54ea54e in Rf_eval () from /usr/lib64/R/lib/libR.so
#36 0x00007ffff54ecad3 in do_begin () from /usr/lib64/R/lib/libR.so
#37 0x00007ffff54ea7af in Rf_eval () from /usr/lib64/R/lib/libR.so
#38 0x00007ffff54ea7af in Rf_eval () from /usr/lib64/R/lib/libR.so
#39 0x00007ffff54ecad3 in do_begin () from /usr/lib64/R/lib/libR.so
#40 0x00007ffff54ea7af in Rf_eval () from /usr/lib64/R/lib/libR.so
#41 0x00007ffff54eba85 in Rf_applyClosure () from /usr/lib64/R/lib/libR.so
#42 0x00007ffff54ea54e in Rf_eval () from /usr/lib64/R/lib/libR.so
#43 0x00007ffff54c9b2d in getActiveValue () from /usr/lib64/R/lib/libR.so
#44 0x00007ffff59de6fc in EnvironmentSexp_subscript (self=<optimized out>, key=<optimized out>) at ./rpy/rinterface/_rinterface.c:2255
#45 0x00007ffff7a6cfdb in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#46 0x00007ffff7a76839 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#47 0x00007ffff7a70486 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#48 0x00007ffff7a76839 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#49 0x00007ffff7a70486 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#50 0x00007ffff7a76839 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#51 0x00007ffff7a52154 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#52 0x00007ffff7a4df3a in PyObject_Call () from /usr/lib64/libpython3.4m.so.1.0
#53 0x00007ffff7a4f39a in ?? () from /usr/lib64/libpython3.4m.so.1.0
#54 0x00007ffff7a4df3a in PyObject_Call () from /usr/lib64/libpython3.4m.so.1.0
#55 0x00007ffff7a61041 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#56 0x00007ffff7a6098d in ?? () from /usr/lib64/libpython3.4m.so.1.0
#57 0x00007ffff7a4df3a in PyObject_Call () from /usr/lib64/libpython3.4m.so.1.0
#58 0x00007ffff7a708f7 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#59 0x00007ffff7a76839 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#60 0x00007ffff7a70486 in PyEval_EvalFrameEx () from /usr/lib64/libpython3.4m.so.1.0
#61 0x00007ffff7a764d0 in PyEval_EvalCodeEx () from /usr/lib64/libpython3.4m.so.1.0
#62 0x00007ffff7abacbb in PyEval_EvalCode () from /usr/lib64/libpython3.4m.so.1.0
#63 0x00007ffff7ac6260 in ?? () from /usr/lib64/libpython3.4m.so.1.0
#64 0x00007ffff7ac6ac7 in PyRun_FileExFlags () from /usr/lib64/libpython3.4m.so.1.0
#65 0x00007ffff7ac759b in PyRun_SimpleFileExFlags () from /usr/lib64/libpython3.4m.so.1.0
#66 0x00007ffff7ad3d96 in Py_Main () from /usr/lib64/libpython3.4m.so.1.0
#67 0x0000000000400c81 in main ()

Comments (24)

  1. Luca Beltrame reporter

    Strangely enough, that error from R doesn't pop up if I import the package straight from R itself. From my investigation, the environment supplied in importr() includes bits that should not be there (org.Hs.egPFAM and PROSITE).

  2. Laurent Gautier

    The import through rpy2 uses R's own import mechanism (require(<package>), or suppressMessages(require(<package>)) depending on the requested verbosity level).

    Would the following work?

    from rpy2.robjects.packages import importr

    base = importr('base')
    base.require('org.Hs.eg.db')
    

    I also note that an error message is printed before the segfault (see below), and that you say PFAM and PROSITE are included somewhere they should not be.

    Error: org.Hs.egPFAM is defunct. Please use select() if you need access to
      PFAM or PROSITE accessions.
    
  3. Luca Beltrame reporter

    That error is not printed when loading the package from R, though. As far as I can understand, this occurs because importr loads parts of the environment that library() does not (the environment passed to InstalledSTPackage).

    In fact, using require() as you suggested does not trigger the crash.

    I investigated the matter further, and while debugging I noticed that accessing the package environment with any of the keys that cause the error triggers a crash.

    EDIT: Removed bits because further tests showed it wasn't accurate.

    Testing this in pdb (pdb.runcall(importr, "org.Hs.eg.db") and breaking around line 403):

    (Pdb) env.keys()
    ('.__NAMESPACE__.', '.__S3MethodsTable__.', '.onLoad', '.onUnload', '.packageName', 
    'datacache', 'org.Hs.eg', 'org.Hs.eg_dbconn', 'org.Hs.eg_dbfile', 'org.Hs.eg_dbInfo',
    'org.Hs.eg_dbschema', 'org.Hs.eg.db', 'org.Hs.egACCNUM', 
    'org.Hs.egACCNUM2EG', 'org.Hs.egALIAS2EG', 'org.Hs.egCHR',
    'org.Hs.egCHRLENGTHS', 'org.Hs.egCHRLOC', 'org.Hs.egCHRLOCEND', 
    'org.Hs.egENSEMBL', 'org.Hs.egENSEMBL2EG', 'org.Hs.egENSEMBLPROT',
    'org.Hs.egENSEMBLPROT2EG', 'org.Hs.egENSEMBLTRANS', 
    'org.Hs.egENSEMBLTRANS2EG', 'org.Hs.egENZYME', 'org.Hs.egENZYME2EG',
    'org.Hs.egGENENAME', 'org.Hs.egGO', 'org.Hs.egGO2ALLEGS', 'org.Hs.egGO2EG', 
    'org.Hs.egMAP', 'org.Hs.egMAP2EG', 'org.Hs.egMAPCOUNTS', 'org.Hs.egOMIM',
    'org.Hs.egOMIM2EG', 'org.Hs.egORGANISM', 'org.Hs.egPATH', 'org.Hs.egPATH2EG', 
    'org.Hs.egPFAM', 'org.Hs.egPMID', 'org.Hs.egPMID2EG', 'org.Hs.egPROSITE', 
    'org.Hs.egREFSEQ', 'org.Hs.egREFSEQ2EG', 'org.Hs.egSYMBOL', 
    'org.Hs.egSYMBOL2EG', 'org.Hs.egUCSCKG', 'org.Hs.egUNIGENE', 
    'org.Hs.egUNIGENE2EG', 'org.Hs.egUNIPROT')
    

    (note that the items that cause the error, org.Hs.egPROSITE and org.Hs.egPFAM, are in there)

    And as soon as one of those entries is accessed:

    (Pdb) env["org.Hs.egPFAM"]
    Error: org.Hs.egPFAM is defunct. Please use select() if you need access to
      PFAM or PROSITE accessions.
    
    *** stack smashing detected ***: /usr/bin/python3 terminated
    ======= Backtrace: =========
    /lib64/libc.so.6(+0x7283f)[0x7f38062dc83f]
    /lib64/libc.so.6(__fortify_fail+0x37)[0x7f380635ec37]
    /lib64/libc.so.6(__fortify_fail+0x0)[0x7f380635ec00]
    /usr/lib64/R/lib/libR.so(+0x1000f1)[0x7f37ff3ac0f1]
    [0x7fff0c081b20]
    
  4. Laurent Gautier

    Yes.

    I was tracing it, and the error occurs when trying to wrap org.Hs.egPFAM (present in the R package's namespace) in an rpy2.robjects proxy.

    What is happening is that trying to obtain the value associated with a symbol that is otherwise present in an environment (here the symbol is org.Hs.egPFAM) triggers an error. In other words, we have an associative array/dictionary/hash table where querying an existing key triggers an error. You can't make that stuff up. Only with R :/
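The backtrace in the report shows the lookup from EnvironmentSexp_subscript going through getActiveValue() (frame #43), which suggests that org.Hs.egPFAM is an R active binding: a symbol whose value is computed by calling a function each time it is read, so the read can fail even though the symbol exists. The class below is a hypothetical plain-Python stand-in for that behaviour, not rpy2 or R code; `ActiveBindingEnv` and `defunct` are illustrative names.

```python
class ActiveBindingEnv:
    """Hypothetical stand-in for an R environment holding active bindings.

    Each name maps to a zero-argument function that is called on every
    lookup, so a name can be present while reading its value raises.
    """

    def __init__(self):
        self._bindings = {}

    def bind(self, name, fun):
        self._bindings[name] = fun

    def keys(self):
        return tuple(sorted(self._bindings))

    def __contains__(self, name):
        # Membership only checks the name; the function is not called.
        return name in self._bindings

    def __getitem__(self, name):
        # The value is computed on access, so a defunct binding fails here.
        return self._bindings[name]()


def defunct(name):
    """Return a binding function that always raises, like a defunct map."""
    def fail():
        raise RuntimeError("%s is defunct. Please use select()" % name)
    return fail
```

With `env.bind("org.Hs.egPFAM", defunct("org.Hs.egPFAM"))`, the test `"org.Hs.egPFAM" in env` is true while `env["org.Hs.egPFAM"]` raises, which is the "existing key that cannot be queried" situation described above.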

  5. Laurent Gautier

    I had less time to look at it than initially thought.

    The (temporary) fix might be along the lines of:

    from rpy2.rinterface import RRuntimeError
    from rpy2.robjects.packages import importr

    base = importr('base')

    try:
        value = base.get("org.Hs.egPFAM", "package:org.Hs.eg.db")
        # do more things
    except RRuntimeError:
        # skip that entry in the namespace
        pass
    

    A more definitive fix will be to prevent a segfault from occurring when trying to retrieve an entry from a namespace (that's in the C code for rinterface).
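The try/except workaround above handles a single name; the same idea can be applied to every name in a namespace. This is a sketch under assumptions: `collect_accessible` is a hypothetical helper, and `getter` is any callable that fetches one value and raises (rather than crashing) for a defunct entry, as base.get() is expected to do in the workaround. Only the exception types listed in `errors` are swallowed, so unrelated failures still surface.

```python
from typing import Callable, Dict, Iterable, List, Tuple, Type


def collect_accessible(
    names: Iterable[str],
    getter: Callable[[str], object],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[Dict[str, object], List[str]]:
    """Fetch each name with `getter`, skipping names whose lookup raises.

    Returns the values that could be fetched and the names that were skipped.
    """
    values: Dict[str, object] = {}
    skipped: List[str] = []
    for name in names:
        try:
            values[name] = getter(name)
        except errors:
            # e.g. a defunct entry such as org.Hs.egPFAM: record and move on
            skipped.append(name)
    return values, skipped
```

With rpy2, an untested usage would be to attach the package with `base.require("org.Hs.eg.db")` and then call `collect_accessible(names, lambda n: base.get(n, "package:org.Hs.eg.db"), errors=(RRuntimeError,))`, where `names` is the list of keys to inspect.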

  6. Laurent Gautier

    Just a quick update.

    The method REnvironment.__getitem__ is annoying me quite a bit on this, as I want to prevent segfaults from happening in any situation (as much as this is possible).

    What is happening is that the function findVarInFrame() in R's C API crashes (the call in EnvironmentSexp_subscript in the backtrace you provided), and I cannot reproduce this with, say, Rcpp: something is going on between Python, rpy2, and R that might be specific to this combination. It may be an old and cryptic issue in rpy2, or an issue with the API for embedding R.

  7. Luca Beltrame reporter

    Hello Laurent,

    sorry for the lack of feedback (I was on holiday). I'll be sure to check the change. I'll patch my local rpy2 installation and see how it goes.

    EDIT: unfortunately, it does not seem to fix the issue; I still get the crash. Should I also apply f9faf287bb52, though?

    EDIT2: After applying both revisions on the version_2.5.x branch, I can confirm things work again!
