Memory leak when pickling R rpy2 objects

Issue #432 resolved
Javier Lopez
created an issue

[Cross posted from https://github.com/joblib/joblib/issues/557]

EDIT: Changed description and example after further digging

I have observed a memory leak when pickling R objects created in python through rpy2. Here is some code that will reproduce the leak (WARNING: Running this might eat up your RAM):

import pickle
import tempfile
import rpy2.robjects as R

def test_R_object_dump(N=1000):
    """Test creating big R matrices in a loop and dumping them to a temp file"""
    for i in range(N):
        print(i, end=", ")
        with tempfile.TemporaryFile() as f:
            a = R.r.matrix(1., 5000, 5000)
            pickle.dump(a, f)

I've observed the leak on Linux (Ubuntu 16.04); Mac OS X does not seem to have the issue.

Upon further digging, I believe this might be related to Issue #321 but on the pickling (rather than unpickling) side of things.

Comments (18)

  1. Javier Lopez reporter

    Following a suggestion from lsteve in the joblib issue tracker, I can confirm the memory leak also happens when using python's standard pickle.dump instead of the joblib one.

  2. Javier Lopez reporter

    Rewrote the description and example upon further inspection of the issue.

    By similarity with issue #321 I suspect it might be solved by upping the UNPROTECT argument on Sexp___getstate__ in sexp.c, but I am not familiar enough with the innards of rpy2 C code to know if this is a reasonable thing to do.

  3. Laurent Gautier

    I don't think that this will fix it in a way you like. The PROTECT / UNPROTECT calls appear balanced as they are; increasing the count in UNPROTECT() will likely be followed by a segfault shortly after. I am happy to be proven wrong though.

    The fact that you report that the problem is not present on OS X suggests that a possible cause would be:

    • conditional OS X / Linux specific code in rpy2
    • differences in R's memory management between Linux and OS X
    • differences in Python's memory management between Linux and OS X
    • differences in compiler, OS memory management
  4. Javier Lopez reporter

    Thanks for your reply!

    If I replace the constructor to create a numpy matrix rather than an R matrix there is no leak, so I don't think python or OS memory management are the issue here. Also, if I replace the python pickle call by an R.r.saveRDS (with the appropriate changes regarding the file) there is no leak, which makes me believe the pickling mechanism from rpy2 would be the cause.

    As I mentioned, I am not very familiar with rpy2 source, any suggestions on what else I can do to help get to the bottom of the issue?
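
The comparison described above (same loop, different constructor or different dump function) can be sketched as a small harness. This is a minimal sketch, not the code used in this thread: the function name `peak_rss_per_iteration` is hypothetical, and a plain `bytes` payload stands in for the R or numpy matrix so the snippet runs without rpy2. It relies on the standard `resource` module, which is only available on Unix-like systems.

```python
import pickle
import resource
import tempfile


def peak_rss_per_iteration(make_obj, dump, n=5):
    """Return the process peak RSS reported by getrusage after each iteration.

    ru_maxrss is a high-water mark: kilobytes on Linux, bytes on macOS.
    """
    readings = []
    for _ in range(n):
        with tempfile.TemporaryFile() as f:
            obj = make_obj()   # e.g. a constructor for the object under test
            dump(obj, f)       # e.g. pickle.dump, or another serializer
        readings.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return readings


# Stand-in payload (assumption): 1 MB of bytes instead of an R matrix.
readings = peak_rss_per_iteration(lambda: bytes(1_000_000), pickle.dump, n=5)
print(readings)
```

A peak that keeps climbing by roughly the object size at every iteration would point to a leak in that particular constructor/serializer combination; a peak that flattens after the first iterations would not.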

  5. Laurent Gautier

    > If I replace the constructor to create a numpy matrix rather than an R matrix there is no leak, so I don't think python or OS memory management are the issue here. Also, if I replace the python pickle call by an R.r.saveRDS (with the appropriate changes regarding the file) there is no leak, which makes me believe the pickling mechanism from rpy2 would be the cause.

    numpy and rpy2 are likely using different strategies to protect their objects from garbage collection (rpy2 uses both Python's and R's own mechanisms, which themselves rely on OS-level primitives). There is a fair chance that rpy2's pickling is partly, or fully, the cause of the problem, but the fact that you do not observe the problem on OS X with the exact same code made me consider this an option.

    > As I mentioned, I am not very familiar with rpy2 source, any suggestions on what else I can do to help get to the bottom of the issue?

    Unfortunately, memory management issues in rpy2 are not the most fun problems to tackle (because of the jumping between Python's and R's memory management systems, and because some of the code in rpy2 would likely need a refresh). Try running it under valgrind to see if you can find clues. You can also try building/installing rpy2 with debug flags on (they are mentioned in comments in setup.py).

  6. Laurent Gautier

    Another suggestion: did you try calling the garbage collectors at each iteration?

    # Python's collector
    import gc
    gc.collect()
    
    # R's collector
    from rpy2.robjects.packages import importr
    base = importr('base')
    base.gc()
    
  7. Javier Lopez reporter

    I did try using the garbage collectors (both of them) to no avail (see the linked original report in the joblib repo when I thought it was a joblib issue). Will try running under valgrind and see what comes out.

  8. Javier Lopez reporter

    Ran under valgrind, this is what comes out:

    ==88326== LEAK SUMMARY:
    ==88326==    definitely lost: 33,280 bytes in 188 blocks
    ==88326==    indirectly lost: 1,816 bytes in 36 blocks
    ==88326==      possibly lost: 17,280 bytes in 227 blocks
    ==88326==    still reachable: 40,354,274 bytes in 17,422 blocks
    ==88326==                       of which reachable via heuristic:
    ==88326==                         newarray           : 7,152 bytes in 1 blocks
    ==88326==         suppressed: 1,209,370 bytes in 153 blocks
    ==88326== 
    ==88326== For counts of detected and suppressed errors, rerun with: -v
    ==88326== Use --track-origins=yes to see where uninitialised values come from
    ==88326== ERROR SUMMARY: 2389 errors from 401 contexts (suppressed: 42 from 24)
    

    I have also checked the reference count of the object before and after pickling, but it seems to stay at 1, so I have no idea what is protecting the stuff from garbage collection.
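
A reference-count check of this kind can be sketched as follows. This is a minimal sketch under assumptions: the helper name `refcount_change_after_pickle` is hypothetical, and a plain Python list stands in for the rpy2 object. The absolute value returned by `sys.getrefcount` depends on the interpreter and the calling context, so only the difference is compared.

```python
import pickle
import sys


def refcount_change_after_pickle(obj):
    """Return how much sys.getrefcount(obj) changes across one pickle.dumps call."""
    before = sys.getrefcount(obj)
    pickle.dumps(obj)  # the serialized bytes are discarded immediately
    after = sys.getrefcount(obj)
    return after - before


# Stand-in object (assumption): a list of floats instead of an R matrix.
print(refcount_change_after_pickle([1.0] * 1000))
```

A change of zero only says that the pickled object itself did not gain a reference. It says nothing about intermediate objects created while pickling, nor about memory held on the R side.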

  9. Javier Lopez reporter

    Incidentally, while diggin' in the source I noticed the PROTECT-UNPROTECT calls in EmbeddedR_unserialize seem unbalanced (there are two PROTECT and a call to UNPROTECT(3)). At the same time, the rpy2_unserialize method in r_utils.c has the opposite problem:

    SEXP rpy2_unserialize(SEXP connection, SEXP rho)
    {
      SEXP c_R, call_R, res, fun_R;
      PROTECT(fun_R = rpy2_findfun(install("unserialize"), rho));
      if(!isEnvironment(rho)) error("'rho' should be an environment");
      /* obscure incatation to summon R */
      PROTECT(c_R = call_R = allocList(2));
      SET_TYPEOF(c_R, LANGSXP);
      SETCAR(c_R, fun_R);
      c_R = CDR(c_R);
    
      /* first argument is a RAWSXP representation of the object to unserialize */
      SETCAR(c_R, connection);
      c_R = CDR(c_R);
    
      PROTECT(res = eval(call_R, rho));
      UNPROTECT(2);
      return res;
    }
    

    So I am wondering whether the fix to issue #321 was perhaps applied in the wrong file?
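
The PROTECT / UNPROTECT tally for a function like the one above can be counted mechanically. This is a crude textual sketch, not a real analysis: the helper `protect_balance` is hypothetical, it ignores control flow, comments, macros and variants such as UNPROTECT_PTR, and a mismatch it reports is only a hint to read the code more closely.

```python
import re

# \b keeps PROTECT( from matching inside UNPROTECT(
PROTECT_RE = re.compile(r"\bPROTECT\s*\(")
UNPROTECT_RE = re.compile(r"\bUNPROTECT\s*\(\s*(\d+)\s*\)")


def protect_balance(c_source):
    """Return (number of PROTECT calls, total count passed to UNPROTECT)."""
    protected = len(PROTECT_RE.findall(c_source))
    unprotected = sum(int(n) for n in UNPROTECT_RE.findall(c_source))
    return protected, unprotected


# The PROTECT / UNPROTECT lines of rpy2_unserialize as quoted above.
snippet = """
PROTECT(fun_R = rpy2_findfun(install("unserialize"), rho));
PROTECT(c_R = call_R = allocList(2));
PROTECT(res = eval(call_R, rho));
UNPROTECT(2);
"""
print(protect_balance(snippet))  # (3, 2)
```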

  10. Javier Lopez reporter

    I have reproduced the memory leak in the official rpy2/rpy2 docker container; the process just crashes when it exceeds the set memory limit.

    I have been browsing through the source trying to find operating system dependent bits, without any success so far. Any pointers on where to look or how to proceed would be appreciated.

  11. Laurent Gautier

    Thanks. I have also reproduced it and started investigating. Unfortunately, I have looked where I thought it could happen and did not see anything.

    At first sight the leak is not with R objects protected from collection.

  12. Laurent Gautier

    The leak seems to disappear if Sexp___getstate__() no longer returns the Python bytes object res_string that contains the serialized R object. Obviously I am not suggesting this as a fix, as this would make pickling no longer work properly, but this is pointing out where to look. I have to figure out how to proceed.
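
One way to probe from Python whether the object returned by a `__getstate__` implementation carries an extra reference is to compare its reference count with that of a freshly created object of the same type. This is a hedged sketch only: the thread does not state that an extra reference was the actual cause, and the `Payload` class and `extra_refs_on_state` helper are hypothetical stand-ins written in pure Python so the snippet runs without rpy2.

```python
import sys


class Payload:
    """Hypothetical stand-in for an extension type with a custom __getstate__."""

    def __getstate__(self):
        # A fresh, non-cached bytes object standing in for serialized data.
        return bytes(1024)


def extra_refs_on_state(obj):
    """Compare the refcount of obj.__getstate__() with that of a fresh bytes object."""
    baseline = bytes(1024)
    state = obj.__getstate__()
    # Both names are plain locals, so any difference comes from the state itself.
    return sys.getrefcount(state) - sys.getrefcount(baseline)


print(extra_refs_on_state(Payload()))
```

For a `__getstate__` implemented in C, a positive result would suggest that a reference to the returned object is held somewhere and may never be released, which would keep the serialized bytes alive after pickling.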

  13. Laurent Gautier

    I think that I got it: this is caused by a combination of rpy2 using Python's pickling protocol slightly incorrectly and the Python C-API being relatively brittle (to Python's credit there is a line of warning in the doc that it is really easy to get things wrong, and sure enough I did).

    If I am correct about the above, a fix should appear any time now.

  14. Laurent Gautier

    The bug was definitely non-trivial to fix.

    The full fix is present from revision 1561ec889508 in the branch "default", and from revision 41a1821d992b in the branch "version_2.9.x" where it was backported. The next release to have the fix will be rpy2-2.9.1.

    I am not planning to backport to rpy2-2.8.x, but if anyone wants to submit a pull request I'd review it.

  15. Javier Lopez reporter

    Awesome, thanks a lot for taking care of this! Since this introduces some API change I think it is safer to not backport it to the 2.8 branch.

    I'll pull the 2.9.x version and check that our original code which triggered the leak is behaving appropriately.

    Thanks again for all your hard work on rpy2!
