Memory Leak 0.7.5

Issue #2427 resolved
Iuri Diniz created an issue

Hello folks,

I have a same program running forever using SQLAlchemy thought many of my servers.

On a brand new server I've detected that my program increase usage memory quickly (12 hours) until 2.0GB and the only solution for me is kill it. The same program is running on old servers without any problem. For example, it is using 254MB since 2012-02-14.

Each server uses a separated virtual-env. So I have detect that old servers running SQLAlchemy<=0.7.4 has no problem and the new server using 0.7.5 has no problem.

problematic server is using: (see full list problematic-pip-freeze.txt) MySQL-python==1.2.3 SQLAlchemy==0.7.5 Python==2.6.6

no problematic server is using (see full list unproblematic-pip-freeze.txt) MySQL-python==1.2.3 SQLAlchemy==0.7.4 Python==2.5.2

Comments (17)

  1. Iuri Diniz reporter

    First version of test_memleak.py

    0.7.4 output:

    sqlalchemy.__version__ = 0.7.4
    len(gc.get_objects())=16206
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    len(gc.get_objects())=16489
    

    0.7.5 output:

    sqlalchemy.__version__ = 0.7.5
    len(gc.get_objects())=16239
    len(gc.get_objects())=16557
    len(gc.get_objects())=16592
    len(gc.get_objects())=16627
    len(gc.get_objects())=16662
    len(gc.get_objects())=16697
    len(gc.get_objects())=16732
    len(gc.get_objects())=16767
    len(gc.get_objects())=16802
    len(gc.get_objects())=16837
    len(gc.get_objects())=16872
    len(gc.get_objects())=16907
    

    Is it sufficient?

  2. Iuri Diniz reporter

    binary search done:

    (sqlalquemy-hg)iuri@moeda:~/Sources/SQLALCHEMY/src$ hg bisect -g
    The first bad revision is:
    changeset:   8022:09a503e49724b61a8119f0b7855a990a29fc1521
    user:        Mike Bayer <mike_mp@zzzcomputing.com>
    date:        Sun Jan 22 14:04:20 2012 -0500
    summary:     - [bug](bug) Fixed bug whereby a table-bound Column
    
    (sqlalquemy-hg)iuri@moeda:~/Sources/SQLALCHEMY/src$
    

    done by using bisect extension:

    hg clone http://hg.sqlalchemy.org/sqlalchemy src
    cd src
    
    export PYTHONPATH=$PYTHONPATH:$PWD/build/lib.linux-x86_64-2.7/
    
    # rel_0_7_5                       8072:667249f7b19325e1512f1a9ece29c5894e679508 (BAD)
    # rel_0_7_4                       7976:5421b51a479a4c04da5175b58c8d8557ba2bc51c (GOOD)
    
    # reset
    hg bisect -r
    
    # going to a bad revision
    hg up rel_0_7_5
    
    # build and test
    rm -rf build && python setup.py build && python ../tests/test_memleak.py
    # this revision is bad
    hg bisect -b
    
    # revision 7976 is good
    hg bisect -g 7976
    
    # build and test again
    # ...
    
  3. Mike Bayer repo owner

    yes, the bug is in the C extensions. Delete the .so files for the moment within sqlalchemy/ to alleviate your leak.

  4. Mike Bayer repo owner

    so this seems like a basic reference counting bug, related to the tuple created in resultproxy. If I apply this patch to r0822:

    diff -r 09a503e49724b61a8119f0b7855a990a29fc1521 lib/sqlalchemy/engine/base.py
    --- a/lib/sqlalchemy/engine/base.py Sun Jan 22 14:04:20 2012 -0500
    +++ b/lib/sqlalchemy/engine/base.py Sat Mar 10 14:38:50 2012 -0800
    @@ -2596,7 +2596,7 @@
                 processor = type_._cached_result_processor(dialect, coltype)
    
                 processors.append(processor)
    -            rec = (processor, obj, i)
    +            rec = (processor, None, i)
    
                 # indexes as keys. This is only needed for the Python version of
                 # RowProxy (the C version uses a faster path for integer indexes).
    @@ -2608,7 +2608,7 @@
                     # columns colliding by name is not a problem as long as the
                     # user does not try to access them (ie use an index directly,
                     # or the more precise ColumnElement)
    -                keymap[name.lower()](name.lower()) = (processor, obj, None)
    +                keymap[name.lower()](name.lower()) = (processor, None, None)
    
                 if dialect.requires_name_normalize:
                     colname = dialect.normalize_name(colname)
    

    then the bug is gone. So I need some help understanding reference counters here - the C ext doesn't refer to this second element of the tuple at all. I tried adding Py_DECREF(record) in there but that just breaks things.

    I am able to fix it by pulling out that second element from the "record" and calling Py_DECREF on it (see below). But this seems like the wrong place to do this - it's not clear to me what activity causes the reference increment on this object - is it the PyDict_GetItem that increments the reference on it (seems very unlikely)? Is it because we have a reference to "keymap"? Seems weird we'd need to access references that we otherwise haven't accessed at all.

    diff -r 09a503e49724b61a8119f0b7855a990a29fc1521 lib/sqlalchemy/cextension/resultproxy.c
    --- a/lib/sqlalchemy/cextension/resultproxy.c   Sun Jan 22 14:04:20 2012 -0500
    +++ b/lib/sqlalchemy/cextension/resultproxy.c   Sat Mar 10 14:43:12 2012 -0800
    @@ -242,7 +242,7 @@
     {
         PyObject *processors, *values;
         PyObject *processor, *value;
    -    PyObject *row, *record, *result, *indexobject;
    +    PyObject *row, *record, *result, *indexobject, *tokenobject;
         PyObject *exc_module, *exception;
         char *cstr_key;
         long index;
    @@ -278,7 +278,11 @@
                     return NULL;
             }
    
    +        tokenobject = PyTuple_GetItem(record, 1);
    +        Py_DECREF(tokenobject);
    +
             indexobject = PyTuple_GetItem(record, 2);
    +
             if (indexobject == NULL)
                 return NULL;
    
    @@ -302,7 +306,6 @@
                         "try 'use_labels' option on select statement.", cstr_key);
                 return NULL;
             }
    -
             index = PyInt_AsLong(indexobject);
             if ((index == -1) && PyErr_Occurred())
                 /* -1 can be either the actual value, or an error flag. */
    
  5. Former user Account Deleted

    Replying to zzzeek:

    Replying to kentbower:

    Replying to zzzeek:

    pjenvey and I got it, fixed in 5448f6129cd0487c3d06324385cc2ef0701b5815. Thanks for the effort reporting this !

    Just double checking: 5448f6129cd0487c3d06324385cc2ef0701b5815 contains the entire fix?

    that sounds ominous, what part might be missing ?

    No, no...I haven't looked; I was just going to apply that patch but wanted to make sure it was self contained, but your reply is as good as a "yes"

  6. Log in to comment