Possibly memmory leak in connection.execute()

Issue #1684 resolved
Former user created an issue

Hi, I`m in struggle with memory leaks in our projects and found strange sqlalchemy behaviour which probably is some kind of memory leak. See details and example below: We are using SQL Expression Language from (not ORM features) sqlalchemy 0.5.6 + MySQL-python-1.2.3c1 (but I got same result with sqllite). See an example below and in attached file memtest.py:

import gc

import sqlalchemy; from sqlalchemy import create_engine

GC_DEBUG = 0 #GC_DEBUG = 1 #turn on output gc stats

flags = gc.DEBUG_LEAK if GC_DEBUG != 0: flags |= gc.DEBUG_STATS

gc.set_debug(flags)

#engine = create_engine('mysql://sqltest:password@127.0.0.1/sql_test?charset=utf8&use_unicode=0') engine = create_engine('sqlite:///memory') metadata = sqlalchemy.MetaData() table = sqlalchemy.Table('sql_test', metadata, sqlalchemy.Column('id', sqlalchemy.Integer), sqlalchemy.Column('name', sqlalchemy.String(50))) metadata.create_all(engine) gc.collect(2) ins = table.insert() conn = None try: conn = engine.connect() print '>>>>> 1st insert' res = conn.execute(ins, id = 1, name = '2' ) gc.collect(2) bnds = [for o in gc.garbage: if isinstance(o, sqlalchemy.sql.expression._BindParamClause)] if len(bnds) >0: print '>>>>> 1st insert', bnds print '>>>>> 2nd insert' res = conn.execute(ins, id = 3, name = '3' ) gc.collect(2) bnds = [for o in gc.garbage: if isinstance(o, sqlalchemy.sql.expression._BindParamClause)] if len(bnds) >0: print '>>>>> 2nd insert', bnds print '>>>>> 3rd insert' res = conn.execute(ins, id = 4, name = '4') gc.collect(2) bnds = [for o in gc.garbage: if isinstance(o, sqlalchemy.sql.expression._BindParamClause)] if len(bnds) >0: print '>>>>> 3rd insert', bnds finally: conn.close()

after 1st insert memory is Ok, but each next insert statement left 2 unreachable _BindParamClause objects (and several others). I test it with python 2.5.4 on win32 platform, and with python 2.5.4 on Linux (kernel version 2.6.26) on intel platform. In both cases results are same. Is it memory leak or not? And if it is leak - how we could avoid this ?

Thanks in advance, Vitaliy.

Comments (3)

  1. Mike Bayer repo owner

    your test sets gc.DEBUG_LEAK, which in turn sets gc.DEBUG_SAVEALL. Here is what DEBUG_SAVEALL does:

    When set, all unreachable objects found will be appended to garbage rather than being freed. This can be useful for debugging a leaking program.

    What the first line means is, any leftover cycles are not garbage collected by the periodic gc process. This is why your list of "binds" grows. If you turn off that flag and allow gc to function normally, nothing remains hanging around after a gc.

    The second phrase of that line is completely misleading - Python clears out leftover cycles (unless __del__ is used, which we don't) so they are not "memory leaks". Cycles are a little less than ideal in Python but for complex tasks they are often hard to avoid and not harmful, whereas we sometimes avoid breaking cycles with weakrefs specifically because weakref traversal is expensive.

    What a "leak" in Python looks like is one of the following things:

    • You have an unreachable cycle, where one or more of the objects in the cycle uses __del__.
    • A module level variable references a collection that grows. This is a leak because modules are Python objects themselves and are held as "singletons" in sys.modules. This is where 99% of all leaks occur, and they are hard to track down since the path from a module to a collection may be long and windy (like a static collection on a class, a dictionary stuck somewhere, etc.)
    • C modules can leak. I'm sure you're aware that if you turned that MySQLdb use_unicode=1 it would leak like a river due to a known issue.

    Further info on the efforts SQLAlchemy goes to prevent memory leaks can be seen in our test suite at source:sqlalchemy/trunk/test/aaa_profiling/test_memusage.py, which is run for all tested backends.

  2. Former user Account Deleted
    • changed status to open
    • removed status

    Im understanding meaning gc.DEBUG_LEAK flag, but lets me try to explain our situation: we are using sqlalchemy to store our logs in DB. For some reasons we turned off garbage collector in our application (we cannot keep in turned on). We noticed that our aplpication consumes mememory (and when we turn off our logging to db feature - grows of memory stopped. On order to find which objects remains in memeory and mot freed we set gc.DEBUG_LEAK flag. Now Im interesting in one more question (I didn`t find answer in docs): is sqlalchemy designed to work with gc turned on only or not (in other words: Do the sqlalchemy free unsued object itself if gc turned off or not) ?

  3. Mike Bayer repo owner

    I've never heard of any Python library that specifically advertised that it is fully tested as generating no cycles whatsoever without GC turned on. We don't support that use case, no. Closing this ticket (again).

  4. Log in to comment