- changed milestone to 0.7.8
C extension causes memory leak on Windows
On Windows with 64-bit Python 2.7.3, I created two virtualenvs with SQLAlchemy 0.7.7, one with the C extension and the other without. Then I started a Pyramid pserve process from each virtualenv and sent a request every second to each process.
The one without the C extension has memory usage that hovers around 100~110 MB. The one with the C extension has ever-increasing memory usage, currently at 256 MB after being up for 50 minutes. In both settings, len(gc.get_objects()) hovers around 190,000~210,000.
The test was done with the mssql+pyodbc driver. Incidentally, I don't encounter such a memory problem with the C extension when using Linux.
Comments (22)
-
repo owner -
reporter Well, I wasn't really looking for a fix, since the pure-Python package works just fine :)
Anyway, here's a test case I came up with:
```python
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.schema import Column
from sqlalchemy.types import Integer, Unicode

url = 'mssql+pyodbc://username:password@server/database?driver=SQL+Server+Native+Client+11.0'
engine = create_engine(url)
Session = scoped_session(sessionmaker(autocommit=False))
Session.configure(bind=engine)

Base = declarative_base()
Base.metadata.bind = engine

class Test(Base):
    __tablename__ = 'tests'
    id = Column(Integer, primary_key=True)
    col = Column(Unicode(50), nullable=False)

Base.metadata.create_all(checkfirst=True)

test = Test(col='abc')
Session.add(test)
Session.commit()

while True:
    Session.query(Test).all()
```
Without the C extension, memory usage stabilizes at around 16,000 K. With the C extension, memory usage is now at 29,368 K after several minutes and keeps increasing.
-
reporter Ah, I was wrong. The memory usage only increases when using the mssql+pyodbc driver with the C extension enabled, and this happens on both Windows and Linux. Other drivers are not affected.
I am using pyodbc 3.0.5.
-
repo owner OK, so pyodbc is suspect here. Can you try stepping back through versions on that? We use 2.1.8 in production here with no issues.
-
reporter I just ran the test case against pyodbc versions from 2.1.8 up, including 2.1.8, 2.1.9, 2.1.10, 2.1.11, 3.0.3, and 3.0.5. In all cases, whenever the C extension is enabled, memory usage just goes up; otherwise, it stabilizes.
Older versions do not work, as I need this changeset: http://code.google.com/p/pyodbc/source/detail?r=d0f9733f5273dacc059f41ad9d8bd99bf774ef34
-
repo owner Doesn't seem to be happening for me on OS X.
-
reporter I managed to reproduce the memory usage behavior on a machine running Mac OS X 10.7.3 with Python 2.7.3, SQLAlchemy 0.7.7, pyodbc 3.0.5, and FreeTDS 0.91 (iODBC: yes). The SQLAlchemy URL I used is of the form:
mssql+pyodbc://username:password@host:port/database?driver=FreeTDS&TDS_Version=8.0
-
reporter PID 87455 is using pure-Python SQLAlchemy, while PID 87456 has the C extensions enabled. Both started at the same time, and the screenshot was captured after running for about 10 minutes.
-
repo owner Yeah, I'd probably next try a VMware Linux image... but unfortunately, even if I get that far, it seems like this would require using valgrind or something to really trace it out, which is beyond my expertise...
-
repo owner A few things you could try, though:
1. What happens if you don't use the Unicode type in your example, so that the unicode processor is not used?
2. What happens if you try running with only the result proxy module, or only the processors C module? That is, alternately delete "cprocessors.so" and "cresultproxy.so"; you can delete either one independently of the other (I'm pretty sure). If we could at least figure out which of the two C modules is involved.
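To make the delete-one-module experiment less error-prone, one can first locate the compiled extension files inside the installed package. This is a sketch (the function name and match patterns are mine, not from the thread); the file suffix varies by platform (.so on Linux/OS X, .pyd on Windows), so it matches on the module name only:

```python
import glob
import os

def find_c_extensions(pkg_dir):
    """Return paths of SQLAlchemy's compiled extension modules
    (cprocessors / cresultproxy) inside a package directory.

    Matching on the module name rather than a fixed suffix keeps
    this working across platforms (.so, .pyd, etc.).
    """
    found = []
    for name in ("cprocessors", "cresultproxy"):
        found.extend(glob.glob(os.path.join(pkg_dir, name + ".*")))
    return sorted(found)
```

With SQLAlchemy importable, something like `find_c_extensions(os.path.dirname(sqlalchemy.__file__))` would list the files to rename or remove one at a time before re-running the leak test.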
-
reporter Good suggestions.
1. If I change the test script to do:

```python
Session.query(Test.id).all()
```

then there is no leak. Both of these:

```python
Session.query(Test).all()
Session.query(Test.col).all()
```

cause a memory leak.
2. If I remove just cresultproxy.so, then there is no leak. Removing just cprocessors.so doesn't make a difference.

Tests 1 and 2 were run independently.
-
-
repo owner Here's a more specific series of tests. Can you try each of test_one, test_two, test_three, and test_four, and tell me which ones leak, please:
```python
from sqlalchemy import *

t = Table('test', MetaData(),
    Column('id', Integer),
    Column('strval', String(30)),
    Column('unival', Unicode(30))
)

def test_one(conn):
    result = conn.execute("select strval from test")
    result.fetchall()
    result.close()

def test_two(conn):
    result = conn.execute("select unival from test")
    result.fetchall()
    result.close()

def test_three(conn):
    result = conn.execute(select([t.c.strval]))
    result.fetchall()
    result.close()

def test_four(conn):
    result = conn.execute(select([t.c.unival]))
    result.fetchall()
    result.close()

url = 'mssql+pyodbc://username:password@server/database?driver=SQL+Server+Native+Client+11.0'
engine = create_engine(url)

with engine.begin() as conn:
    t.create(conn, checkfirst=True)

    # avoiding myriad pyodbc issues with nvarchar here by not
    # using binds...
    conn.execute("""insert into test (id, strval, unival)
        values (1, 'strvalue', 'univalue')""")

    while True:
        test_one(conn)

    # try each of these ...
    #while True:
    #    test_two(conn)

    #while True:
    #    test_three(conn)

    #while True:
    #    test_four(conn)
```
-
Account Deleted I've got the same problem as sayap, running on Windows with SQLAlchemy 0.7.6 and pyodbc, against SQL Server Express 2008 R2. I'm not sure offhand what version of pyodbc we're running, but our application leaks heavily. If I run the test case provided by sayap, it leaks, but running the four test cases provided by zzzeek, I experience no leak. Tests 3 and 4 use slightly more memory, but they stabilize at about 500K higher than tests 1 and 2. If there are any other tests you'd like run, I'll happily run them.
-
repo owner Sure. Take sayap's script and try these various things at the bottom:
```python
while True:
    Session.execute(Test.__table__.select()).fetchall()

while True:
    Session.execute(select([Test.__table__.c.col])).fetchall()

while True:
    stmt = select([Test.__table__.c.col])
    Session.query(Test.col).instances(Session.execute(stmt))
```
this one might be it:
```python
while True:
    stmt = select([Test.__table__.c.col])
    for row in Session.execute(stmt):
        row[Test.__table__.c.col]
```
If it's that last one, then go to the other test and try:
```python
def test_five(conn):
    result = conn.execute(select([t.c.unival]))
    for row in result.fetchall():
        row[t.c.unival]
    result.close()
```
-
Account Deleted It's the last one, definitely leaking there.
-
repo owner OK, I can reproduce, thanks.
-
repo owner Current thinking, which I'm trying to find an expert to ask about. Here is pyodbc's Row getitem (this might not be the latest, but this is the one I'm running):
```c
static PyObject *
Row_item(Row* self, Py_ssize_t i)
{
    // Apparently, negative indexes are handled by magic ;) -- they never make it here.
    if (i < 0 || i >= self->cValues)
    {
        PyErr_SetString(PyExc_IndexError, "tuple index out of range");
        return NULL;
    }
    Py_INCREF(self->apValues[i]);
    return self->apValues[i];
}
```
Note the Py_INCREF: it returns a new reference. But we aren't decreasing this reference count; our code assumes we used PyTuple_GetItem (which returns a borrowed reference) on line 322, whereas if we use PySequence_GetItem, it's a new reference. OK, this might be it.
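The borrowed-versus-new reference distinction can be mimicked without touching CPython internals. In this toy sketch (all names are illustrative, not from pyodbc or SQLAlchemy), tuple_getitem stands in for PyTuple_GetItem (borrowed reference, count unchanged) and sequence_getitem for PySequence_GetItem (new reference, count incremented); forgetting the matching decref after the latter is exactly the one-count-per-access leak described above:

```python
class Obj:
    """Toy object with an explicit reference count, mimicking ob_refcnt."""
    def __init__(self):
        self.refcnt = 1  # one reference held by the "container"

def incref(obj):
    obj.refcnt += 1
    return obj

def decref(obj):
    obj.refcnt -= 1

def tuple_getitem(obj):
    """Like PyTuple_GetItem: returns a BORROWED reference (no incref)."""
    return obj

def sequence_getitem(obj):
    """Like PySequence_GetItem: returns a NEW reference (incref'd)."""
    return incref(obj)

# Treating a new reference as if it were borrowed leaks one count per access:
leaky = Obj()
for _ in range(1000):
    sequence_getitem(leaky)            # missing decref -> leak
print(leaky.refcnt)                    # 1001, not 1

# Pairing every new reference with a decref keeps the count flat:
balanced = Obj()
for _ in range(1000):
    decref(sequence_getitem(balanced))
print(balanced.refcnt)                 # still 1
```

In the real C code the consequence is that each row value fetched via PySequence_GetItem keeps one extra reference forever, so the objects can never be freed.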
-
repo owner here's a patch, can we all test this please ?
```diff
diff -r cc947a4da70c906ac1820db6ccdd7f0faa8a2ced lib/sqlalchemy/cextension/resultproxy.c
--- a/lib/sqlalchemy/cextension/resultproxy.c	Sat May 26 13:21:05 2012 -0400
+++ b/lib/sqlalchemy/cextension/resultproxy.c	Fri Jun 01 12:23:34 2012 -0400
@@ -247,7 +247,8 @@
     char *cstr_key;
     long index;
     int key_fallback = 0;
-
+    int tuple_check = 0;
+
     if (PyInt_CheckExact(key)) {
         index = PyInt_AS_LONG(key);
     } else if (PyLong_CheckExact(key)) {
@@ -319,17 +320,27 @@
         return NULL;

     row = self->row;
-    if (PyTuple_CheckExact(row))
+    if (PyTuple_CheckExact(row)) {
         value = PyTuple_GetItem(row, index);
-    else
+        tuple_check = 1;
+    }
+    else {
         value = PySequence_GetItem(row, index);
+        tuple_check = 0;
+    }
+
     if (value == NULL)
         return NULL;

     if (processor != Py_None) {
+        if (!tuple_check) {
+            Py_XDECREF(value);
+        }
         return PyObject_CallFunctionObjArgs(processor, value, NULL);
     } else {
-        Py_INCREF(value);
+        if (tuple_check) {
+            Py_XINCREF(value);
+        }
         return value;
     }
 }
```
-
repo owner - changed status to resolved
I'm assuming that's it; the fix is in 0.8 and 0.7.8:
c6d599f99344c840db5cc5422537310b83981fda cd676cd8dfbddf1440849f96fa0cdabd4fd72497
Reopen if you're still seeing problems, thanks!
-
reporter Tested on dev, works great. Thank you guys!
Will deploy to production tomorrow and reconfirm. Last time, the IIS process would crash within half a day after reaching the 2 GB memory limit (32-bit Windows).
-
Account Deleted Just wanted to say that this works great! Really appreciate the good work, thanks guys :)
-
repo owner - removed milestone
Removing milestone: 0.7.8 (automated comment)
repo owner Can you attach some sample code to this, please? While I don't know that we have the resources to debug Windows C compilation issues, we can't do anything without at least the hope of reproducing it.