ResourceClosedError can sometimes be raised instead of the more appropriate OperationalError

Issue #2536 resolved
Johannes Erdfelt created an issue

In some cases when code is poorly written and triggers a deadlock in the database, a confusing ResourceClosedError can be raised instead of an OperationalError which explains the deadlock.

The attached script reproduces the problem. When run with the argument 'bad', it will result in this exception:

ResourceClosedError: This result object does not return rows. It has been closed automatically.

If run without any arguments, it results in this exception:

OperationalError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'SELECT b.id AS b_id, b.value AS b_value \nFROM b \nWHERE b.id = %s \n LIMIT %s FOR UPDATE' (1, 1)

The latter message is much more useful than the former, which left me scratching my head and spending a day trying to figure out what was happening.

The only difference between the two is that the OperationalError case uses filter_by(id=1) whereas the ResourceClosedError case uses filter(B.value.in_([1])). It's not clear to me yet why using the in_ filter causes a different exception to be raised.

From the little I was able to figure out before I had to move on, it looks like the OperationalError is getting suppressed somehow, the connection is closed, and then the subsequent call to first() fails because the connection is closed (resulting in the ResourceClosedError exception).
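
For context, the ResourceClosedError message itself normally comes from fetching rows off a result that doesn't return any; a minimal sketch of that (against a throwaway SQLite engine, nothing to do with the deadlock itself) looks like:

from sqlalchemy import create_engine
from sqlalchemy.exc import ResourceClosedError

engine = create_engine('sqlite://')
conn = engine.connect()
conn.execute("create table t (x integer)")
result = conn.execute("insert into t (x) values (1)")  # statement returns no rows
try:
    result.fetchall()
except ResourceClosedError as e:
    # "This result object does not return rows. It has been closed automatically."
    print(e)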

Comments (9)

  1. Johannes Erdfelt reporter

    I should also mention that I tested this with SQLAlchemy 0.7.8. I've also reproduced this with 0.7.6 and 0.7.2, but a 0.6 release gave a different, unrelated exception (AttributeError: 'NoneType' object has no attribute '_keymap').

  2. Mike Bayer repo owner

    I have a lot of points to make here. The main one is that this is definitely a MySQLdb bug; however, I will prepare a test case to prove that as well.

    First off, let's fix your test case, which, as written, doesn't reproduce the error, for several reasons. One is that it doesn't insert any data into the tables, so it doesn't actually produce any deadlock since no rows are selected. Another is that using threads and timings to reproduce a deadlock isn't very reliable. The deadlock is in the database between two simultaneous transactions, so it's a lot easier to just work with the two transactions in the main thread and interleave them as necessary. Finally, let's also reduce the timeout on the deadlock so we can see the effect immediately:

    #!/usr/bin/env python
    from sqlalchemy import create_engine, Column, Integer
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy import event
    
    Base = declarative_base()
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        value = Column(Integer)
        __table_args__ = {'mysql_engine': 'innodb'}
    
    class B(Base):
        __tablename__ = 'b'
        id = Column(Integer, primary_key=True)
        value = Column(Integer)
        __table_args__ = {'mysql_engine': 'innodb'}
    
    engine = create_engine('mysql://scott:tiger@localhost/test', echo=True)
    
    @event.listens_for(engine, "connect")
    def connect(conn, conn_record):
        cursor = conn.cursor()
        cursor.execute("set innodb_lock_wait_timeout=1")
    
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)
    
    Session = sessionmaker(bind=engine, autocommit=True)
    
    s0 = Session()
    with s0.begin():
        s0.add_all([A(id=1), B(id=1), B(id=2)])
    
    s1 = Session()
    s2 = Session()
    
    s1.begin()
    s2.begin()
    s1.query(A).\
            filter_by(id=1).\
            with_lockmode('update').\
            first()
    s2.query(B).\
            filter_by(id=1).\
            with_lockmode('update').\
            first()
    
    bad = False
    if bad:
        s1.query(B).\
                filter(B.value.in_([1])).\
                with_lockmode('update').\
                first()
    else:
        s1.query(B).\
                filter(B.id == 1).\
                with_lockmode('update').\
                first()
    
    s2.query(A).\
            filter_by(id=1).\
            with_lockmode('update').\
            first()
    

    Next, let's consider why I could tell just from the stack trace that this is in MySQLdb. Basically, if MySQLdb raised OperationalError, SQLAlchemy would send it right out, and in the "IN" case it clearly isn't doing that. Instead, the statement seems to be completing, and the cursor is just missing cursor.description, which is the sign that the statement is not one that returns rows, so that's the error we get. We can only report errors to the degree that the DBAPI lets us.
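
    To make the cursor.description point concrete, here is a quick sketch of the generic DBAPI behavior using the stdlib sqlite3 driver (not MySQLdb, just an illustration): description holds column metadata after a SELECT and is None after a statement that returns no rows, and that None is all SQLAlchemy has to go on.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("create table t (x integer)")

    cursor.execute("select x from t")
    assert cursor.description is not None   # column metadata -> statement returns rows

    cursor.execute("insert into t (x) values (1)")
    assert cursor.description is None       # no metadata -> "does not return rows"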

    Just to make sure, before I build a pure MySQLdb test case that you can report on their tracker for me, I made one simple change to the script:

    engine = create_engine('mysql+oursql://scott:tiger@localhost/test', echo=True)
    

    Using OurSQL fixed the issue. OurSQL raises a ProgrammingError instead of an OperationalError, which is an unfortunate effect of how vaguely the DBAPI specification was written, but you get the exact same traceback whether or not the "bad" flag is turned on:

    bad = True
    
    Traceback (most recent call last):
      File "test.py", line 54, in <module>
        with_lockmode('update').\
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2171, in first
        ret = list(self[0:1])
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2038, in __getitem__
        return list(res)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2242, in __iter__
        return self._execute_and_instances(context)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2257, in _execute_and_instances
        result = conn.execute(querycontext.statement, self._params)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1476, in execute
        params)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1627, in _execute_clauseelement
        compiled_sql, distilled_params
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1756, in _execute_context
        context)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1749, in _execute_context
        context)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/dialects/mysql/oursql.py", line 96, in do_execute
        cursor.execute(statement, parameters)
      File "cursor.pyx", line 122, in oursql.Cursor.execute (oursqlx/oursql.c:15820)
      File "statement.pyx", line 400, in oursql._Statement.execute (oursqlx/oursql.c:10255)
      File "statement.pyx", line 126, in oursql._Statement._raise_error (oursqlx/oursql.c:7425)
    sqlalchemy.exc.ProgrammingError: (ProgrammingError) (1205, 'Lock wait timeout exceeded; try restarting transaction', None) u'SELECT b.id AS b_id, b.value AS b_value \nFROM b \nWHERE b.value IN (?) \n LIMIT ? FOR UPDATE' (1, 1)
    
    bad = False
    
    Traceback (most recent call last):
      File "test.py", line 59, in <module>
        with_lockmode('update').\
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2171, in first
        ret = list(self[0:1])
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2038, in __getitem__
        return list(res)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2242, in __iter__
        return self._execute_and_instances(context)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2257, in _execute_and_instances
        result = conn.execute(querycontext.statement, self._params)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1476, in execute
        params)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1627, in _execute_clauseelement
        compiled_sql, distilled_params
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1756, in _execute_context
        context)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1749, in _execute_context
        context)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/dialects/mysql/oursql.py", line 96, in do_execute
        cursor.execute(statement, parameters)
      File "cursor.pyx", line 122, in oursql.Cursor.execute (oursqlx/oursql.c:15820)
      File "statement.pyx", line 400, in oursql._Statement.execute (oursqlx/oursql.c:10255)
      File "statement.pyx", line 126, in oursql._Statement._raise_error (oursqlx/oursql.c:7425)
    sqlalchemy.exc.ProgrammingError: (ProgrammingError) (1205, 'Lock wait timeout exceeded; try restarting transaction', None) u'SELECT b.id AS b_id, b.value AS b_value \nFROM b \nWHERE b.id = ? \n LIMIT ? FOR UPDATE' (1, 1)
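
    Since the two drivers classify the same failure differently, application code that wants to retry on lock waits or deadlocks regardless of driver can catch SQLAlchemy's common DBAPIError base class and inspect the wrapped exception. A rough sketch only, reusing the s1/B names from the script above:

    from sqlalchemy import exc

    try:
        s1.query(B).\
                filter(B.id == 1).\
                with_lockmode('update').\
                first()
    except exc.DBAPIError as err:
        # err.orig is the raw DBAPI exception (MySQLdb OperationalError,
        # OurSQL ProgrammingError, etc.); decide whether to retry from that
        print "caught %s: %s" % (type(err.orig), err.orig)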
    

    so thanks for the effort on this; I'll have a MySQLdb test case for you shortly.

  3. Johannes Erdfelt reporter

    Sorry about missing the data to be inserted in the test case. Thankfully that was trivial.

    I wrote a test case using MySQLdb as well, and both variants raised OperationalError for me, which is why I reported the bug here. It's quite possible that I'm using MySQLdb differently than SQLAlchemy does, which results in different failures.

    I'm curious to see your test case and figure out why it's different than my test case.

  4. Mike Bayer repo owner

    When you deal with MySQLdb, you can't assume anything - it uses regular expressions on statements to figure things out sometimes, so even slight differences can conceal the behavior. I made this test case by duplicating the exact series of steps which SQLAlchemy sends to it; then simplifying each detail one line at a time, such as removing the LIMIT phrases, removing the usage of bound parameters, etc., testing each time. This test can be shortened further by removing the fetchall()/close() calls but I figure it's more complete by using a fresh cursor each time:

    import MySQLdb
    
    c1 = MySQLdb.connect(user="scott", passwd="tiger", db="test")
    
    cursor = c1.cursor()
    cursor.execute("drop table if exists a")
    cursor.execute("drop table if exists b")
    
    cursor.execute("""
     create table a (
       id int not null auto_increment, primary key (id)) engine=InnoDB;""")
    cursor.execute("""
     create table b (
       id int not null auto_increment,
       value int, primary key (id)) engine=InnoDB;""")
    
    cursor.execute("insert into a (id) values (1)")
    cursor.execute("insert into b (id, value) values (1, null)")
    cursor.close()
    c1.commit()
    
    curs1 = c1.cursor()
    curs1.execute("set innodb_lock_wait_timeout=1")
    curs1.close()
    
    curs1 = c1.cursor()
    curs1.execute("SELECT * FROM a WHERE a.id = 1 FOR UPDATE")
    assert curs1.description
    curs1.fetchall()
    curs1.close()
    
    c2 = MySQLdb.connect(user="scott", passwd="tiger", db="test")
    curs2 = c2.cursor()
    curs2.execute("set innodb_lock_wait_timeout=1")
    curs2.close()
    
    curs2 = c2.cursor()
    curs2.execute("SELECT * FROM b WHERE b.id = 1 FOR UPDATE")
    assert curs2.description
    curs2.fetchall()
    curs2.close()
    
    def run_test(stmt):
        curs1 = c1.cursor()
        try:
            curs1.execute(stmt)
            print "no operational error raised !"
            assert curs1.description, \
                "cursor returned from SELECT with no description!"
        except MySQLdb.OperationalError:
            print "operational error raised correctly."
    
    
    def bad():
        """FAILS: the execute() call completes, and then
        cursor.description is None"""
        run_test("SELECT * FROM b WHERE b.value = 1 FOR UPDATE")
    
    def good():
        """PASSES: the execute() call correctly times out and raises
        OperationalError"""
        run_test("SELECT * FROM b WHERE b.id = 1 FOR UPDATE")
    
    bad()
    #good()
    
  5. Johannes Erdfelt reporter

    I have to say, your test case had me confused for a bit.

    Your second test doesn't need table a anymore; it's entirely reproducible without it (a trimmed sketch is at the end of this comment).

    I was initially confused about why you used a different error (lock wait timeout versus deadlock), especially since it's not any faster as originally described (both wait 1 second). But once table a is removed, it results in a simpler test case.

    As an aside, setting innodb_lock_wait_timeout at runtime isn't always possible (apparently it can only be set at runtime when InnoDB is loaded as a plugin rather than built into the MySQL daemon). I had to manually change my.cnf and restart mysqld to run your test case.

    I was also confused about why your test case does a select on a non-indexed column instead of using the in operator like my original test case did, but it appears to be similar enough in that it forces a table scan and an attempt to lock the same row already locked by the other connection.

    If I change the bad test case to use this query:

    run_test("SELECT * FROM b WHERE b.id in (1, 2) FOR UPDATE")
    

    then it fails the same way as your example (but oddly, raises an exception correctly if it's just "WHERE b.id in (1) FOR UPDATE").

    Anyway, your test case appears to reproduce the same bug in MySQLdb, since looking at the socket data shows a 'Lock wait timeout exceeded' error coming back from the server that never gets raised.

    Thanks for the help. I can follow up with the MySQLdb project if you don't have plans to yourself.
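
    For reference, here is roughly what the trimmed repro looks like without table a - this is an untested sketch of my reading of your test case, against the same scott/tiger test database: one connection holds the row lock on b.id = 1, and the other runs the non-indexed-column SELECT ... FOR UPDATE that should time out with OperationalError but instead completes with cursor.description left as None.

    import MySQLdb

    # connection 1: create and populate b
    c1 = MySQLdb.connect(user="scott", passwd="tiger", db="test")
    cursor = c1.cursor()
    cursor.execute("drop table if exists b")
    cursor.execute("""
     create table b (
       id int not null auto_increment,
       value int, primary key (id)) engine=InnoDB;""")
    cursor.execute("insert into b (id, value) values (1, null)")
    cursor.close()
    c1.commit()

    curs1 = c1.cursor()
    curs1.execute("set innodb_lock_wait_timeout=1")
    curs1.close()

    # connection 2: hold the row lock on b.id = 1
    c2 = MySQLdb.connect(user="scott", passwd="tiger", db="test")
    curs2 = c2.cursor()
    curs2.execute("SELECT * FROM b WHERE b.id = 1 FOR UPDATE")
    curs2.fetchall()
    curs2.close()

    # connection 1: the non-indexed column forces a scan onto the locked row;
    # expected: OperationalError (lock wait timeout); observed: execute()
    # completes silently and cursor.description is None
    curs1 = c1.cursor()
    try:
        curs1.execute("SELECT * FROM b WHERE b.value = 1 FOR UPDATE")
        print "no operational error raised; description:", curs1.description
    except MySQLdb.OperationalError:
        print "operational error raised correctly."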

  6. Mike Bayer repo owner

    yes, please report it to MySQLdb - and I guess I came at the problem from a different perspective: if you look at the echoed SQL in the original SQLAlchemy test case, the IN operator isn't even used - the x.in_(y) is reduced to a simple equality comparison. The "b.value" column is also what you were comparing in your original test, which appears to be necessary to produce this condition.

    My MySQL is built via homebrew and didn't need any special configuration to use the innodb_lock_wait_timeout variable.

  7. Mike Bayer repo owner

    looks great, and I've added myself as a monitor on the issue. I sincerely hope they respond, though looking at the level of inactivity on MySQLdb's tracker has me concerned.
