implement server side cursors for mysql

Issue #1089 resolved
Mike Bayer repo owner created an issue

assuming this works:

import MySQLdb
import MySQLdb.cursors  # make sure to import this separately

# build your connection object normally, but pass it a cursorclass keyword
db = MySQLdb.connect(host=HOSTNAME, user=USERNAME, passwd=PASSWORD, db=DATABASE,
                     cursorclass=MySQLdb.cursors.SSCursor)

Comments (8)

  1. jek

    AFAIK mysqldb (as of 1.2.2 anyhow) doesn't support mysql server side cursors. Its SSCursor fetches rows in 'use_result' mode, which basically just streams the query results out on demand, holding locks on the tables for the entire duration. It's IMHO nigh useless if you have multiple database readers. mysql has real server side cursors that materialize as temporary tables and hold no locks, but mysqldb doesn't use them.
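    For reference, the "real" server side cursors mentioned here are only available inside MySQL stored programs. A minimal sketch (the procedure, table, and column names are made up for illustration); because the cursor is asensitive, the server is free to materialize the result into an internal temporary table rather than holding locks on the base table while rows are fetched:

    ```sql
    -- hypothetical stored procedure demonstrating a true MySQL
    -- server-side cursor; declaration order matters: variables,
    -- then cursors, then handlers
    DELIMITER //
    CREATE PROCEDURE scan_big_table()
    BEGIN
        DECLARE done INT DEFAULT 0;
        DECLARE v_id INT;
        DECLARE cur CURSOR FOR SELECT id FROM big_table;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

        OPEN cur;
        read_loop: LOOP
            FETCH cur INTO v_id;
            IF done THEN
                LEAVE read_loop;
            END IF;
            -- process v_id here, one row at a time
        END LOOP;
        CLOSE cur;
    END //
    DELIMITER ;
    ```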

  2. Former user Account Deleted

    Passing connect_args to the mysqldb connection can accomplish this:

       from sqlalchemy import create_engine, MetaData
       import MySQLdb.cursors

       engine = create_engine('mysql://root:zenoss@localhost/e2',
                              connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
       meta = MetaData(engine, reflect=True)
       conn = engine.connect()
       s = meta.tables['some_table'].select()  # 'some_table' is a placeholder
       rs = conn.execute(s.execution_options(stream_results=True))
    
  3. dan mackinlay

    Hmm - whether this is generally useful depends, of course, on whom one thinks of as the general user. In a web application with multiple readers, having the results streamed at the cost of locking out other users is a bad trade-off. On the other hand, in the big data community, where typically a single reader is accessing enormous data sets, there is no other way. I wouldn't care to guess which use case is more general, but I would suggest that both are important.

    OTOH, the mysqldb authors don't seem to think so, and this is really their problem, so I guess the real summary here is "switch to oursql". (Although the solution from Anonymous works fine in my particular big data problem - future users, consider this an endorsement of the SSCursor hack)

  4. Grant Cox

    I also endorse using SSCursor for streaming results. I'm not sure under what circumstances it "holds locks on the tables for the entire duration", but I can confirm that with

    engine = create_engine('...', connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
    session = scoped_session(sessionmaker(bind=engine))
    
    stream = session.query(AllocatedUserModel.user_id) \
                    .filter(AllocatedUserModel.group_id.in_([1,2,3,4])) \
                    .yield_per(10000)
    

    while the stream is being iterated over I can have concurrent processes modifying the underlying table, and concurrent streams being yielded. So for this simple kind of query, there are no locks.

    If you do find a case where the stream is locking the underlying table, then streaming from a query like SELECT SQL_BUFFER_RESULT ... should fix it, as the server will stream from a result buffer instead of the actual table.
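    For what it's worth, the SQL_BUFFER_RESULT modifier goes immediately after SELECT. A sketch of the equivalent raw query for the ORM example above (table and column names are illustrative):

    ```sql
    -- SQL_BUFFER_RESULT asks the server to copy the result into a
    -- temporary table before sending it to the client, so locks on the
    -- base table can be released early even while the client streams rows
    SELECT SQL_BUFFER_RESULT user_id
    FROM allocated_users
    WHERE group_id IN (1, 2, 3, 4);
    ```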

    And finally, with regards to oursql: unfortunately, in my tests oursql used no less memory than MySQLdb's SSCursor and took massively longer. Here are the results from my local tests iterating over a stream of 1.25 million ids, using the above ORM query:

    MySQLDB SSCursor
        Iterated over 1256916 rows in 44.70 seconds
        Memory as per Heapy: 14223680 bytes total
        Memory as per PsUtil: 37.58984375 MB
    
    OurSQL
        Iterated over 1256916 rows in 596.58 seconds
        Memory as per Heapy: 14346384 bytes total
        Memory as per PsUtil: 38.17578125 MB
    
    MySQLDB normal cursor
        Iterated over 1256916 rows in 46.25 seconds
        Memory as per Heapy: 14222456 bytes total
        Memory as per PsUtil: 158.1875 MB
    