implement server side cursors for mysql

Issue #1089 resolved
Michael Bayer
repo owner created an issue

assuming this works:

import MySQLdb
import MySQLdb.cursors #Make sure to import this seperately

#build your connection object normally but pass it a cursorclass keyword
db=MySQLdb.connect(host=HOSTNAME,user=USERNAME,passwd=PASSWORD,db=DATABASE,
            cursorclass=MySQLdb.cursors.SSCursor)

Comments (8)

  1. jek

    AFAIK mysqldb (as of 1.2.2 anyhow) doesn't support mysql server side cursors. it's SSCursor fetches rows in 'use_result' mode, which basically just directly streams out the query results on demand, holding locks on the tables for the entire duration. it's IMHO nigh useless if you have multiple database readers. mysql has real server side cursors that materialize as temporary tables and hold no locks, but mysqldb doesn't use them.

  2. Anonymous

    Passing connect_args to the mysqldb connection can accomplish this:

       from sqlalchemy import create_engine, MetaData
       import MySQLdb.cursors
       engine = create_engine('mysql://root:zenoss@localhost/e2', connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
       meta = MetaData(engine, reflect=True)
       conn = engine.connect()
       rs = s.execution_options(stream_results=True).execute()
    
  3. dan mackinlay

    Hmm - generally useful depends of course upon whom one thinks of as the general user. While in a web application with multiple readers, having the results streamed at the cost of locking for multiple users is a bad trade-off. On the other hand, in the big data community where typically one reader is accessing enormous data sets, there is no other way. I wouldn't care to guess which use case is more general, but I would suggest that they are both important.

    OTOH, the mysqldb authors don't seem to think so, and this is really their problem, so I guess the real summary here is "switch to oursql". (Although the solution from Anonymous works fine in my particular big data problem - future users, consider this an endorsement of the SSCursor hack)

  4. Grant Cox

    I also endorse using SSCursor for streaming results. I'm not sure under what circumstances it "holds locks on the tables for the entire duration", but I can confirm that with

    engine = create_engine('...', connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
    session = scoped_session(sessionmaker(bind=engine))
    
    stream = session.query(AllocatedUserModel.user_id) \
                    .filter(AllocatedUserModel.group_id.in_([1,2,3,4])) \
                    .yield_per(10000)
    

    while the stream is being iterated over I can have concurrent processes modifying the underlying table, and concurrent streams being yielded. So for this simple kind of query, there are no locks.

    If you do find a case where the stream is locking the underlying table, then instead streaming from a query like SELECT SQL_BUFFER_RESULT ... should fix this as it will stream from a result buffer instead of the actual table.

    And finally with regards to oursql, unfortunately in my tests using oursql uses no less memory than a mysqldb.SSCursor, and takes massively longer. Here are the results from my local tests iterating over a stream of 1.25 million ids, using the above ORM query:

    MySQLDB SSCursor
        Iterated over 1256916 rows in 44.70 seconds
        Memory as per Heapy: 14223680 bytes total
        Memory as per PsUtil: 37.58984375 MB
    
    OurSQL
        Iterated over 1256916 rows in 596.58 seconds
        Memory as per Heapy: 14346384 bytes total
        Memory as per PsUtil: 38.17578125 MB
    
    MySQLDB normal cursor
        Iterated over 1256916 rows in 46.25 seconds
        Memory as per Heapy: 14222456 bytes total
        Memory as per PsUtil: 158.1875 MB
    
  5. Log in to comment