New recipe -- solution to select through a large number of rows for SQLite

Issue #2826 resolved
Former user created an issue

SQLite does not support the window/range function described in this recipe:

http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery

We have developed a solution along the same lines using LIMIT/OFFSET, and wanted to share this back with the community.

def _yield_limit(qry, pk, maxrq=1000):
    r"""specialized windowed query generator (using LIMIT/OFFSET)

    This recipe is to select through a number of rows that is too
    large to fetch at once. The technique depends on the primary key
    of the FROM clause being an integer value, and fetches items
    in windows using LIMIT."""

    key = pk.property.key
    nextid = qry.session.query(pk).order_by(pk).\
                limit(1).scalar()

    if nextid is None:
        return

    while True:
        count = 0
        for rec in qry.filter(pk >= nextid).limit(maxrq):
            # Retrieve the key value before yielding as the
            # caller could delete the object.
            nextid = getattr(rec, key) + 1
            count += 1
            yield rec

        # If we received fewer items than we requested,
        # we have reached the end.

        if count != maxrq:
            return
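
A minimal usage sketch for the generator above; the mapped class Item,
its columns, and the row count are illustrative assumptions, not part
of the original report:

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# "Item" is a hypothetical mapped class with an integer primary key.
class Item(Base):
    __tablename__ = 'item'

    id = Column(Integer, primary_key=True)
    name = Column(String)

e = create_engine("sqlite://")
Base.metadata.create_all(e)

session = Session(e)
session.add_all([Item(name="item %d" % i) for i in range(5000)])
session.commit()

# Iterate the whole table in windows of 1000 rows, passing the
# instrumented primary key attribute as the window key.
for item in _yield_limit(session.query(Item), Item.id, maxrq=1000):
    print(item.id, item.name)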

Comments (10)

  1. Mike Bayer repo owner

    this is pretty trivial and here's how to make sure it's ordering correctly:

    def _yield_limit(qry, pk_attr, maxrq=1000):
        r"""specialized windowed query generator (using LIMIT/OFFSET)
    
        This recipe is to select through a number of rows that is too
        large to fetch at once. The technique depends on the primary key
        of the FROM clause being an integer value, and fetches items
        in windows using LIMIT."""
    
    
        firstid = qry.session.query(func.min(pk_attr)).scalar()
    
        while firstid is not None:
            rec = None
            for rec in qry.filter(pk_attr >= firstid).order_by(pk_attr).limit(maxrq):
                yield rec
            firstid = pk_attr.__get__(rec) if rec else None
    

    maybe we could make some kind of category out of the existing limit recipe, e.g. https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery

  2. Former user Account Deleted

    Hey Mike - your updated recipe doesn't quite work - it seems like __get__(rec) requires a second argument. Also, firstid is never set to None, thus the loop doesn't end. At least for my test case.

    I'd love to see this become a category, as I'm unfortunately not able to use a DB that can use the WindowedRangeQuery recipe.

    Thanks

  3. Mike Bayer repo owner
    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    class A(Base):
        __tablename__ = 'a'
    
        id = Column(Integer, primary_key=True)
    
    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    
    sess = Session(e)
    
    sess.add_all([A() for i in range(2000)])
    
    def _yield_limit(qry, pk_attr, maxrq=100):
        r"""specialized windowed query generator (using LIMIT/OFFSET)
    
        This recipe is to select through a number of rows that is too
        large to fetch at once. The technique depends on the primary key
        of the FROM clause being an integer value, and fetches items
        in windows using LIMIT."""
    
        firstid = None
        while True:
            q = qry
            if firstid is not None:
                q = qry.filter(pk_attr > firstid)
            rec = None
            for rec in q.order_by(pk_attr).limit(maxrq):
                yield rec
            if rec is None:
                break
            firstid = pk_attr.__get__(rec, pk_attr) if rec else None
    
    for rec in _yield_limit(sess.query(A), A.id):
        print(rec.id, rec)
    
  4. Former user Account Deleted

    Thanks Mike, that works great!

    Would you recommend using this over Query.yield_per(), even when using mysql+oursql, which fetches rows lazily?

  5. Mike Bayer repo owner

    I don't use yield_per() because it doesn't work with eager loading, so if I'm loading instances, then yes, I do it this way.
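
    For reference, a minimal yield_per() sketch, reusing the A class and
    sess from the runnable example above (the batch size of 100 is an
    arbitrary choice):

    # Stream rows in batches of 100 rather than loading the full
    # result at once; no joined/subquery eager loads are involved.
    for a in sess.query(A).yield_per(100):
        print(a.id)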

  6. Former user Account Deleted

    So if I'm not using joinedload() or subqueryload(), I should be OK with yield_per()?
