- changed milestone to 0.9.xx
New recipe -- a solution for selecting through a large number of rows on SQLite.
SQLite does not support the window/range function described in this recipe:
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery
We have developed a solution along the same lines using LIMIT/OFFSET, and wanted to share it back with the community.
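For context, the recipe at that link builds its windows with the row_number() window function, roughly as sketched below. This is an illustration only, not the recipe's exact code: windowed_query and windowsize are names invented here, and the query fails on SQLite builds without window-function support.

from sqlalchemy import func

def windowed_query(session, entity, column, windowsize):
    # Number every row by `column`, then keep the first primary key of
    # each window as a range boundary.
    subq = session.query(
        column.label("id"),
        func.row_number().over(order_by=column).label("rownum"),
    ).subquery()
    starts = [
        row.id
        for row in session.query(subq.c.id).filter(
            subq.c.rownum % windowsize == 1)
    ]
    # Walk the half-open ranges [start, next_start).
    for i, start in enumerate(starts):
        q = session.query(entity).filter(column >= start)
        if i + 1 < len(starts):
            q = q.filter(column < starts[i + 1])
        for rec in q.order_by(column):
            yield rec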
def _yield_limit(qry, pk, maxrq=1000):
    r"""specialized windowed query generator (using LIMIT/OFFSET)

    This recipe is to select through a large number of rows that's too
    large to fetch at once.  The technique depends on the primary key
    of the FROM clause being an integer value, and selects items
    surrounded with LIMIT/OFFSET."""
    key = pk.property.key
    nextid = qry.session.query(pk).order_by(key).\
        limit(1).scalar()
    if nextid is None:
        return
    while 1:
        count = 0
        for rec in qry.filter(pk >= nextid).limit(maxrq):
            # Retrieve the key value before yielding as the
            # caller could delete the object.
            nextid = rec.__getattribute__(key) + 1
            count += 1
            yield rec
        # If we received fewer items than we requested,
        # we have reached the end.
        if count != maxrq:
            return
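A hypothetical invocation of the recipe, assuming a mapped class Item with an integer primary key Item.id and an open Session named session (placeholder names, not part of the recipe itself):

for item in _yield_limit(session.query(Item), Item.id):
    print(item.id)  # any per-row processing goes here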
Comments (10)
-
repo owner this is pretty trivial and here's how to make sure it's ordering correctly:
def _yield_limit(qry, pk_attr, maxrq=1000):
    r"""specialized windowed query generator (using LIMIT/OFFSET)

    This recipe is to select through a large number of rows that's too
    large to fetch at once.  The technique depends on the primary key
    of the FROM clause being an integer value, and selects items
    surrounded with LIMIT/OFFSET."""
    firstid = qry.session.query(func.min(pk_attr)).scalar()
    while firstid is not None:
        rec = None
        for rec in qry.filter(pk_attr >= firstid).order_by(pk_attr).limit(maxrq):
            yield rec
        firstid = pk_attr.__get__(rec) if rec else None
maybe we could make some kind of category out of the existing limit recipe, e.g. https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
-
Account Deleted Hey Mike - your updated recipe doesn't quite work - it seems like __get__(rec) requires a second argument. Also, firstid is never set to None, so the loop doesn't end. At least for my test case.
I'd love to see this become a category, as I'm unfortunately not able to use a DB that can use the WindowedRangeQuery recipe.
Thanks
-
repo owner
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

sess = Session(e)
sess.add_all([A() for i in range(2000)])

def _yield_limit(qry, pk_attr, maxrq=100):
    r"""specialized windowed query generator (using LIMIT/OFFSET)

    This recipe is to select through a large number of rows that's too
    large to fetch at once.  The technique depends on the primary key
    of the FROM clause being an integer value, and selects items
    surrounded with LIMIT/OFFSET."""
    firstid = None
    while True:
        q = qry
        if firstid is not None:
            q = qry.filter(pk_attr > firstid)
        rec = None
        for rec in q.order_by(pk_attr).limit(maxrq):
            yield rec
        if rec is None:
            break
        firstid = pk_attr.__get__(rec, pk_attr) if rec else None

for rec in _yield_limit(sess.query(A), A.id):
    print(rec.id, rec)
-
Account Deleted Thanks Mike, that works great!
Would you recommend using this over Query.yield_per(), even when using mysql+oursql, which fetches rows lazily?
-
repo owner I don't use yield_per() because it doesn't work with eager loading, so if I'm loading instances, then yes I do it this way.
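For comparison, the yield_per() alternative being weighed here is roughly the following; a minimal sketch reusing the A model and sess Session from the example above, with no eager loaders configured (per the comment, joinedload()/subqueryload() would rule it out):

for rec in sess.query(A).yield_per(100):  # fetch rows in batches of 100
    print(rec.id, rec)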
-
Account Deleted So if I'm not using joinedload() or subqueryload(), I should be ok with yield_per()?
-
repo owner yeah.
-
repo owner - changed milestone to 1.0.xx
-
repo owner - changed status to resolved
I tacked that on at the bottom of https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery.