1. Michael Bayer
  2. sqlalchemy

Wiki

Clone wiki

sqlalchemy / UsageRecipes / DisjointEagerLoading

DisjointEagerLoading

Note: This feature is included in SQLAlchemy 0.6. Use Subquery Loading, described at http://www.sqlalchemy.org/docs/orm/loading.html.


The usual eagerload() functionality is great, but it loads parent/child objects using a JOIN or LEFT OUTER JOIN, meaning you get full parent rows for as many child rows that exist. For some geometries, this is inefficient, considering that most DBAPIs fully pre-buffer all the columns in every row whether you use them or not (SQLAlchemy itself doesn't re-read those parent columns on every row).

You can alternatively load the parent and child results in separate queries, then combine them together in a second step. It's very easy:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# parent

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    data = Column(String)
    children = relation("Child", collection_class=set)

# child

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    data = Column(String)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    def __repr__(self):
        return "Child(%r)" % self.data

# setup
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

session = sessionmaker(engine)()

# test data

session.add_all([
    Parent(
        data="parent %d" % i, 
        children=set([Child(data="child %d parent %d" % (j, i)) for j in xrange(50)])
    ) 
    for i in xrange(100)
])
session.commit()

# query for Parent objects
q = session.query(Parent).filter(Parent.id.between(28, 39)).limit(12)

# load them
parents = q.all()

# reuse the query, join to Child objects.  group into collections.
from itertools import groupby
children = dict((k, list(v)) for k, v in groupby(
                q.from_self(Child).join(Parent.children).order_by(Child.parent_id, Child.id), 
                lambda x:x.parent_id
            ))

# piece together !
from sqlalchemy.orm import attributes

for p in parents:
    attributes.set_committed_value(p, "children", children.get(p.id, ()))

attributes.set_committed_value() is the public version of the same functionality the internal collection loading uses to populate. It only needs an iterable, so the above works regardless of the collection_class configuration on the relation.

Generalizing the Recipe

The above recipe can be generalized to produce the second query based on what Parent.children tells us. Below, we use the local_remote_pairs collection of RelationProperty to tell us most of what we need:

from itertools import groupby
from sqlalchemy.orm import attributes

def disjoint_load(query, attr):
    target = attr.prop.mapper
    local_cols, remote_cols = zip(*attr.prop.local_remote_pairs)

    child_q = query.from_self(target).join(attr).order_by(*remote_cols)
    if attr.prop.order_by:
        child_q = child_q.order_by(*attr.prop.order_by)

    collections = dict((k, list(v)) for k, v in groupby(
                    child_q, 
                    lambda x:tuple([getattr(x, c.key) for c in remote_cols])
                ))

    parents = query.all()

    for p in parents:
        attributes.set_committed_value(
            p, 
            attr.key, 
            collections.get(
                tuple([getattr(p, c.key) for c in local_cols]), 
                ())
        )
    return parents

q = session.query(Parent).filter(Parent.id.between(28, 39)).limit(12)

for p in disjoint_load(q, Parent.children):
    print p
    print p.children

The core SQLAlchemy feature when completed will work in the full range of scenarios as eagerload().

Updated