`contains_eager` for subqueryload

As we all know, contains_eager and joinedload queries can become very expensive as the number of columns and rows in the result set grows. subqueryload helps a lot in these cases, but unfortunately there's no contains_eager equivalent for it that would allow filtering of the related collections. There is a workaround using the DisjointEagerLoading recipe, but could this perhaps be achieved in a more generic way that could make it into the library?
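To make the cost concrete, here is a small sketch that uses the standard library's sqlite3 instead of SQLAlchemy. The data (1 product with 2 variants of 3 SKUs each) is an assumption for illustration. It compares the single wide result a joined eager load produces with one narrow query per table, which is roughly what subqueryload issues:

```python
import sqlite3

# In-memory schema mirroring the product/variant/sku tables used in this post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, artno TEXT, name TEXT);
    CREATE TABLE variant (id INTEGER PRIMARY KEY, color TEXT,
                          product_id INTEGER REFERENCES product(id));
    CREATE TABLE sku (id INTEGER PRIMARY KEY, size TEXT,
                      variant_id INTEGER REFERENCES variant(id));
""")

# Illustrative data (assumed): 1 product, 2 variants, 3 SKUs per variant.
conn.execute("INSERT INTO product VALUES (1, '12345', 'T-shirt')")
for variant_id, color in [(1, "Black"), (2, "White")]:
    conn.execute("INSERT INTO variant VALUES (?, ?, 1)", (variant_id, color))
    for size in ("S", "M", "L"):
        conn.execute("INSERT INTO sku (size, variant_id) VALUES (?, ?)",
                     (size, variant_id))

# joinedload/contains_eager style: one wide result, parent columns repeated per SKU.
joined = conn.execute("""
    SELECT product.*, variant.*, sku.*
    FROM product
    LEFT OUTER JOIN variant ON variant.product_id = product.id
    LEFT OUTER JOIN sku ON sku.variant_id = variant.id
""").fetchall()

# subqueryload style: one narrow result per table.
# Simplified: real subqueryload restricts each child query to the loaded parents.
per_table = [conn.execute(f"SELECT * FROM {t}").fetchall()
             for t in ("product", "variant", "sku")]

joined_cells = sum(len(row) for row in joined)
separate_cells = sum(len(row) for rows in per_table for row in rows)
print(len(joined), joined_cells)                    # 6 rows x 9 columns = 54 cells
print([len(r) for r in per_table], separate_cells)  # [1, 2, 6], 3 + 6 + 18 = 27 cells
```

The joined result repeats every product and variant column once per SKU, so it grows with the product of the collection sizes, while the per-table results grow only with the number of rows in each table.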

One idea I have is to pass a new option, sa.orm.enable_eagerloads_filtering(), to .options(), like this:

import sqlalchemy as sa
import sqlalchemy.orm  # noqa
from sqlalchemy.ext import declarative

Base = declarative.declarative_base()
Session = sa.orm.sessionmaker()


class Product(Base):
    __tablename__ = 'product'
    id = sa.Column(sa.Integer, primary_key=True)
    artno = sa.Column(sa.Text)
    name = sa.Column(sa.Text)


class Variant(Base):
    __tablename__ = 'variant'
    id = sa.Column(sa.Integer, primary_key=True)
    color = sa.Column(sa.Text)
    product_id = sa.Column(sa.Integer, sa.ForeignKey(Product.id))
    product = sa.orm.relationship(Product, backref='variants')


class SKU(Base):
    __tablename__ = 'sku'
    id = sa.Column(sa.Integer, primary_key=True)
    size = sa.Column(sa.Text)
    variant_id = sa.Column(sa.Integer, sa.ForeignKey(Variant.id))
    variant = sa.orm.relationship(Variant, backref='skus')


product = Product(name='T-shirt', artno='12345')
variant = Variant(product=product, color='Black')
sku_s = SKU(variant=variant, size='S')
sku_m = SKU(variant=variant, size='M')
sku_l = SKU(variant=variant, size='L')


engine = sa.create_engine('sqlite://', echo=True)
Session.configure(bind=engine)
Base.metadata.create_all(bind=engine)
session = Session()
session.add_all([product, variant, sku_s, sku_m, sku_l])
session.flush()

query = (
    session.query(Product)
    .outerjoin(Product.variants)
    .outerjoin(Variant.skus)
    .filter(Product.name == 'T-shirt')
    .filter(Variant.color == 'Black')
    .filter(SKU.size == 'M')
    .options(
        sa.orm.enable_eagerloads_filtering(),  # <-- proposed (not yet existing) option that triggers filtering
        sa.orm.subqueryload('variants')
        .subqueryload('skus')
    )
)

Imagine that the commented line changes how subqueryload generates its queries; perhaps this could even replace contains_eager if both subqueryload and joinedload could be changed in this way. I imagine the generated queries would look something like the following. Note how the inner queries are almost identical.
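Because every statement reuses the same joins and filters, one way to picture the generation step is a helper that builds the shared inner clause once and wraps it per loaded table. This is a plain-string sketch, not SQLAlchemy's compiler; `FILTERED_JOIN`, `eager_query` and `QUERIES` are hypothetical names, and the filter values become `?` placeholders instead of the literals shown in the SQL that follows:

```python
# Shared FROM/WHERE clause: the joins and filters from the user's query.
FILTERED_JOIN = """FROM product
LEFT OUTER JOIN variant ON variant.product_id = product.id
LEFT OUTER JOIN sku ON sku.variant_id = variant.id
WHERE product.name = ? AND variant.color = ? AND sku.size = ?"""


def eager_query(table, columns):
    """Hypothetical helper: SELECT the table's columns for the ids matched by the shared clause."""
    cols = ", ".join(f"{table}.{c}" for c in columns)
    return (f"SELECT {cols} FROM {table} "
            f"WHERE {table}.id IN (SELECT DISTINCT {table}.id {FILTERED_JOIN})")


# One statement per level of the eager-load chain; only the outer part differs.
QUERIES = {
    "product": eager_query("product", ["id", "artno", "name"]),
    "variant": eager_query("variant", ["id", "product_id", "color"]),
    "sku": eager_query("sku", ["id", "variant_id", "size"]),
}
```

Each string would then be executed with the same bound parameters, e.g. `("T-shirt", "Black", "M")`.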

Main query

SELECT
    product.id,
    product.artno,
    product.name
FROM product
WHERE product.id IN (
    SELECT DISTINCT product.id
    FROM product
    LEFT OUTER JOIN variant ON
        variant.product_id = product.id
    LEFT OUTER JOIN sku ON
        sku.variant_id = variant.id
    WHERE
        product.name = 'T-shirt' AND
        variant.color = 'Black' AND
        sku.size = 'M'
)

Variant subquery

SELECT
    variant.id,
    variant.product_id,
    variant.color
FROM variant
WHERE variant.id IN (
    SELECT DISTINCT variant.id
    FROM product
    LEFT OUTER JOIN variant ON
        variant.product_id = product.id
    LEFT OUTER JOIN sku ON
        sku.variant_id = variant.id
    WHERE
        product.name = 'T-shirt' AND
        variant.color = 'Black' AND
        sku.size = 'M'
)

SKU subquery

SELECT
    sku.id,
    sku.variant_id,
    sku.size
FROM sku
WHERE sku.id IN (
    SELECT DISTINCT sku.id
    FROM product
    LEFT OUTER JOIN variant ON
        variant.product_id = product.id
    LEFT OUTER JOIN sku ON
        sku.variant_id = variant.id
    WHERE
        product.name = 'T-shirt' AND
        variant.color = 'Black' AND
        sku.size = 'M'
)
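As a sanity check of the proposed SQL on its own (independent of SQLAlchemy), the three statements can be run with the standard library's sqlite3 against the sample data from the script above. The explicit ids are an assumption, since the ORM would assign them on flush:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, artno TEXT, name TEXT);
    CREATE TABLE variant (id INTEGER PRIMARY KEY, product_id INTEGER, color TEXT);
    CREATE TABLE sku (id INTEGER PRIMARY KEY, variant_id INTEGER, size TEXT);
    INSERT INTO product VALUES (1, '12345', 'T-shirt');
    INSERT INTO variant VALUES (1, 1, 'Black');
    INSERT INTO sku VALUES (1, 1, 'S'), (2, 1, 'M'), (3, 1, 'L');
""")

# The inner query shared by all three statements.
INNER = """
    FROM product
    LEFT OUTER JOIN variant ON variant.product_id = product.id
    LEFT OUTER JOIN sku ON sku.variant_id = variant.id
    WHERE product.name = 'T-shirt' AND variant.color = 'Black' AND sku.size = 'M'
"""

products = conn.execute(
    "SELECT product.id, product.artno, product.name FROM product "
    f"WHERE product.id IN (SELECT DISTINCT product.id {INNER})").fetchall()
variants = conn.execute(
    "SELECT variant.id, variant.product_id, variant.color FROM variant "
    f"WHERE variant.id IN (SELECT DISTINCT variant.id {INNER})").fetchall()
skus = conn.execute(
    "SELECT sku.id, sku.variant_id, sku.size FROM sku "
    f"WHERE sku.id IN (SELECT DISTINCT sku.id {INNER})").fetchall()

print(products)  # [(1, '12345', 'T-shirt')]
print(variants)  # [(1, 1, 'Black')]
print(skus)      # [(2, 1, 'M')]  the S and L SKUs are filtered out
```

The SKU statement returns only the 'M' row, which is exactly the filtered collection that subqueryload cannot produce today.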

SQLAlchemy would then populate the Product.variants and Variant.skus collections from these results, just as it normally does for subqueryload.
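Conceptually, this step groups each child result by its foreign key and attaches the groups to the parents. Here is a plain-Python sketch with dicts standing in for mapped objects; it is not how SQLAlchemy's loader strategies are implemented internally, and the rows are assumed to be the ones the three queries return for the sample data:

```python
from collections import defaultdict

# Rows as returned by the three queries (assumed from the sample data).
products = [(1, "12345", "T-shirt")]  # (id, artno, name)
variants = [(1, 1, "Black")]          # (id, product_id, color)
skus = [(2, 1, "M")]                  # (id, variant_id, size)

# Group SKU rows by variant_id; variants with no matching SKUs get an empty list.
skus_by_variant = defaultdict(list)
for sku_id, variant_id, size in skus:
    skus_by_variant[variant_id].append({"id": sku_id, "size": size})

# Group variant rows by product_id, attaching each variant's filtered SKUs.
variants_by_product = defaultdict(list)
for variant_id, product_id, color in variants:
    variants_by_product[product_id].append(
        {"id": variant_id, "color": color, "skus": skus_by_variant[variant_id]})

# Attach the filtered variant collections to the main query's products.
result = [
    {"id": pid, "artno": artno, "name": name, "variants": variants_by_product[pid]}
    for pid, artno, name in products
]
print(result)
```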

I'm not at all familiar with SQLAlchemy's internals, so I can't judge the feasibility of this approach. Could it be achieved? And is it a good idea?
