`contains_eager` for subqueryload
As we all know contains_eager
and joinedload
queries can be very expensive to run as the amount of columns and rows in the result set start to grow. subqueryload
helps a lot in these cases, but unfortunately there's no contains_eager
version for it which could allow filtering of the related collections. There is kind of a way to get around this by using the DisjointEagerLoading recipe. But could this perhaps be achieved in a more generic way which could make it into the library?
One idea I have is to pass in a new option sa.orm.enable_eagerloads_filtering()
to .options()
, like this:
import sqlalchemy as sa
import sqlalchemy.orm # noqa
from sqlalchemy.ext import declarative
Base = declarative.declarative_base()
Session = sa.orm.sessionmaker()
class Product(Base):
__tablename__ = 'product'
id = sa.Column(sa.Integer, primary_key=True)
artno = sa.Column(sa.Text)
name = sa.Column(sa.Text)
class Variant(Base):
__tablename__ = 'variant'
id = sa.Column(sa.Integer, primary_key=True)
color = sa.Column(sa.Text)
product_id = sa.Column(sa.Integer, sa.ForeignKey(Product.id))
product = sa.orm.relationship(Product, backref='variants')
class SKU(Base):
__tablename__ = 'sku'
id = sa.Column(sa.Integer, primary_key=True)
size = sa.Column(sa.Text)
variant_id = sa.Column(sa.Integer, sa.ForeignKey(Variant.id))
variant = sa.orm.relationship(Variant, backref='skus')
product = Product(name='T-shirt', artno='12345')
variant = Variant(product=product, color='Black')
sku2 = SKU(variant=variant, size='S')
sku = SKU(variant=variant, size='M')
sku = SKU(variant=variant, size='L')
engine = sa.create_engine('sqlite://', echo=True)
Session.configure(bind=engine)
Base.metadata.create_all(bind=engine)
session = Session()
session.add_all([sku, sku2])
session.flush()
query = (
session.query(Product)
.outerjoin(Product.variants)
.outerjoin(Variant.skus)
.filter(Product.name == 'T-shirt')
.filter(Variant.color == 'Black')
.filter(SKU.size == 'M')
.options(
sa.orm.enable_eagerloads_filtering(), # <-- This triggers filtering
sa.orm.subqueryload('variants')
.subqueryload('skus')
)
)
Imagine that the commented line would change the behavior of how subqueryloads
generates it's queries, and perhaps this could even be a replacement to contains_eager
if both subqueryload
and joinedload
can be changed in this way. I imagine the generated queries to look something like what follows. Note how the inner queries look almost exactly the same.
Main query
SELECT
product.id,
product.artno,
product.name
FROM product
WHERE product.id IN (
SELECT DISTINCT product.id
FROM product
LEFT OUTER JOIN variant ON
variant.product_id = product.id
LEFT OUTER JOIN sku ON
sku.variant_id = variant.id
WHERE
product.name = 'T-shirt' AND
variant.color = 'Black' AND
sku.size = 'M'
)
Variant subquery
SELECT
variant.id,
variant.product_id,
variant.color
FROM variant
WHERE variant.id IN (
SELECT DISTINCT variant.id
FROM product
LEFT OUTER JOIN variant ON
variant.product_id = product.id
LEFT OUTER JOIN sku ON
sku.variant_id = variant.id
WHERE
product.name = 'T-shirt' AND
variant.color = 'Black' AND
sku.size = 'M'
)
SKU subquery
SELECT
sku.id,
sku.variant_id,
sku.size
FROM sku
WHERE sku.id IN (
SELECT DISTINCT sku.id
FROM product
LEFT OUTER JOIN variant ON
variant.product_id = product.id
LEFT OUTER JOIN sku ON
sku.variant_id = variant.id
WHERE
product.name = 'T-shirt' AND
variant.color = 'Black' AND
sku.size = 'M'
)
SQLAlchemy then bakes in the SKU
and Variant
results into the main query results like it normally would.
I'm not very familiar at all with the internals of SQLAlchemy so I can't say anything about the feasibility of this method. Could it be achieved? And is it a good idea?
Comments (7)
-
repo owner -
repo owner I'd imagine this would best be:
query = ( session.query(Product) .outerjoin(Product.variants) .outerjoin(Variant.skus) .filter(Product.name == 'T-shirt') .filter(Variant.color == 'Black') .filter(SKU.size == 'M') .options( sa.orm.contains_eager('variants', separate=True) .contains_eager('skus', separate=True) ) )
you then get essentially the same query three times with a different set of columns. The spirit of "contains_eager()" is, this is exactly the query we want.
-
reporter You're absolutely right, @zzzeek.
How do you envision the queries actually turning out with the suggestion you made?
-
reporter Perhaps a
distinct
keyword could be passed in to contains_eager as well? I'd imagine that would be cheaper than having python iterate through all the duplicate rows one might end up with (like the "variants" query would in this case if there'd be thousands of SKUs). -
repo owner it's not a good idea to put distinct() on an open-ended set of columns. the existing subqueryload has distinct() on the inner query where we only have primary key values. so repurposing contains_eager() to work for second/third/etc. loads while maintaining WYSIWYG is not ideal.
-
repo owner really, this goes back to the fundamental point that the use case of, "I want to load X's filtered by Y, and also have their collections filtered to Y", is strange. I've been writing production applications with SQLAlchemy for eight years and have never wanted to do that. I usually encourage people who have the "load some Xs and some related Ys" do do it the normal way, which here would be "sess.query(Product, Variant, Sku)" ...
-
repo owner - changed status to wontfix
I don't have plans to work on / support a feature like this, the use case of "filtered collections" is not a first-class use case and to automate it further will lead to burdensome, complicated features on top of the ORM.
- Log in to comment
That proposal can't work because the "main query" emitted in SQL looks nothing like what's being asked for by Query. The contract of Query is that you always get exactly the join() / outerjoin() / filter() you ask for, regardless of eager loading. It looks like you're having the eager loading directives actually remove meaning from the outermost query and that doesn't make any sense. There's no IN or DISTINCT or anything else like that being requested in the main query.
if the goal is, "do the thing that contains_eager() accomplishes except with subqueries", you should use a new query for that. We can make the recipe in disjointeagerloading easier to use, but definitely, the simplest way for any user to understand the feature of, "I'd like to emit query A, then query B, and have full control over query B", is that they write query B. You can use query A as the foundation for query B.