- changed status to resolved
Joining Subquery on Relationship
Consider the following sqlalchemy generated query.
SELECT
venue.id AS venue_id
FROM
venue
WHERE
(
EXISTS (
SELECT
1
FROM
venue AS alias1
JOIN
venue_to_label AS venue_to_label_1
ON alias1.id = venue_to_label_1.venue_id
JOIN
label
ON label.id = venue_to_label_1.label_id
WHERE
label.id = :id_1
AND venue.id = alias1.id
)
)
AND (
EXISTS (
SELECT
1
FROM
venue AS alias2
JOIN
venue_to_label AS venue_to_label_2
ON alias2.id = venue_to_label_2.venue_id
JOIN
label
ON label.id = venue_to_label_2.label_id
WHERE
label.name = :name_1
AND venue.id = alias2.id
)
)
I'm trying to find all venues that have a label attached with a specific id and a label attached with a specific name. The sub-queries are necessary since this is a *_to_many relationship, but instead of joining them on the venue.id we could join them on venue_to_label.venue_id and move the relationship join into the main query. This would shorten the query by one join.
Is there an easy way to achieve this? I've attached the code generating the sql query below.
import unittest
from sqlalchemy import Table, Column, Integer, ForeignKey, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, aliased
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = (
'postgres://postgres:password@localhost:5432/tmp')
db = SQLAlchemy(app)
Base = declarative_base()
# many (venue) <-> many (label) mapping table
venue_to_label = Table(
'venue_to_label', db.metadata,
Column('venue_id', Integer, ForeignKey('venue.id'), primary_key=True),
Column('label_id', Integer, ForeignKey('label.id'), primary_key=True)
)
class Label(db.Model):
__tablename__ = 'label'
id = Column(Integer, primary_key=True, nullable=False)
name = Column(String(254))
class Venue(db.Model):
id = Column(Integer, primary_key=True, nullable=False)
labels = relationship(Label, secondary=venue_to_label)
db.create_all()
class TestJoinSubquery(unittest.TestCase):
def test_join_subquery(self):
query = Venue.query
# subquery one
alias = aliased(Venue, name="alias1")
subquery = db.session.query(alias)
subquery = subquery.join(alias.labels)
subquery = subquery.filter(Label.id == 3)
subquery = subquery.filter(Venue.id == alias.id)
query = query.filter(subquery.exists())
# subquery two
alias = aliased(Venue, name="alias2")
subquery = db.session.query(alias)
subquery = subquery.join(alias.labels)
subquery = subquery.filter(Label.name == "label_name")
subquery = subquery.filter(Venue.id == alias.id)
query = query.filter(subquery.exists())
print query
print query.all()
Comments (3)
-
repo owner -
repo owner I guess you dont really need the alias here either, these are subqueries:
query = session.query(Venue) subquery = session.query(venue_to_label) subquery = subquery.join(Label) subquery = subquery.filter(Label.id == 3) subquery = subquery.filter(Venue.id == venue_to_label.c.venue_id) query = query.filter(subquery.exists()) subquery = session.query(venue_to_label) subquery = subquery.join(Label) subquery = subquery.filter(Label.name == "some_label") subquery = subquery.filter(Venue.id == venue_to_label.c.venue_id) query = query.filter(subquery.exists())
SELECT venue.id AS venue_id FROM venue WHERE (EXISTS (SELECT 1 FROM venue_to_label JOIN label ON label.id = venue_to_label.label_id WHERE label.id = :id_1 AND venue.id = venue_to_label.venue_id)) AND (EXISTS (SELECT 1 FROM venue_to_label JOIN label ON label.id = venue_to_label.label_id WHERE label.name = :name_1 AND venue.id = venue_to_label.venue_id))
-
reporter Thank you again for the quick response! Your answer helped me to approach the problem differently and solve it. After so many hours you get tunnel vision I guess... Much appreciated!
Apologies for not posting this to the mailing list. I will do that in the future!
- Log in to comment
there's no "enhancement" I can see here as of yet, this is a "how do I?" question so please send these to the mailing list. If some specific feature becomes apparent then that can become an enhancement request.
as far as how to specify it, here's the easiest way:
output:
using all joins tends to be more efficient than using an EXISTS correlated subquery if you are looking for a positive relationship (e.g. rather than a NOT EXISTS).
if you want exactly the EXISTS scenario, with the two subqueries but just taking out Venue.id:
producing: