Joining Subquery on Relationship

Issue #3558 resolved
Lukas Siemon created an issue

Consider the following sqlalchemy generated query.

SELECT
   venue.id AS venue_id   
FROM
   venue   
WHERE
   (
      EXISTS (
         SELECT
            1   
         FROM
            venue AS alias1 
         JOIN
            venue_to_label AS venue_to_label_1 
               ON alias1.id = venue_to_label_1.venue_id 
         JOIN
            label 
               ON label.id = venue_to_label_1.label_id   
         WHERE
            label.id = :id_1 
            AND venue.id = alias1.id
      )
   ) 
   AND (
      EXISTS (
         SELECT
            1   
         FROM
            venue AS alias2 
         JOIN
            venue_to_label AS venue_to_label_2 
               ON alias2.id = venue_to_label_2.venue_id 
         JOIN
            label 
               ON label.id = venue_to_label_2.label_id   
         WHERE
            label.name = :name_1 
            AND venue.id = alias2.id
      )
   )

I'm trying to find all venues that have a label attached with a specific id and a label attached with a specific name. The sub-queries are necessary since this is a *_to_many relationship, but instead of joining them on the venue.id we could join them on venue_to_label.venue_id and move the relationship join into the main query. This would shorten the query by one join.

Is there an easy way to achieve this? I've attached the code generating the sql query below.

import unittest
from sqlalchemy import Table, Column, Integer, ForeignKey, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, aliased
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = (
    'postgres://postgres:password@localhost:5432/tmp')
db = SQLAlchemy(app)

Base = declarative_base()

# many (venue) <-> many (label) mapping table
venue_to_label = Table(
    'venue_to_label', db.metadata,
    Column('venue_id', Integer, ForeignKey('venue.id'), primary_key=True),
    Column('label_id', Integer, ForeignKey('label.id'), primary_key=True)
)


class Label(db.Model):
    __tablename__ = 'label'
    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(254))


class Venue(db.Model):
    id = Column(Integer, primary_key=True, nullable=False)
    labels = relationship(Label, secondary=venue_to_label)

db.create_all()


class TestJoinSubquery(unittest.TestCase):

    def test_join_subquery(self):
        query = Venue.query

        # subquery one
        alias = aliased(Venue, name="alias1")
        subquery = db.session.query(alias)
        subquery = subquery.join(alias.labels)
        subquery = subquery.filter(Label.id == 3)
        subquery = subquery.filter(Venue.id == alias.id)
        query = query.filter(subquery.exists())

        # subquery two
        alias = aliased(Venue, name="alias2")
        subquery = db.session.query(alias)
        subquery = subquery.join(alias.labels)
        subquery = subquery.filter(Label.name == "label_name")
        subquery = subquery.filter(Venue.id == alias.id)
        query = query.filter(subquery.exists())

        print query
        print query.all()

Comments (3)

  1. Mike Bayer repo owner

    there's no "enhancement" I can see here as of yet, this is a "how do I?" question so please send these to the mailing list. If some specific feature becomes apparent then that can become an enhancement request.

    as far as how to specify it, here's the easiest way:

    a1 = aliased(Label)
    a2 = aliased(Label)
    q = query.join(venue_to_label).join(a1).join(a2).filter(a1.id == 1).\
        filter(a2.name == 'some_name')
    print q
    

    output:

    SELECT venue.id AS venue_id 
    FROM venue JOIN venue_to_label ON venue.id = venue_to_label.venue_id JOIN label AS label_1 ON label_1.id = venue_to_label.label_id JOIN label AS label_2 ON label_2.id = venue_to_label.label_id 
    WHERE label_1.id = :id_1 AND label_2.name = :name_1
    

    using all joins tends to be more efficient than using an EXISTS correlated subquery if you are looking for a positive relationship (e.g. rather than a NOT EXISTS).

    if you want exactly the EXISTS scenario, with the two subqueries but just taking out Venue.id:

    query = session.query(Venue)
    
    alias = aliased(venue_to_label, name="alias1")
    subquery = session.query(alias)
    subquery = subquery.join(Label)
    subquery = subquery.filter(Label.id == 3)
    subquery = subquery.filter(Venue.id == alias.c.venue_id)
    query = query.filter(subquery.exists())
    
    alias = aliased(venue_to_label, name="alias2")
    subquery = session.query(alias)
    subquery = subquery.join(Label)
    subquery = subquery.filter(Label.name == "some_label")
    subquery = subquery.filter(Venue.id == alias.c.venue_id)
    query = query.filter(subquery.exists())
    

    producing:

    SELECT venue.id AS venue_id 
    FROM venue 
    WHERE 
    
    (EXISTS (SELECT 1 
    FROM venue_to_label AS alias1 JOIN label ON label.id = alias1.label_id 
    WHERE label.id = :id_1 AND venue.id = alias1.venue_id)) 
    
    AND 
    
    (EXISTS (SELECT 1 
    FROM venue_to_label AS alias2 JOIN label ON label.id = alias2.label_id 
    WHERE label.name = :name_1 AND venue.id = alias2.venue_id))
    
  2. Mike Bayer repo owner

    I guess you dont really need the alias here either, these are subqueries:

    query = session.query(Venue)
    
    subquery = session.query(venue_to_label)
    subquery = subquery.join(Label)
    subquery = subquery.filter(Label.id == 3)
    subquery = subquery.filter(Venue.id == venue_to_label.c.venue_id)
    query = query.filter(subquery.exists())
    
    subquery = session.query(venue_to_label)
    subquery = subquery.join(Label)
    subquery = subquery.filter(Label.name == "some_label")
    subquery = subquery.filter(Venue.id == venue_to_label.c.venue_id)
    query = query.filter(subquery.exists())
    
    SELECT venue.id AS venue_id 
    FROM venue 
    WHERE 
    
    (EXISTS (SELECT 1 
    FROM venue_to_label JOIN label ON label.id = venue_to_label.label_id 
    WHERE label.id = :id_1 AND venue.id = venue_to_label.venue_id)) 
    
    AND 
    
    (EXISTS (SELECT 1 
    FROM venue_to_label JOIN label ON label.id = venue_to_label.label_id 
    WHERE label.name = :name_1 AND venue.id = venue_to_label.venue_id))
    
  3. Lukas Siemon reporter

    Thank you again for the quick response! Your answer helped me to approach the problem differently and solve it. After so many hours you get tunnel vision I guess... Much appreciated!

    Apologies for not posting this to the mailing list. I will do that in the future!

  4. Log in to comment