optimize subqueries to not render columns which aren't referred to on the outside

Issue #3289 wontfix
malthe created an issue

The following program outputs:

SELECT b.b_id AS b_b_id,
       b.a_id AS b_a_id,
       c.id AS c_id,
       c.c_name AS c_c_name,
       c.b_id AS c_b_id
FROM c JOIN (
       SELECT a.id AS a_id,
              a.a_name AS a_a_name,
              b.id AS b_id,
              b.b_name AS b_b_name
       FROM a JOIN b ON a.id = b.id
) AS b ON b.b_id = c.b_id

Note how b.b_name is in the subselect, but not used outside. It's a minor issue because it will most probably be optimized out by the database, but it still makes the query longer than it needs to be.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    a_name = Column(String)


class B(A):
    __tablename__ = 'b'

    id = Column(Integer, ForeignKey("a.id"), primary_key=True)
    b_name = Column(String)


class C(Base):
    __tablename__ = 'c'

    id = Column(Integer, primary_key=True)
    c_name = Column(String)
    b_id = Column("b_id", Integer, ForeignKey("b.id"))

    b = relationship(B)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)

b = B(id=1, a_name="foo", b_name="bar")
s.add(b)

c = C(id=1, b_id=b.id, c_name="boo")
s.add(c)

s.flush()

alias = aliased(B, name="b")
print s.query(C).join(alias).options(
    contains_eager(C.b, alias=alias).load_only(alias.id)
)

Comments (3)

  1. Mike Bayer repo owner

    the subquery in the first place makes the query longer and vastly less efficient than any effect the column within has. The inner column is there because we are "selecting from" the "selectable" formed by this mapping, which is rendered as is. An optimization that caused subselects to only render columns that are called upon on the outside would be intricate, at this point might be doable, but IMHO not really worth it, because usually you want to just get rid of the subselect.

    which leads to how you should be doing this query, which is doing the alias as "flat":

    alias = aliased(B, flat=True)
    print s.query(C).join(alias).options(
        contains_eager(C.b, alias=alias).load_only(alias.id)
    )
    
    SELECT b_1.id AS b_1_id, a_1.id AS a_1_id, c.id AS c_id, c.c_name AS c_c_name, c.b_id AS c_b_id 
    FROM c JOIN (a AS a_1 JOIN b AS b_1 ON a_1.id = b_1.id) ON b_1.id = c.b_id
    

    see http://docs.sqlalchemy.org/en/rel_0_9/changelog/migration_09.html#many-join-and-left-outer-join-expressions-will-no-longer-be-wrapped-in-select-from-as-anon-1 for caveats on SQLite.

  2. Mike Bayer repo owner

    this would only be feasible if the Python overhead of figuring this out were not prohibitive. it also means that a text-specified column expression would no longer work though I dont think that's a common use. The system here would be drawing upon similar mechanisms as that of http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html#order-by-and-group-by-are-special-cases.

  3. Mike Bayer repo owner

    this is a very complicated feature without too much of a use case. A subquery can be built now to include the specific columns needed and the "flat" argument allows the "auto subquery" thing to not take place; the "flat" alias needs to be publicized some more.

  4. Log in to comment