column targeting failure with any() + joinedload + join to joined inh + limit

Issue #2419 resolved

Mike Bayer repo owner created an issue 2012-02-28

another one where joinedload + limit subquerying screws up targeting

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base= declarative_base()

class A(Base):
    __tablename__ = "a"

    id = Column(Integer, primary_key=True)

# test both here
#class B(A):
class B(Base):
    __tablename__ = "b"

#    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    id = Column(Integer, primary_key=True)
    ds = relationship("D")
    es = relationship("E")

class C(A):
    __tablename__ = "c"

    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b_id = Column(Integer, ForeignKey('b.id'))
    b = relationship("B", primaryjoin=b_id==B.id)

class D(Base):
    __tablename__ = "d"

    id = Column(Integer, primary_key=True)
    b_id = Column(Integer, ForeignKey('b.id'))

class E(Base):
    __tablename__ = 'e'
    id = Column(Integer, primary_key=True)
    b_id = Column(Integer, ForeignKey('b.id'))


e = create_engine("sqlite://", echo=True)

Base.metadata.create_all(e)

s = Session(e)

s.add_all([   C(
        b=B(ds=[D()](
))
    )

])

s.commit()

q = s.query(B, B.ds.any(D.id==1)).options(joinedload_all("es"))
q = q.join(C, C.b_id==B.id)
q = q.limit(5)
print q.all()

Comments (9)

Mike Bayer reporter

it appears I can generate the identical query and identical result without the error using outerjoin() + contains_eager:

ea = aliased(E)
q = s.query(B, B.ds.any(D.id==1)).options(contains_eager("es", alias=ea))
q = q.join(C, C.b_id==B.id)
q = q.limit(5).from_self().outerjoin(ea, B.es)
print q.all()



SELECT e_1.id AS e_1_id, e_1.b_id AS e_1_b_id, anon_1.b_id AS anon_1_b_id, anon_1.anon_2 AS anon_1_anon_2 
FROM (SELECT b.id AS b_id, EXISTS (SELECT 1 
FROM d 
WHERE b.id = d.b_id AND d.id = ?) AS anon_2 
FROM b JOIN (SELECT a.id AS a_id, c.id AS c_id, c.b_id AS c_b_id 
FROM a JOIN c ON a.id = c.id) AS anon_3 ON anon_3.c_b_id = b.id
 LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN e AS e_1 ON anon_1.b_id = e_1.b_id

vs.

SELECT anon_1.b_id AS anon_1_b_id, anon_1.anon_2 AS anon_1_anon_2, e_1.id AS e_1_id, e_1.b_id AS e_1_b_id 
FROM (SELECT b.id AS b_id, EXISTS (SELECT 1 
FROM d 
WHERE b.id = d.b_id AND d.id = ?) AS anon_2 
FROM b JOIN (SELECT a.id AS a_id, c.id AS c_id, c.b_id AS c_b_id 
FROM a JOIN c ON a.id = c.id) AS anon_3 ON anon_3.c_b_id = b.id
 LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN e AS e_1 ON anon_1.b_id = e_1.b_id

2012-02-28T12:07:08+00:00

Mike Bayer reporter
- changed title to column targeting failure with any() + joinedload + join to joined inh + limit
- 2012-02-28T12:08:27+00:00
Mike Bayer reporter
OK next trick, "anon_2.anon_1" isn't the name it's looking for. It's got anonymous names in there but when the exception is raised it calls str() on it, so where the anon_X goes is based on that.
- 2012-02-28T12:16:34+00:00
Mike Bayer reporter
so if we take the problem to be, that Query should be aliasing correctly and not relying on key_fallback, we can disable key_fallback (maybe that should be an execution option ?) and then make the failure very simple:
```
# fails
q = s.query(B).from_self(B.ds.any(D.id==1))

# works
q = s.query(B).from_self(B.id)
```
pdbing into _ColumnEntity.row_processor() leads to very complex situations with labels coming up that aren't in result_map, not yet clear exactly what that's about.
- 2012-02-28T12:41:07+00:00
Mike Bayer reporter
note that changing "C" to not be joined inheritance resolves. It seems like a corresponding column issue due to the fact that the join to "C" subquery runs the whole select through an adaption. if you turn off require_embedded or use adapt_on_names in the adapter used in _ColumnEntity.row_processor, then it matches. both are not options as they each cause failures in some of those very obscure name collision tests.
- 2012-02-28T13:36:16+00:00

Mike Bayer reporter

patch:

diff -r fd9dd2280003707c97cf9e928ac474d2d8f9c20d lib/sqlalchemy/sql/expression.py
--- a/lib/sqlalchemy/sql/expression.py  Mon Feb 27 11:14:46 2012 -0500
+++ b/lib/sqlalchemy/sql/expression.py  Tue Feb 28 14:22:09 2012 -0500
@@ -2466,6 +2466,12 @@

         """

+        def embedded(proxy_set, target_set):
+            for t in target_set:
+                if not set(_expand_cloned([t](t))).intersection(_expand_cloned(proxy_set)):
+                    return False
+            return True
+
         # dont dig around if the column is locally present
         if self.c.contains_column(column):
             return column
@@ -2476,7 +2482,7 @@
             i = target_set.intersection(itertools.chain(*[                    for p in c.proxy_set](p._cloned_set
)))
             if i and (not require_embedded
-                      or c.proxy_set.issuperset(target_set)):
+                      or embedded(c.proxy_set, target_set)):
                 if col is None:

                     # no corresponding column yet, pick this one.

this liberalizes the "embedded" test by considering any clone of each item in target_set to count as "contained" within c.proxy_set. basically, column elements are cloned out on three axes (!), cloned, proxy, and annotated. annotated is taken care of via __hash__(). corresponding_column needs specificity regarding proxies. But clones should also be considered equivalent (right?). need to test this out in sql.test_generative. everything passes though.

2012-02-28T14:25:04+00:00

Mike Bayer reporter

here's a refinement:

diff -r fd9dd2280003707c97cf9e928ac474d2d8f9c20d lib/sqlalchemy/sql/expression.py
--- a/lib/sqlalchemy/sql/expression.py  Mon Feb 27 11:14:46 2012 -0500
+++ b/lib/sqlalchemy/sql/expression.py  Tue Feb 28 14:35:30 2012 -0500
@@ -2466,6 +2466,13 @@

         """

+        def embedded(expanded_proxy_set, target_set):
+            for t in target_set.difference(expanded_proxy_set):
+                if not set(_expand_cloned([t](t))
+                            ).intersection(expanded_proxy_set):
+                    return False
+            return True
+
         # dont dig around if the column is locally present
         if self.c.contains_column(column):
             return column
@@ -2473,10 +2480,10 @@
         target_set = column.proxy_set
         cols = self.c
         for c in cols:
-            i = target_set.intersection(itertools.chain(*[                   for p in c.proxy_set](p._cloned_set
-)))
+            expanded_proxy_set = set(_expand_cloned(c.proxy_set))
+            i = target_set.intersection(expanded_proxy_set)
             if i and (not require_embedded
-                      or c.proxy_set.issuperset(target_set)):
+                      or embedded(expanded_proxy_set, target_set)):
                 if col is None:

                     # no corresponding column yet, pick this one.

2012-02-28T14:35:45+00:00

Mike Bayer reporter
- changed status to resolved
some additional tweaks to copy over _is_clone_of for exported columns.

0536c48dafb670d34fc96d26078b41ed6accf01f
- 2012-02-29T17:48:48+00:00
Mike Bayer reporter
- removed milestone
Removing milestone: 0.7.6 (automated comment)
- 2014-02-23T18:14:55+00:00
Log in to comment

Assignee: Mike Bayer

Type: bug

Priority: critical

Status: resolved

Component: orm

Milestone: –

Votes: 0

Watchers: 0