innerjoin=True is ignored on second joinedload() option

Issue #2976 resolved
Ezra Epstein created an issue

How should be (IMO): 1. joinedload (and join / contains_eager) should, by default, use the semantics of the declared / mapped relationship if there is one. Thus if the relationship is a to-one (uselist=False or innerjoin=True), then those attributes of the relationship should be discovered and used when building the SQL query. 2. If one specifies innerjoin=True on the (second) joinedload() it should be honored on that join.

How it is: session.query(AClass).options(joinedload(AClass.rel1).joinedload(RelClass.rel2, innerjoin=True)).filter(AClass.attrib=='whatever').one()

Yields two regular (default left outer) JOIN clauses.

It should (I think) have INNER JOIN on the second of those clauses.

Comments (8)

  1. Mike Bayer repo owner

    the reason innerjoin=True is ignored if following an outer join is because it otherwise returns the wrong result rows. the golden rule of joined eager loading is that it must only affect the contents of the target collection or attribute, and never the parent object. This is described at the zen of eager loading.

    Using sqlite we start with three tables and six rows:

    sqlite> create table a (id integer);
    sqlite> create table b (id integer);
    sqlite> create table c (id integer);
    sqlite> insert into a (id) values(1);
    sqlite> insert into a (id) values(2);
    sqlite> insert into a (id) values(3);
    sqlite> insert into b (id) values(1);
    sqlite> insert into b (id) values(3);
    sqlite> insert into c (id) values(3);
    

    If we load all "a"s, and "eagerload" bs, using an outer join, that means some "a" rows may not have a "b" row. It means we want "a" rows 1, 2, 3:

    sqlite> select a.* from a left outer join b on a.id=b.id;
    1
    2
    3
    

    if we would like to now also eagerly load every "c" for every "b", we do an outer join. the "a" rows that we get don't change:

    sqlite> select a.* from a left outer join b on a.id=b.id left outer join c on b.id=c.id;
    1
    2
    3
    

    however, if we do an inner join directly from "b" to "c", without parenthesizing results, we now get the wrong answer for "a":

    sqlite> select a.* from a left outer join b on a.id=b.id join c on b.id=c.id;
    3
    

    now you might be getting at, we should nest the join from b to c in a parenthesis. OK, that's a possibility. The ability for joined eager loading to do this at all was only introduced in version 0.9, a feature we've avoided since SQLite wouldn't allow it. in http://docs.sqlalchemy.org/en/rel_0_9/changelog/migration_09.html#many-join-and-left-outer-join-expressions-will-no-longer-be-wrapped-in-select-from-as-anon-1, we've modified our approach so that we can do this, and in the case of SQLite we'd generate a SELECT subquery something like this:

    sqlite> select a.* from a left outer join (select b.id as b_id from b join c on b.id=c.id) as subq on a.id=subq.b_id;
    1
    2
    3
    

    but currently 'innerjoin=True' is not that sophisticated and I'm not sure what effort would be needed to make that work. Is that what you're getting at?

  2. Mike Bayer repo owner
    • Added a new option to :paramref:.relationship.innerjoin which is to specify the string "nested". When set to "nested" as opposed to True, the "chaining" of joins will parenthesize the inner join on the right side of an existing outer join, instead of chaining as a string of outer joins. This possibly should have been the default behavior when 0.9 was released, as we introduced the feature of right-nested joins in the ORM, however we are keeping it as a non-default for now to avoid further surprises. fixes #2976

    → <<cset 12ce2edc92ef>>

  3. Mike Bayer repo owner

    this is now available if you say innerjoin="nested" instead of True. This probably should have been the default when we released 0.9, but as we're up to 0.9.3 already I'm keeping it as an alternative to avoid any further regressions.

  4. Ezra Epstein reporter

    Hi Mike,

    Your earlier response got me to reflecting more deeply. I see that, at the end, the ability to specify INNER JOIN ends up being more of helping the DB optimize the query, as the results I'm getting now are, in fact the ones I want, as a strict to-one FK to a target's PK column/s always returns one target row, hence the LEFT OUTER gives the correct result in this case.

    Still, the innerjoin="nested" provides the new functionality without interfering with existing behavior. I'll take it! ;-)

    Thanks again.

  5. Mike Bayer repo owner
    • Fixed a regression caused by 🎫2976 released in 0.9.4 where the "outer join" propagation along a chain of joined eager loads would incorrectly convert an "inner join" along a sibling join path into an outer join as well, when only descendant paths should be receiving the "outer join" propagation; additionally, fixed related issue where "nested" join propagation would take place inappropriately between two sibling join paths.

    this is accomplished by re-introducing the removed flag "allow_innerjoin", now inverted and named "chained_from_outerjoin". Propagating this flag allows us to know when we have encountered an outerjoin along a load path, without confusing it for state obtained from a sibling path.

    fixes #3131 ref #2976

    → <<cset 61384fd0e529>>

  6. Mike Bayer repo owner
    • Fixed a regression caused by 🎫2976 released in 0.9.4 where the "outer join" propagation along a chain of joined eager loads would incorrectly convert an "inner join" along a sibling join path into an outer join as well, when only descendant paths should be receiving the "outer join" propagation; additionally, fixed related issue where "nested" join propagation would take place inappropriately between two sibling join paths.

    this is accomplished by re-introducing the removed flag "allow_innerjoin", now inverted and named "chained_from_outerjoin". Propagating this flag allows us to know when we have encountered an outerjoin along a load path, without confusing it for state obtained from a sibling path.

    fixes #3131 ref #2976

    → <<cset 43663e7aac99>>

  7. Log in to comment