support nesting of joins with joined eager loading

Issue #2120 resolved
Former user created an issue

Using 0.6.6. [BR] A might have a B. I create a NULLable A.b_id column, and the relation A.b. [BR] B always has a C. I create a NOT NULL B.c_id column, and the relation B.c with innerjoin=True.

I perform this query:

session.query(A).options(joinedload_all(A.b, B.c))

As expected, the resulting join condition is:

FROM a LEFT OUTER JOIN b ON ... LEFT OUTER JOIN c ON ...

However, if I do this:

session.query(A).outerjoin(A.b).options(contains_eager(A.b), joinedload(B.c))

Then the query is:

FROM a LEFT OUTER JOIN b ON ... JOIN c ON ...

So every A without a B is excluded from the query, even though I explicitly outerjoined.

Quick script doing the above is attached.

For what it's worth, I don't believe either of the above queries are really correct. I expected to see:

FROM a LEFT OUTER JOIN (b JOIN c ON ...) ON ...

Comments (12)

  1. Former user Account Deleted

    Hm, I see nested joins were proposed in comment🎫1643:2. I don't think your following comment is quite right; nested joins are standard SQL, and definitely shouldn't be related to subqueries. They only affect join order and condition evaluation order.

    Looks like MySQL just didn't get around to supporting the syntax correctly until 5.0. Not aware of any other RDBMSes that have problems.

  2. Mike Bayer repo owner

    Replying to guest:

    However, if I do this:

    session.query(A).outerjoin(A.b).options(contains_eager(A.b), joinedload(B.c))
    

    This is not even a use case that's ever been anticipated. I'm impressed it works at all. Passing innerjoin=False to your joinedload is an easy solution.

    Replying to guest:

    Hm, I see nested joins were proposed in comment🎫1643:2. I don't think your following comment is quite right; nested joins are standard SQL, and definitely shouldn't be related to subqueries. They only affect join order and condition evaluation order.

    Looks like MySQL just didn't get around to supporting the syntax correctly until 5.0. Not aware of any other RDBMSes that have problems.

    This implies that eagerloading would work in a completely different manner and I'd be concerned that the "a outerjoin (b join c)" case, which a lot of work went into to prevent (because MySQL certainly didn't support it at all the time), would start to occur regularly. That's a very major change with little upside except that you wouldn't need to add a flag to your joinedload above. I would not be at all ready to allow "a outerjoin (b join c)" to occur frequently unless I know that it is supported across all of MySQL, Oracle 9+, doesn't destroy Oracle query plans (many things do), works with our Oracle 8 join-compatiblity mode, Firebird, Sybase, SQL Server, etc. Otherwise it's not worth breaking lots of applications to support an edge case with easy workarounds.

    "a outerjoin (b join c)" is entirely doable if you say query.select_from(outerjoin(A, join(B, C, B.c), A.b)). So SQLAlchemy supports this syntax just at the moment it seems a very destabilizing change to render them automatically via the joinedload mechanism, which otherwise hasn't changed much at all in five years.

  3. Mike Bayer repo owner

    The basic idea here is that by using contains_eager(), you've chosen to design your joins explicitly. Using joinedload() on top of that means you're trying to still take advantage of a particular, and optional, "automatic join designing" service of the ORM. But you're mixing it with a manual approach. It seems reasonable that you might need to customize your usage of joinedload() in this case. joinedload() only renders non-parenthesized joins for maximum database compatibility and minimal query plan issues. To make it group joins with parenthesis by default means it works entirely differently.

    Perhaps, if some flag or option could allow the joinedload() strategy to render in this way only when requested (perhaps per-dialect, grouped_joins_ok), that would be a softer approach to its introduction. It remains to be seen how much of a change to Query/EagerLoader this would be.

  4. Mike Bayer repo owner

    some quick tests. The script below executes verbatim for PG (obviously), SQL Server 2008, Oracle 11, MySQL 5, Sybase 11, Firebird. Still chokes on SQLite 3.6.12:

    sqlite> create table a (id integer primary key);
    sqlite> create table b (id integer primary key, aid integer references a(id));
    sqlite> create table c (id integer primary key, bid integer references b(id));
    sqlite> insert into a (id) values (1);
    sqlite> insert into b (id, aid) values (1, 1);
    sqlite> insert into c (id, bid) values (1, 1);
    sqlite> select * from a left outer join b on a.id=b.aid join c on b.id=c.bid;
    1|1|1|1|1
    sqlite> select * from a left outer join (b join c on b.id=c.bid) on a.id=b.aid;
    SQL error: no such column: b.aid
    sqlite> select * from a left outer join (b as ba join c on ba.id=c.bid) on a.id=ba.aid;
    SQL error: no such column: ba.aid
    sqlite> select * from a left outer join (b ba join c on ba.id=c.bid) on a.id=ba.aid;
    SQL error: no such column: ba.aid
    

    A little surprising that so many DBs do support it yet one very major DB still does not. I think its still early for this behavior by default.

  5. Former user Account Deleted

    Perhaps, if some flag or option could allow the joinedload() strategy to render in this way only when requested (perhaps per-dialect, grouped_joins_ok), that would be a softer approach to its introduction. It remains to be seen how much of a change to Query/EagerLoader this would be.

    Sure, that's fair. I do believe this is worth supporting sometime; even with small tables, I see a significant performance boost in Pg when using a nested inner join instead of flat outer joins.

    The basic idea here is that by using contains_eager(), you've chosen to design your joins explicitly. Using joinedload() on top of that means you're trying to still take advantage of a particular, and optional, "automatic join designing" service of the ORM. But you're mixing it with a manual approach. It seems reasonable that you might need to customize your usage of joinedload() in this case.

    I actually have lazy='joined' on the B.c relationship, so my query never mentions C, yet its presence is interfering with A. The action at a distance was a surprise. I only use contains_eager in one query, though, so adding the explicit joinedload() option is easy.

    A little surprising that so many DBs do support it yet one very major DB still does not. I think its still early for this behavior by default.

    Seems there's an ancient but very-low-priority bug with how SQLite looks up table names in a FROM clause. This does work:

    select * from a left outer join (b join c on b.id=c.bid) bc on bc.aid = a.id;
    

    But then you can't control what happens with overlapping column names, which is not too useful in the general case. What a shame.

  6. Mike Bayer repo owner

    Replying to guest:

    Perhaps, if some flag or option could allow the joinedload() strategy to render in this way only when requested (perhaps per-dialect, grouped_joins_ok), that would be a softer approach to its introduction. It remains to be seen how much of a change to Query/EagerLoader this would be.

    Sure, that's fair. I do believe this is worth supporting sometime; even with small tables, I see a significant performance boost in Pg when using a nested inner join instead of flat outer joins.

    yeah its obviously the way the joined loader wants to do things, i.e. it can just join onto what's there instead of digging through the joins to find a place to attach. that code just took a very long time to develop, went through many refactorings to do things in an ever more generalized way, etc. I haven't analyzed how hard the change would be yet, though.

  7. Mike Bayer repo owner

    Looking at this:

    session.query(A).outerjoin(A.b).options(contains_eager(A.b), joinedload(B.c))
    

    I don't see where SQLAlchemy is being told to say "a outerjoin (b innerjoin c)". Joins work from left to right - this is synonymous A.outerjoin(B).innerjoin(C). Of course if you use joinedload_all(A.b, B.c), right now SQLAlchemy will do an outerjoin all the way, and perhaps in that case, it should nest the joins, since it has the correct context to make this decision. That's essentially #2369.

    In this case, you're joining explicitly. When you say query(A).outerjoin(A.b), the "join point" is now "B". Any generations subsequent to that are against "B", and it wouldn't work for an incoming joinedload() to change that decision, turning the "B" joinpoint into a "B join C" one. The joinedload() is the same as saying the_query_I_already_have.join(B.c).

    Query should always keep the structure of the SQL the same as what you stated. So if you want "b innerjoin c" to be nested, you need to express that nesting, and say:

    session.query(A).outerjoin(join(B, C), A.b)
    

    there, you have nesting that's expressed the same way. If you want the joinedload() option to work out that way, you use contains_eager():

    qsub = aliased(join(B, C))
    q = session.query(A).outerjoin(qsub, A.b).options(
                    contains_eager(A.b, alias=qsub), 
                    contains_eager(A.b, B.c, alias=qsub)
                )
    

    so while allowing joinedload() to outerjoin to a nested join, that's really #2369, and I've added the non-many-to-many use case there. It implies that joinedload_all(x, y, z) would nest the joins, when it is given the opportunity to decide about the query as a whole. This ticket refers to when you're doing things explicitly, and the use case is already supported so this is worksforme.

  8. Mike Bayer repo owner

    Also might consider that #2369 would allow query(A).outerjoin(join(B, C)) to not generate a "SELECT " subquery if it isn't needed.

  9. Log in to comment