PostgreSQL LATERAL JOIN support

Issue #2857 resolved
Former user created an issue

Comments (30)

  1. Mike Bayer repo owner

    this is pretty easy to do yourself, feel free to turn into a pullreq (with unit tests of course):

    class Lateral(Alias):
       # ...
    
    @compiles(Lateral, "postgresql")
    def compile(element, compiler, **kw):
        return "LATERAL %s" % compiler.visit_alias(element, **kw)
    
  2. Jon Rosebaugh

    We've been trying this approach at Axial and we ran into a problem with correlation. Here's the test case:

    import sqlalchemy as sa
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Alias
    
    class Lateral(Alias):
        pass
    
    @compiles(Lateral)
    def compile(element, compiler, **kw):
        return "LATERAL %s" % compiler.visit_alias(element, **kw)
    
    company = sa.sql.table('company', sa.sql.column('id', sa.Integer))
    member = sa.sql.table('member', sa.sql.column('id', sa.Integer), sa.sql.column('company_id', sa.Integer))
    
    first = sa.select([company.c.id]).alias('first')
    second = Lateral(sa.select([sa.func.count(member.c.id)]).where(member.c.company_id == first.c.id), 'second')
    
    print sa.select(['*'], from_obj=sa.join(first, second, sa.sql.true()))
    
    """
    This prints:
    SELECT *
    FROM (SELECT company.id AS id
    FROM company) AS first JOIN LATERAL (SELECT count(member.id) AS count_1
    FROM member, (SELECT company.id AS id
    FROM company) AS first
    WHERE member.company_id = first.id) AS second ON TRUE
    
    Note how the 'first' selectable is repeated in the FROM in the 'second' selectable.
    I've tried using .correlate() and .correlate_except() and I can't make it go away.
    It should produce this:
    
    SELECT *
    FROM (SELECT company.id AS id
    FROM company) AS first JOIN LATERAL (SELECT count(member.id) AS count_1
    FROM member
    WHERE member.company_id = first.id) AS second ON TRUE
    """
    
  3. Mike Bayer repo owner

    every fiber of correlation at the moment expects that correlation only occurs from the WHERE clause or the COLUMNS clause. If the idea of LATERAL join is that we can correlate between elements in the FROM clause, then yes we need to dig into the mechanics of select() quite deeply.

  4. Martijn van Oosterhout

    Yesterday I really needed LATERAL and was surprised that SQLAlchemy didn't support it. Besides making queries with subselects a lot easier to read it's also more flexible. So after browsing the SQLAlchemy code yesterday I've come up with something that sorta works.

    https://bitbucket.org/martijn_van_oosterhout/sqlalchemy/commits/2943536fe040f52401a6a9278670126b38cb2405

    It works by pushing a selectable on the compiler stack while processing the lateral making the hide_from machinery do it's job, which means all the normal correlation stuff works as expected. I'm not sure about the syntax or the way it's coded, but I'd like some feedback as to whether this approach is feasible.

  5. Mike Bayer repo owner

    Yesterday I really needed LATERAL and was surprised that SQLAlchemy didn't support it

    Usually when PG adds support for some keyword that the world of SQL has done without just fine for decades I'm slow to jump on those. this keyword is part of the SQL standard though, so that helps its case significantly, including that we can add this to the Core as opposed to specific to the Postgresql dialect.

    It works by pushing a selectable on the compiler stack while processing the lateral making the hide_from machinery do it's job, which means all the normal correlation stuff works as expected.

    Well the LATERAL keyword has no relationship to the JOIN keyword, so this feature wouldn't have any connection to the Join object or join methods. It's a modifier to a FROM in any context, meaning any FromClause object can advertise itself as "lateral" and be considered differently within the mechanics of correlation. These mechanics take place here: https://bitbucket.org/zzzeek/sqlalchemy/src/acda2e7d0b736e349c24905a359a4e765f829683/lib/sqlalchemy/sql/selectable.py?at=master#cl-2514 , so that would be most of the code. compiler.py itself would only need to throw in the LATERAL keyword probably only within visit_alias() for now, or even for every FROM clause in _compose_select_body() based on the .lateral flag, hmm. Either way is good for starters.

    The quick approach we tried was a subclass Lateral(Alias), this is still roughly what we'd be doing, though perhaps since this is SQL standard Alias itself can just have a flag lateral=True; .lateral would then be available on any FromClause and is usually False. This can also be added to the ORM-level aliased() construct. The rest of the ORM doesn't need to know anything about it.

    The absolutely best way to help me whip this out in no time is to write up tests. Right now the primary set of tests we have for core SQL compilation are in very poor shape, as these were the first tests that were ever written for SQLA back in 2005 when I was a terrible Python programmer and while they've been cleaned up a lot, they are still munged into giant monolithic methods inside of https://bitbucket.org/zzzeek/sqlalchemy/src/acda2e7d0b736e349c24905a359a4e765f829683/test/sql/test_compiler.py?at=master; I still need to break these up. So I wouldn't imitate those. A good series of SQL compiler tests to emulate are those in test_insert at https://bitbucket.org/zzzeek/sqlalchemy/src/acda2e7d0b736e349c24905a359a4e765f829683/test/sql/test_insert.py?at=master; e.g. each test tests just one thing, we try to use common fixtures. There's some other tests we would want to add into test/sql/test_generative but some basic SQL generation would be a good start.

  6. Martijn van Oosterhout

    Usually when PG adds support for some keyword that the world of SQL has done without just fine for decades I'm slow to jump on those.

    Well, it's an SQL:2003 feature and PostgreSQL was rather late to the party, some other databases long had a similar feature, it just wasn't called LATERAL. Anyway...

    Well the LATERAL keyword has no relationship to the JOIN keyword, so this feature wouldn't have any connection to the Join object or join methods.

    While from an SQL syntax point of view you are correct from a semantic point of view I don't agree. With LATERAL the order of the clauses in the FROM matters, so (LATERAL A) JOIN B means something completely different from B JOIN (LATERAL A). When you're doing generative query building like Query(...).filter(A.c.foo==B.c.bar) you need to have some way to indicate which is to the left and which is to the right. We do have a way, namely, joins which have an explicit left and right. We have outer LATERAL joins. Additionally the join infrastructure has stuff to find the relationship between the two sides automatically, so you don't have to specify it explicitly. Together with the correlation you have a powerful feature.

    You're probably right though that underwater it should be a flag on a FromClause though or something similar.

    The compiler would need to do a little bit more though, it has to suppress the correlated table in the FROM clause in the subquery, which I did by pushing the selectable on the stack. There might be a nicer way to do that though but I thought it rather elegant actually.

    You're right about writing up tests is important and I think I might start with that since especially on the correlation part there are lots of possibilities with respect to implicit and explicit correlations. I'll stay away from the .join() if I can but long term I think it would be really nice if something like the following could be made to work (not real SQLAlchemy, just to get the idea across):

    >>> subq = Query(B.count(), B.count(distinct=True)).subquery()
    >>> print Query(A, subq.count, subq.count_distinct).join(A, subq, lateral=True)
    SELECT A.*, C.count, C.count_distinct FROM A JOIN LATERAL (SELECT count(*), count(DISTINCT *) AS count_distinct FROM B WHERE A.fkey = B.pkey) AS C ON (TRUE)
    

    since that's the most natural way to use it.

  7. Mike Bayer repo owner

    While from an SQL syntax point of view you are correct from a semantic point of view I don't agree. With LATERAL the order of the clauses in the FROM matters, so (LATERAL A) JOIN B means something completely different from B JOIN (LATERAL A).

    Absolutely, the vast majority of users of this feature will be probably using it in conjunction with join(). However, the Query / select() / compiler objects only need to deal with syntax, and in that regard LATERAL is a modifier for the FROM; the JOIN keyword is not required. By keeping the feature local to all FROM objects, a SELECT that does not include the JOIN keyword like the example in https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3#LATERAL_JOIN which allows a freestanding subquery to join to the other element in the FROM list strictly through correlation; there's no JOIN keyword or even a WHERE clause there and I'm sure users of this feature might want to use that technique.

    The compiler would need to do a little bit more though, it has to suppress the correlated table in the FROM clause in the subquery, which I did by pushing the selectable on the stack. There might be a nicer way to do that though but I thought it rather elegant actually.

    it is nice, but the mechanics of correlation as far as what renders in the FROM of the inner query is handled in that one call, which is independent of the SQL compilation process. Breaking out the behavior of correlation into two different subsystems would make it harder to understand correlation.

    You're right about writing up tests is important and I think I might start with that since especially on the correlation part there are lots of possibilities with respect to implicit and explicit correlations. I'll stay away from the .join()

    well for the test suite we should have plenty of join() in there to make sure it works.

    long term I think it would be really nice if something like the following could be made to work

    We just move "lateral" to be specific to the subq, it is probably worth it to create a wrapper for the query that acts like a FromGrouping, so that we can say:

    >>> print Query(A, subq.count, subq.count_distinct).join(A, subq.lateral())
    
    # or 
    
    >>> print Query(A, subq.count, subq.count_distinct).join(A, lateral(subq))
    

    this is most like the LATERAL keyword itself, e.g. it modifies the FROM object itself. The user could also say:

    >>> print Query(A, subq.count, subq.count_distinct).select_from(A, lateral(subq))
    
  8. Martijn van Oosterhout

    I've got the test environment running and am going to see if I can get some useful test cases going.

    I think your syntax suggestions look good, I'll try to keep it in that style.

  9. Martijn van Oosterhout

    So I've been making a collection of test cases and so I've been trying to work out what syntax we'd like and what the output SQL should be. And trying stuff on PostgreSQL to see what works. And it's taking a little longer than I hoped. Here are some of my conclusions so far.

    Firstly, you can't just put LATERAL in front of anything you put in the FROM clause, it must be a subquery. So the following is invalid:

    # select * from a join lateral b  on (a.id=b.id);
    

    You have to write it like this:

    # select * from a join lateral (select * from b) as d on (a.id=d.id);
    

    Which makes sense if you consider that LATERAL is meaningless on plain tables. But does that mean that if the user does: Table1.join(lateral(Table2)) should SQLAlchemy automatically turn the right hand side into a subquery? How far should we go to make sense of the queries users type?

    Another thing is that you can't parenthesise the query the way you'd think, the following is invalid:

    # select * from a join (lateral (select * from b) as d) on (a.id=d.id);
    

    Which means you can't give a lateral an explicit alias (without wrapping it in another layer of subquery).

    You can get a parenthesis between the join and the lateral if the right hand side is a join, so this works:

    # select * from a join (
        lateral (select id as e_id from b) as e 
                join 
                (select id as f_id from b) as f 
                on e_id=f_id) as d 
        on (a.id=e_id);
    

    Except that the lateral is only applied to the first subquery (e) but not to (f). There doesn't appear to be any way to apply lateral to a join, only to individual subqueries. Which means (I think) that lateral() should probably automatically turn everything it touches into a subquery if it isn't one already.

    Similarly, lateral(lateral(x)) should either be rejected or pretend you only did it once.

    I think the conclusion is that in SQLAlchemy terms lateral can only be applied to classes derived from SelectBase and anything else needs to be turned into a subquery first.

  10. Mike Bayer repo owner

    But does that mean that if the user does: Table1.join(lateral(Table2)) should SQLAlchemy automatically turn the right hand side into a subquery?

    noooo..no no....yeah, let's not do that :) we've done a lot of that kind of thing, more in the ORM regarding mappings, and wow, does it suck.

    the SQL spec seems to indicate LATERAL can be against any table clause. While we can stick it just on Select() and friends to start, I don't know that it's a problem for it to be part of FromClause in general. It's OK that the compiler can generate SQL that's not valid on a backend as long as it matches the structure passed by the user.

    Another thing is that you can't parenthesise the query the way you'd think, the following is invalid:

    OK, well we'll do it such that LATERAL is rendered on the outside of a subquery, and again if someone tries to game it by sticking that in a subquery, it'll just fail against the backend.

    Which means (I think) that lateral() should probably automatically turn everything it touches into a subquery if it isn't one already.

    turning things into subqueries really complicates things, a lot. it means any time the user passes in that same table again to refer to it, we have to translate that table into the subquery we've turned it into. Again, the ORM does a lot of this kind of thing in places where it sort of has to, but in Core, it's way into the magical category. People that are using LATERAL are going to know exactly what they want to do so they should appreciate spelling it out in the Core language exactly. At most, we'd have it raise an error or not have lateral() present on Table, but again I think a user playing with LATERAL will just run it against PG, the query will fail, and they'll figure out how it needs to be.

    Similarly, lateral(lateral(x)) should either be rejected or pretend you only did it once.

    it'll be rejected. By the database. The theme here is, the database has a very sophisticated syntax checker. we don't have to reinvent it or guess around it.

  11. Mike Bayer repo owner

    OK well, if we do just this patch:

    diff --git a/lib/sqlalchemy/sql/compiler.py b/lib/sqlalchemy/sql/compiler.py
    index 52116a2..44120cf 100644
    --- a/lib/sqlalchemy/sql/compiler.py
    +++ b/lib/sqlalchemy/sql/compiler.py
    @@ -1276,6 +1276,28 @@ class SQLCompiler(Compiled):
             else:
                 return alias.original._compiler_dispatch(self, **kwargs)
    
    +    def visit_lateral(self, lateral, **kw):
    +
    +        toplevel = not self.stack
    +        entry = self._default_stack_entry if toplevel else self.stack[-1]
    +
    +        correlate_froms = entry['correlate_froms']
    +
    +        new_correlate_froms = set(selectable._from_objects(lateral))
    +        all_correlate_froms = new_correlate_froms.union(correlate_froms)
    +        new_entry = {
    +            'asfrom_froms': new_correlate_froms,
    +            'correlate_froms': all_correlate_froms,
    +            'selectable': lateral.element,
    +        }
    +        self.stack.append(new_entry)
    +
    +        text = "LATERAL %s" % self.visit_alias(lateral, **kw)
    +
    +        self.stack.pop(-1)
    +
    +        return text
    +
         def get_render_as_alias_suffix(self, alias_name_text):
             return " AS " + alias_name_text
    
    diff --git a/lib/sqlalchemy/sql/selectable.py b/lib/sqlalchemy/sql/selectable.py
    index 7334105..59a1b7a 100644
    --- a/lib/sqlalchemy/sql/selectable.py
    +++ b/lib/sqlalchemy/sql/selectable.py
    @@ -403,6 +403,9 @@ class FromClause(Selectable):
    
             return Alias(self, name)
    
    +    def lateral(self, name=None):
    +        return Lateral(self, name)
    +
         def is_derived_from(self, fromclause):
             """Return True if this FromClause is 'derived' from the given
             FromClause.
    @@ -1170,6 +1173,10 @@ class Alias(FromClause):
             return self.element.bind
    
    
    +class Lateral(Alias):
    +    __visit_name__ = 'lateral'
    +
    +
     class CTE(Generative, HasSuffixes, Alias):
         """Represent a Common Table Expression.
    

    then we can get the example expression in https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3#LATERAL_JOIN to happen:

    from sqlalchemy import *
    
    base = select([func.generate_series(1, 10).label('nr')]).alias('base')
    multiples = select([
        func.generate_series(1, 10).label('b_nr'),
        (base.c.nr * 2).label('multiple')
    ]).correlate(base).alias('multiples')
    
    lm = select([multiples.c.multiple]).\
        where(multiples.c.b_nr == base.c.nr).correlate(base).lateral('multiples')
    
    stmt = select([
        base.c.nr, lm.c.multiple
    ])
    
    print stmt
    

    the output seems to match:

    SELECT base.nr, multiples.multiple 
    FROM (SELECT generate_series(:generate_series_1, :generate_series_2) AS nr) AS base, LATERAL (SELECT multiples.multiple AS multiple 
    FROM (SELECT generate_series(:generate_series_3, :generate_series_4) AS b_nr, base.nr * :nr_1 AS multiple) AS multiples 
    WHERE multiples.b_nr = base.nr) AS multiples
    

    now what I don't like is that I have to call correlate() explicitly, and I tried a few ways of getting "auto" correlation to work here, but I haven't figured that out yet. At least for the inner query, auto-correlation for 'base' should take place there.

    but other than that, the tests and docs, this would be the patch.

  12. Mike Bayer repo owner

    so in 8d93e436795827689944a9b0b332b9292f73 I have a simpler patch that seems to fulfill this.

    a demo of the two straightforward LATERAL examples seems to work perfectly:

    from sqlalchemy import *
    
    
    def test_demo_one():
        """
        the query at https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3#LATERAL_JOIN:
    
        SELECT base.nr,
               multiples.multiple
          FROM (SELECT generate_series(1,10) AS nr) base,
        LATERAL (
          SELECT multiples.multiple FROM
           ( SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple ) multiples
            WHERE multiples.b_nr = base.nr
          ) multiples;
    
        output:
    
        SELECT base.nr, multiples.multiple
        FROM (SELECT generate_series(:generate_series_1, :generate_series_2) AS nr) AS base,
        LATERAL (SELECT multiples.multiple AS multiple
        FROM (SELECT generate_series(:generate_series_3, :generate_series_4) AS b_nr, base.nr * :nr_1 AS multiple) AS multiples
        WHERE multiples.b_nr = base.nr) AS multiples
    
        """
    
        base = select([func.generate_series(1, 10).label('nr')]).alias('base')
        multiples = select([
            func.generate_series(1, 10).label('b_nr'),
            (base.c.nr * 2).label('multiple')
        ]).correlate(base).alias('multiples')
    
        lm = select([multiples.c.multiple]).\
            where(multiples.c.b_nr == base.c.nr).lateral('multiples')
    
        stmt = select([
            base.c.nr, lm.c.multiple
        ])
    
        print "---------------------"
        print stmt
    
    
    def test_demo_two():
        """inklesspen's query:
    
        SELECT *
        FROM (SELECT company.id AS id
        FROM company) AS first JOIN LATERAL (SELECT count(member.id) AS count_1
        FROM member
        WHERE member.company_id = first.id) AS second ON TRUE
    
    
        output:
    
        SELECT first.id, second.count_1
        FROM (SELECT company.id AS id
        FROM company) AS first JOIN LATERAL (SELECT count(member.id) AS count_1
        FROM member
        WHERE member.company_id = first.id) AS second ON 1
    
        """
    
        m = MetaData()
        company = Table('company', m, Column('id', Integer))
        member = Table('member', m, Column('id', Integer), Column('company_id', Integer))
    
        first = select([company.c.id]).alias('first')
        second = select([func.count(member.c.id)]).where(member.c.company_id == first.c.id).lateral('second')
        stmt = select([first, second]).select_from(first.join(second, true()))
    
        print "---------------------"
        print stmt
    
    test_demo_one()
    test_demo_two()
    

    @martijn_van_oosterhout if you still have interest in writing tests, let me know. This can be in 1.1. I'll eventually write some tests for it otherwise and also i need to write a migration note for it.

  13. Martijn van Oosterhout

    Wow, you've made a lot of progress in a short period. Yes, I think the Select._get_display_froms() is the right place.

    Back to the SQL standard, I have a copy of SQL:2003 and it has the following productions:

    <lateral derived table> ::= LATERAL <table subquery>
    <table subquery> ::= <subquery>
    <subquery> ::= <left paren> <query expression> <right paren>
    

    Which is pretty strict. But that just leads to the inverse problem, namely that PostgreSQL supports more than just subqueries, in particular, set returning functions. So I'm pretty on board with your idea of just supporting the syntax and letting the server do the checking.

    I'll push what I have shortly. It's not complete, especially the ORM part but it should be enough to show you what I was thinking. It contains a lot of stuff that it SQL that'd be rejected by a server, but we don't test against a real server, so... I think in the final commit you'd probably want less cases, since I was really just trying out all the possibilities.

  14. Martijn van Oosterhout

    Ok I've pushed: https://bitbucket.org/martijn_van_oosterhout/sqlalchemy/commits/a03db2278442c89f2ee2c3c9a0a643fa0b1187ee

    It's not synced with your branch, but your implementation is different anyway, so that's not a big deal. The interesting file with the tests is this one:

    https://bitbucket.org/martijn_van_oosterhout/sqlalchemy/src/a03db2278442c89f2ee2c3c9a0a643fa0b1187ee/test/sql/test_lateral.py?at=lateral_tests

    There you can see how far I got...

  15. Martijn van Oosterhout

    Ok, I've pushed my test cases here: https://bitbucket.org/martijn_van_oosterhout/sqlalchemy/commits/87e73c93e973a0268d2a6b7f442cad1ff4025e37

    Unfortunately, due to the merge with your changes they all fail. However, before I go and spend a lot of effort trying to work out why could you look over the test cases to see whether you agree with the syntax choices. There are some that are debatable. For example:

    • select_from() doesn't preserve order. Fixing this is probably a PITA and I can imagine you'd prefer to say: don't use LATERAL within select_from()
    • I have a lateral() method that does two things. Without arguments it wraps the query in a lateral (acts like alias). If you provide a selectable it does a join, wrapping the argument in a lateral. I think this is the natural way to use it, but it could be considered a wart in the API. There's alias/aliased, but lateral/lateraled just seems wrong. You could put the second behaviour in a lateraljoin() method.
    • There's the idea of automatically determining the join condition but that seems fiddly and I don't think I'd mind if that didn't work the first version.
    • What is currently output is A JOIN LATERAL B ON TRUE but arguably A CROSS JOIN LATERAL B would be nicer, except that SQLAlchemy currently doesn't support explicit cross joins like that.

    Hope this helps,

  16. Mike Bayer repo owner

    take a look at 781c86bf308cbff76e0e. This proposes a new method .join_lateral() that I think accomplishes what you are looking for. As far as how to do this in the ORM, it might be a little involved because we want to make use of Query's more elaborate "onclause" feature.

  17. Mike Bayer repo owner

    except....it doesn't work if we want to include that "right" side in the enclosing SELECT....ok

            subq = select([table2.c.book_id])
            stmt = select([table1, subq.c.book_id]).select_from(
                table1.join_lateral(subq)
            )
    

    that can't work, because subq is redefined. I know that in your approach, you're pushing the whole "JOIN LATERAL" into the compiler, but this is too magical. let's see if i can think of something more explicit...

  18. Mike Bayer repo owner

    the PG docs describe so-called "LATERAL JOIN" here as "enables a subquery in the FROM part of a clause to reference columns from preceding items in the FROM list. ". That is, we are really thinking of subqueries here when we choose to use LATERAL, not that we are thinking only of JOINs with some extra syntax in them. This supports the notion that we should not seek to augment JOIN and similar with automatic lateralization. Lateralization as a transparent performance enhancement makes this vastly more complicated; we'd have to support people doing query.join(A, lateral=True).join(B, lateral=True).join(C, lateral=True), and I don't even know what that would generate.

  19. Martijn van Oosterhout

    The LATERAL JOIN is a special kind of join, though it shares a lot of characteristics of the normal joins (INNER/OUTER/LEFT/RIGHT/CROSS). This argues for me that there should be a .join_lateral() method that does it and something like join(A, lateral=True) is wrong somehow. Similarly the lateral() wrapper could the also be dispensed with because the only place LATERAL makes sense is with a JOIN.

    Your example should I think compile to A CROSS JOIN LATERAL B CROSS JOIN LATERAL C or something similar. Which is actually not valid syntax. You could argue that they should be made into subqueries automatically but I think that pushes it way out of what you can expect .join() to do, but for a .lateral_join() it would be fine. It could probably even error out on plain tables. I know you don't agree with making the LATERAL JOIN a separate compiler node, it still feels like the best place for me.

    Another thing: suppose you are writing a query transformer that needs to understand about joins, if you don't make it part of the join node I fear you're going to end up with a lot of code doing if join.right.lateral, or simply forgetting to handle the case. By having it as a separate node then it will make it much more obvious.

  20. Mike Bayer repo owner

    So, first off I don't think it's an option to not have a JOIN-independent LATERAL construct. There is no such thing as "LATERAL JOIN" and the JOIN/LATERAL keywords have no syntactical relationship to each other. PG's own documentation confirms this:

    A LATERAL item can appear at top level in the FROM list, or within a JOIN tree. In the latter case it can also refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.

    While PG has a bias towards the JOIN keyword being in play, it isn't required, and note we are implementing this as part of standard SQL in any case. What do other database backends that either support LATERAL right now or will in the future support? Do they have the same emphasis on the JOIN keyword? I don't see why we we take the risk of making that assumption.

    So the proposal is at best, that there are two separate constructs accepted by the compiler: join(a, lateral(b)) and a hypothetical JoinLateral construct.

    JoinLateral would try to do what join() does, in that if you give it "a" and "b", it figures out the "on" clause automatically. So the compiler would be tasked with looking at the subquery, figuring out if it already has a WHERE clause or not, then adding on that WHERE clause at compile time (what if the subquery already has the joining criteria in its WHERE clause, wouldn't that be confusing to users if they see it duplicated?). Or we could, as my previous patch tried, modify the given "b" in place ahead of compilation, but this gets extremely complicated as we need to somehow maintain both our modified version and the user-supplied version at the same time, which really does not seem worth it for an edge feature like this.

    There are very few cases where the compiler actually makes structural alterations to queries and they are always extremely difficult to develop, test, and maintain, they are super-last-resort approaches. Support for Oracle's lack of LIMIT/OFFSET and SQLite's lack of support for "a JOIN (b JOIN c)" are the prime examples; but these structural changes are to support extremely common and useful SQL features on all other backends while not excluding these backends with special limitations. The LATERAL story here is more about adding a convenience feature and that just doesn't seem like its worth new guessing features (dynamic WHERE alteration) and assumptions (user did / did not correlate the subquery already).

    At the ORM level, "join_lateral()" concerns me a lot, as query.join() is already extremely elaborate, and a new core-level "join" style means users would be looking for this new thing to adapt all the features that query.join() already has, and that would be an enormous undertaking.

    Basically, given inklesspen's original case, we're looking at the difference between:

        first = sa.select([company.c.id]).alias('first')
        second = sa.select([sa.func.count(member.c.id)])
    
        stmt = select([first, second]).select_from(first.join_lateral(second))
    

    and:

        first = sa.select([company.c.id]).alias('first')
        second = sa.select([sa.func.count(member.c.id)]).correlate_from(first).lateral()
    
        stmt = select([first, second]).select_from(first.join(second))
    

    The first version introduces a lot more complexities and unknowns, as well as new behaviors that are not at all idiosyncratic for the Core /compiler. The second works easily and we don't even have to add anything to the ORM.

    Considering that doing this the simple and straightforward way doesnt even preclude someday adding a more automatic "join_lateral()" construct I'd like to just make the basic cases such as that of inklesspen's possible and be done with it for now.

  21. Jeff Widman

    Found this researching for whether SQLA supported PG's LATERAL JOIN.

    Personally I'd vote for keeping it as simple as possible, just put the basic syntax in Core and don't worry about making it super convenient or smart/magical.

    If I'm muddling around in the depths of SQL enough to use an advanced feature like a LATERAL JOIN, in those situations by the time I try to write it in Sqlalchemy, I'll know exactly the query I want to write, I've tested it in psql, and now just trying to translate it to sqlalchemy code... the more explicit the syntax the easier it is to translate from SQL to SQLAlchemy, even if it's a little more wordy.

  22. Log in to comment