Unable to access outside tables within subqueries in FROM

Issue #4019 closed
Volker Diels-Grabsch created an issue

Wrong SQL code is produced when a sub query appears within a FROM clause and tries to access an outside table.

Usually a sub query is able to access outside tables. However, when a sub query appears within a FROM clause, it is not able to access outside tables. Instead an unwanted additional instance of the table is added to the sub query.

For example, consider two tables a and b with a sub query e. When sticking these together to query, the outer table a should be used in the sub query e:

from sqlalchemy import *

a = table('a', column('w'), column('x'))
b = table('b', column('y'), column('z'))

e = select([b.c.y]).where(a.c.x == b.c.z).alias('e')

query = select([
    a.c.w,
    e.c.y.label('f'),
])

print(str(query))

This produces the following wrong SQL code with an unwanted inner select from a:

SELECT a.w, e.y AS f
FROM a, (SELECT b.y AS y
FROM b, a  -- ERROR
WHERE a.x = b.z) AS e

Expected SQL code:

SELECT a.w, e.y AS f
FROM a, (SELECT b.y AS y
FROM b  -- OK
WHERE a.x = b.z) AS e

For comparison, when putting the sub select into the SELECT clause rather than the FROM clause, everything works fine:

from sqlalchemy import *

a = table('a', column('w'), column('x'))
b = table('b', column('y'), column('z'))

e = select([b.c.y]).where(a.c.x == b.c.z).alias('e')

query = select([
    a.c.w,
    e.as_scalar().label('f'),  # Difference to first example
])

print(str(query))

This produces correct SQL code without the unwanted inner select from a:

SELECT a.w, (SELECT b.y 
FROM b  -- OK 
WHERE a.x = b.z) AS v 
FROM a

Of course, in our more complex production code, the sub select cannot be easily moved from the FROM to the SELECT clause. The example above is just the minimal code I came up with that reproduces the issue.

For the sake of completeness, this code is intended to run on PostgreSQL >= 9.2. However, the dialect doesn't change anything in this bug report. In all examples, str(query) produces the same SQL code as str(query.compile(dialect=postgresql.dialect())).

Comments (4)

  1. Mike Bayer repo owner

    Hi -

    there's no bug illustrated here. The "Expected SQL code" posted is invalid SQL and will fail as written on all database backends including Postgresql. PG actually provides the nicest error message too:

    (psycopg2.ProgrammingError) invalid reference to FROM-clause entry for table "a"
    LINE 1: ...w, e.y AS f FROM a, (SELECT b.y AS y FROM b WHERE a.x = b.z)...
                                                                 ^
    HINT:  There is an entry for table "a", but it cannot be referenced from this part of the query.
     [SQL: 'SELECT a.w, e.y AS f FROM a, (SELECT b.y AS y FROM b WHERE a.x = b.z) AS e']
    

    Correlation is not allowed in the FROM clause to the immediate parent SELECT unless you are using Postgresql's LATERAL which we support.

  2. Volker Diels-Grabsch reporter

    It seems I simplified the example query too much. I'll provide a slightly more complex query to demonstrate the issue.

  3. Mike Bayer repo owner

    OK, are you trying to correlate the "a" table inside the subquery to a SELECT that is itself enclosing the parent query?

    e.g.:

    SELECT a.x, (
       SELECT b.q FROM b, (SELECT c.id FROM c WHERE c.id = a.id) AS e WHERE b.id = e.id
    ) FROM a
    

    we support that as well, however you need to use select.correlate():

    When Select.correlate() is used to apply specific FROM clauses for correlation, the FROM elements become candidates for correlation regardless of how deeply nested this Select object is, relative to an enclosing Select which refers to the same FROM object. This is in contrast to the behavior of “auto-correlation” which only correlates to an immediate enclosing Select. Multi-level correlation ensures that the link between enclosed and enclosing Select is always via at least one WHERE/ORDER BY/HAVING/columns clause in order for correlation to take place.

  4. Log in to comment