allow correlation of FROM elements to any SELECT which encloses via WHERE, columns, ORDER BY (but not FROM list)

Issue #2746 resolved

Sok Ann Yap created an issue 2013-06-05

This is related to the mailing list post: https://groups.google.com/forum/?fromgroups#!topic/sqlalchemy/Ld0iuCUphHU

It seems like ~~#2668~~ has universally disabled correlation for subquery inside a FROM clause. The situations where such expression is needed could be quite rare, but it is sometimes necessary to do so, especially for lesser databases such as SQL Server. An example would be this:

http://stackoverflow.com/questions/5786739/porting-postgresql-string-agg-to-sql-server-problem-with-order-by

Another example would be a subquery that returns a couple of computed values, with an outer query consists entirely of a CASE statement in its select list that makes use of the values.

(Note that these examples are used in the context of hybrid / column property)

I think it would be good to add a flag to select() to enforce the correlation when necessary.

Comments (13)

Sok Ann Yap reporter
- attached sqlalchemy-correlate-asfrom.diff
A diff to add flag correlate_asfrom (default to False) to select()
- 2013-06-05T23:55:40+00:00
Mike Bayer repo owner
- marked as major
- marked as enhancement
- changed milestone to 0.9.0
well the funny thing here is that we've had a long-standing comment in the compiler, regarding the fact that the list of "Existing froms" we pass in is only from the immediate enclosing SELECT, not the SELECT that's enclosing that one. I think at least we know we can take that out, since that only applies to auto-correlation and I think we can agree auto-correlation out of the FROM clause is not necessary. This is the case where you say, "omit these elements from the FROM clause, unconditionally".

I'd rather make this specific to each FROM for the "correlate" case. See the attached patch.
```
t1 = table('t1', column('a'))
t2 = table('t2', column('a'), column('b'))
s = select(
    [t2.c.b](t2.c.b),
    t1.c.a == t2.c.a,
).correlate_except(t2, froms=True).alias('s')
s2 = select([func.foo(s.c.b)](func.foo(s.c.b))).as_scalar()
s3 = select([t1](t1), order_by=s2)

print s3
```
- 2013-06-06T14:11:15+00:00
Mike Bayer repo owner
dont know what the flag should be named though. maybe "within_from_ok".
- 2013-06-06T14:13:20+00:00
Sok Ann Yap reporter
Thanks, patch works great. "within_from_ok" sounds about right to me (disclaimer: I suck at naming things).
- 2013-06-06T19:39:29+00:00

Mike Bayer repo owner

marked as blocker
changed milestone to 0.8.xx

See also ~~#2668~~ which we'll close as a dupe here, we need to consider propagating the correlation list recursively:

--- a/lib/sqlalchemy/sql/compiler.py
+++ b/lib/sqlalchemy/sql/compiler.py
@@ -1241,6 +1241,8 @@ class SQLCompiler(engine.Compiled):
         froms = select._get_display_froms(existingfroms, asfrom=asfrom)

         correlate_froms = set(sql._from_objects(*froms))
+        if existingfroms:
+            correlate_froms = correlate_froms.union(existingfroms)


         self.stack.append({'from': correlate_froms,

here's the test case for that:

from sqlalchemy.sql import table, column, select, exists

p = table('parent', column('id'))
c = table('child', column('id'), column('parent_id'), column('pos'))

s = c.select().where(c.c.parent_id == p.c.id).order_by(c.c.pos).limit(1)
s = s.correlate(p, froms=True)
s = exists().select_from(s).where(s.c.id == 1)
s = select([p](p)).where(s)
print s

we want:

SELECT parent.id 
FROM parent 
WHERE EXISTS (SELECT * 
FROM (SELECT child.id AS id, child.parent_id AS parent_id, child.pos AS pos 
FROM child 
WHERE child.parent_id = parent.id ORDER BY child.pos
 LIMIT :param_1) 
WHERE id = :id_1)

and not:

SELECT parent.id 
FROM parent 
WHERE EXISTS (SELECT * 
FROM (SELECT child.id AS id, child.parent_id AS parent_id, child.pos AS pos 
FROM child, parent 
WHERE child.parent_id = parent.id ORDER BY child.pos
 LIMIT :param_1) 
WHERE id = :id_1)

I'm a little concerned about propagating the FROMs all the way in like that but no tests fail, and I'm getting the impression that when you have "SELECT ( SELECT ( SELECT ) ) ", those selects are typically going to be nested in the FROM clause and this new "froms" flag is a nice way of ensuring it doesn't happen by default.

updated patch is attached and i think we have to make this 0.8 also since people from 0.7 are getting hit by it.

2013-06-20T15:33:01+00:00

Mike Bayer repo owner
from that change we also can fix correlate_except, which also doesn't do the right thing here:
```
from sqlalchemy.sql import table, column, select, exists

t1 = table('t1', column('x'))
t2 = table('t2', column('y'))
t3 = table('t3', column('z'))

s = select([t1](t1)).where(t1.c.x == t2.c.y).where(t2.c.y == t3.c.z).correlate_except(t1)

print s
```
we want:
```
SELECT t1.x 
FROM t1, t2, t3 
WHERE t1.x = t2.y AND t2.y = t3.z
```
and not
```
SELECT t1.x 
FROM t1 
WHERE t1.x = t2.y AND t2.y = t3.z
```
a similar pattern to the latter is called "backwards correlated" in test_compiler...but its wrong, and I'd like to be able to add other FROM clauses to an any() or has().

This is getting pretty changy. I don't think there's a real-world use for "backwards correlated" and correlate_except() is fairly new so hopefully this is safe for 0.8 still.
- 2013-06-25T17:24:13+00:00
Mike Bayer repo owner
this is in the ticket_2746 branch for now and also in the attached patch.
- 2013-06-25T18:07:12+00:00
Mike Bayer repo owner
- marked as bug
- 2013-06-25T18:07:21+00:00
Mike Bayer repo owner
- changed title to allow correlation of FROM elements to any SELECT which encloses via WHERE, columns, ORDER BY (but not FROM list)
good news. We don't need a flag. the latest patch has the compiler figure out what the legal "correlations" should be, so all these scenarios just work.

this is a pretty big change but 0.8's current behavior is pretty broken.
- 2013-06-26T02:11:13+00:00
Sok Ann Yap reporter
Just tested 1b6b2fb8d2ea and everything just works indeed. Nice.
- 2013-06-26T10:10:12+00:00
Mike Bayer repo owner
I'm going to try taking it one step further, and make it so that if you dont actually use correlate()/correlate_except(), the default auto-correlation will work the old way, i.e. just for the immediate enclosing SELECT via WHERE/etc. This is to minimize the chance of someone's query behaving differently since I want this in 0.8 - it will act very close to the 0.7 behavior.
- 2013-06-26T11:20:32+00:00
Mike Bayer repo owner
- changed status to resolved
f76cae4bc92da640c1 master b4697e9e187571b06 0.8
- 2013-06-26T13:39:10+00:00
Mike Bayer repo owner
- removed milestone
Removing milestone: 0.8.xx (automated comment)
- 2014-08-25T20:48:26+00:00
Log in to comment

Assignee: Mike Bayer

Type: bug

Priority: blocker

Status: resolved

Component: sql

Milestone: –

Votes: 0

Watchers: 1