allow correlation of FROM elements to any SELECT which encloses via WHERE, columns, ORDER BY (but not FROM list)

Issue #2746 resolved
Sok Ann Yap created an issue

This is related to the mailing list post: https://groups.google.com/forum/?fromgroups#!topic/sqlalchemy/Ld0iuCUphHU

It seems like #2668 has universally disabled correlation for subquery inside a FROM clause. The situations where such expression is needed could be quite rare, but it is sometimes necessary to do so, especially for lesser databases such as SQL Server. An example would be this:

http://stackoverflow.com/questions/5786739/porting-postgresql-string-agg-to-sql-server-problem-with-order-by

Another example would be a subquery that returns a couple of computed values, with an outer query consists entirely of a CASE statement in its select list that makes use of the values.

(Note that these examples are used in the context of hybrid / column property)

I think it would be good to add a flag to select() to enforce the correlation when necessary.

Comments (13)

  1. Mike Bayer repo owner

    well the funny thing here is that we've had a long-standing comment in the compiler, regarding the fact that the list of "Existing froms" we pass in is only from the immediate enclosing SELECT, not the SELECT that's enclosing that one. I think at least we know we can take that out, since that only applies to auto-correlation and I think we can agree auto-correlation out of the FROM clause is not necessary. This is the case where you say, "omit these elements from the FROM clause, unconditionally".

    I'd rather make this specific to each FROM for the "correlate" case. See the attached patch.

    t1 = table('t1', column('a'))
    t2 = table('t2', column('a'), column('b'))
    s = select(
        [t2.c.b](t2.c.b),
        t1.c.a == t2.c.a,
    ).correlate_except(t2, froms=True).alias('s')
    s2 = select([func.foo(s.c.b)](func.foo(s.c.b))).as_scalar()
    s3 = select([t1](t1), order_by=s2)
    
    print s3
    
  2. Sok Ann Yap reporter

    Thanks, patch works great. "within_from_ok" sounds about right to me (disclaimer: I suck at naming things).

  3. Mike Bayer repo owner

    See also #2668 which we'll close as a dupe here, we need to consider propagating the correlation list recursively:

    --- a/lib/sqlalchemy/sql/compiler.py
    +++ b/lib/sqlalchemy/sql/compiler.py
    @@ -1241,6 +1241,8 @@ class SQLCompiler(engine.Compiled):
             froms = select._get_display_froms(existingfroms, asfrom=asfrom)
    
             correlate_froms = set(sql._from_objects(*froms))
    +        if existingfroms:
    +            correlate_froms = correlate_froms.union(existingfroms)
    
    
             self.stack.append({'from': correlate_froms,
    

    here's the test case for that:

    from sqlalchemy.sql import table, column, select, exists
    
    p = table('parent', column('id'))
    c = table('child', column('id'), column('parent_id'), column('pos'))
    
    s = c.select().where(c.c.parent_id == p.c.id).order_by(c.c.pos).limit(1)
    s = s.correlate(p, froms=True)
    s = exists().select_from(s).where(s.c.id == 1)
    s = select([p](p)).where(s)
    print s
    

    we want:

    SELECT parent.id 
    FROM parent 
    WHERE EXISTS (SELECT * 
    FROM (SELECT child.id AS id, child.parent_id AS parent_id, child.pos AS pos 
    FROM child 
    WHERE child.parent_id = parent.id ORDER BY child.pos
     LIMIT :param_1) 
    WHERE id = :id_1)
    

    and not:

    SELECT parent.id 
    FROM parent 
    WHERE EXISTS (SELECT * 
    FROM (SELECT child.id AS id, child.parent_id AS parent_id, child.pos AS pos 
    FROM child, parent 
    WHERE child.parent_id = parent.id ORDER BY child.pos
     LIMIT :param_1) 
    WHERE id = :id_1)
    

    I'm a little concerned about propagating the FROMs all the way in like that but no tests fail, and I'm getting the impression that when you have "SELECT ( SELECT ( SELECT ) ) ", those selects are typically going to be nested in the FROM clause and this new "froms" flag is a nice way of ensuring it doesn't happen by default.

    updated patch is attached and i think we have to make this 0.8 also since people from 0.7 are getting hit by it.

  4. Mike Bayer repo owner

    from that change we also can fix correlate_except, which also doesn't do the right thing here:

    from sqlalchemy.sql import table, column, select, exists
    
    t1 = table('t1', column('x'))
    t2 = table('t2', column('y'))
    t3 = table('t3', column('z'))
    
    s = select([t1](t1)).where(t1.c.x == t2.c.y).where(t2.c.y == t3.c.z).correlate_except(t1)
    
    print s
    

    we want:

    SELECT t1.x 
    FROM t1, t2, t3 
    WHERE t1.x = t2.y AND t2.y = t3.z
    

    and not

    SELECT t1.x 
    FROM t1 
    WHERE t1.x = t2.y AND t2.y = t3.z
    

    a similar pattern to the latter is called "backwards correlated" in test_compiler...but its wrong, and I'd like to be able to add other FROM clauses to an any() or has().

    This is getting pretty changy. I don't think there's a real-world use for "backwards correlated" and correlate_except() is fairly new so hopefully this is safe for 0.8 still.

  5. Mike Bayer repo owner

    I'm going to try taking it one step further, and make it so that if you dont actually use correlate()/correlate_except(), the default auto-correlation will work the old way, i.e. just for the immediate enclosing SELECT via WHERE/etc. This is to minimize the chance of someone's query behaving differently since I want this in 0.8 - it will act very close to the 0.7 behavior.

  6. Log in to comment