detect illegal usage of exported columns being passed back into generative methods especially with_only_columns; document expected usage

Issue #2319 resolved
Former user created an issue

I've got a moderately complex 3-way join against a sort of EAV table.

It was working fine with sqlalchemy 0.7.2, with both sqllite and mysql.

When I deployed it to a production setting, I started getting the dreaded 'Every derived table must have its own alias' error from mysql.

It turns out that 0.7.3 produces very different SQL. I hope this is not by design, but I can't really be sure.

I isolated it in sareport.py ( https://gist.github.com/1349445 and attached); run it a la: python -m doctest sareport.py. It passes with 0.7.2 and fails with 0.7.3.

I took a quick look at the 0.7.3 release notes, but nothing jumped out at me that would explain what's going on.

Comments (11)

  1. Mike Bayer repo owner
    • changed milestone to 0.7.4
    • assigned issue to
    • changed component to orm
    • marked as blocker

    This is a slightly insane amount of code for me to break down and understand. As I'd like to save 90 minutes of intense focusing to understand the issue, any chance you can just illustrate, very simply, what a "three way join" is exactly, remove all the superfluous things here, like group bys, and_s, func.counts, with_only_columns if that isn't needed etc. ? Basically, remove elements of complexity, re-run, verify that erroneous behavior remains, repeat. This seems like a structural issue so a couple of columns and some simple selects should be sufficient, and this process will also identify where things go wrong.

    A regression between 0.7.2 and 0.7.3 is also a serious issue if confirmed.

  2. Mike Bayer repo owner

    for example. Here is a script which produces different results on 0.7.2 and 0.7.3:

    from sqlalchemy import *
    
    rd = Table('rd', MetaData(),
        Column('project_id', Integer),
        Column('record', Integer)
    )
    
    a = rd.alias('a')
    b = rd.alias('b')
    
    j = a.join(b, a.c.record == b.c.record).alias('c').select()
    
    print j.with_only_columns((j.c.a_record,))
    

    so there's some change in behavior of with_only_columns. So far it looks like 0.7.3 is doing the right thing as you are selecting from an alias.

  3. Mike Bayer repo owner

    So there's not really a bug here, though there's a section where it's hard to detect that you're doing the wrong thing. Basically with_only_columns is against the columns you pass to the select(), not the columns exported by the select. Unfortunately there's no quick way for me to detect that, will try. You get a better query, identical in both versions, if you do it correctly:

        j = decision.join(candidate,
                          and_(candidate.c.record == decision.c.record,
                               candidate.c.project_id == decision.c.project_id)
                          ).outerjoin(dt_exp, and_(
                dt_exp.c.record == decision.c.record,
                dt_exp.c.project_id == decision.c.project_id)
                                      ).alias('cdwho')
    
        j2 = j.select()
        cdwho = j2.with_only_columns((j.c.cd_record.label('record'),
                                     j.c.cd_decision.label('decision'),
                                     j.c.who_userid.label('candidate'),
                                     j.c.expire_dt_exp.label('dt_exp')))
    
  4. Mike Bayer repo owner

    another interaction illustrating 0.7.3 is consistent with itself:

    >>> from sqlalchemy.sql import table, column, select
    >>> a = table('a', column('x'))
    >>> b = table('b', column('y'))
    
    >>> # a join
    >>> j = a.join(b, a.c.x == b.c.y)
    >>> print j
    a JOIN b ON a.x = b.y
    
    >>> # a select of the join
    >>> j2 = j.select()
    >>> print j2
    SELECT a.x, b.y 
    FROM a JOIN b ON a.x = b.y
    
    >>> # a select of the select of the join
    >>> print select([j2.c.x](j2.c.x))
    SELECT x 
    FROM (SELECT a.x AS x, b.y AS y 
    FROM a JOIN b ON a.x = b.y)
    
    >>> # a select from j2 as well as j at the same time
    >>> print select([j2.c.x](j2.c.x)).select_from(j)
    SELECT x 
    FROM (SELECT a.x AS x, b.y AS y 
    FROM a JOIN b ON a.x = b.y), a JOIN b ON a.x = b.y
    
    >>> # equivalent operation using with_only_columns
    >>> print j2.with_only_columns([j2.c.x](j2.c.x))
    SELECT x 
    FROM (SELECT a.x AS x, b.y AS y 
    FROM a JOIN b ON a.x = b.y), a JOIN b ON a.x = b.y
    
    >>> # with_only_columns can be used to replace the "inside",
    >>> # need to send it those
    >>> print j2.with_only_columns([j.c.a_x](j.c.a_x))
    SELECT a.x 
    FROM a JOIN b ON a.x = b.y
    
  5. Mike Bayer repo owner

    this one is very tough as when with_only_columns is called, it's exceedingly difficult to figure out that those are the same columns coming in.

  6. Log in to comment