render 'SELECT foo() AS x .. ORDER BY x' by default

Issue #1068 resolved
jek created an issue

If an expression present in the select list is used elsewhere in the query (ORDER BY, GROUP BY, etc.), render it as the column name (label) by default. Unlabeled non-column expressions re-used in ORDER BY can be rendered positionally as integers. Oracle can override and render expanded expressions.

Comments (21)

  1. Mike Bayer repo owner

    just to be clear, we're talking about:

    x = func.foo().label('x')
    select([x](x)).order_by(x)
    

    and not:

    select([func.foo()](func.foo())).order_by(func.foo())
    

    ?

    (do we want select([func.foo.label('x')](func.foo.label('x'))).order_by('x') to do it ? seems the most succinct way....since its 0.5 we're fine changing contract a little bit)

  2. jek reporter

    strings seem to work already with the caveat that you need to do your own quoting on reserved identifiers. or were you suggesting something smarter that does column inspection and quoting?

    i'm thinking maybe like this w/ basic reused references: (Postgres/Firebird)

    >>> cl = func.char_length('abc').label('cl')
    >>> print select([cl](cl)).where(cl > 2).group_by(cl).having(cl > 2).order_by(cl)
    SELECT char_length('abc') AS cl
      WHERE char_length('abc') > 2
      GROUP BY cl
      HAVING char_length('abc') > 2
      ORDER BY cl
    

    sqlite (alone?) supports 'cl' in WHERE. mysql and sqlite will both use 'cl' in HAVING, pg & firebird won't.

  3. Mike Bayer repo owner

    OK the "string" use case is probably not going to fly, since we have this general contract that select().call_something('some string') is going to just interpret the string as a text() clause. There's just as much chance that someone says order_by("CASE when x1>x2 THEN 2 ELSE 0") as they would say order_by('label1'). It would get out of hand to say that where(), order_by(), column() etc. are all going to have wildly different ideas of what a "string" means.

    So yeah having c1 = func.foo().label('c1') just render as the label in all cases except columns clause and overriding dialects is not a terribly big deal. Involves some different semantics regarding the render_labels argument to visit_label(), probably including a name change. still would be good at 0.5.0 to establish the new pattern.

  4. Mike Bayer repo owner

    the approach here is described in detail in edf6b16fae38b4c103ed2827ee5448fec2fdcb1a, and is implemented for ORDER BY only. Postgres' behavior was used as the guide here and it seems like order by <some simple label name> has special meaning versus the label name being used in arbitrary expressions or in the WHERE clause.

  5. Mike Bayer repo owner

    you need to show me the use case for order_by(somelabel) when "somelabel" is not present in the select list. This is specifically something that happens with ORM, correct ?

  6. jek reporter

    it can happen anywhere where a user makes a bunch of expression objects and uses them naturally as building blocks, like fullname = concat(tbl.c.fname, tbl.c.lname).label('fullname'). orm column_properties also have the same issue.

  7. Mike Bayer repo owner

    If you're using the SQL construction lib, you have to put fullname into the ORDER BY. If you put a label() object inside of ORDER BY, it should render the label unconditionally, why should the compiler try to second guess what you did ? That seems like unnecessary work to be done.

    for the ORM scenario it seems like we'd specifically prevent the ORM from putting labels into ORDER_BY.

    If my above points don't fly, then I'm heavily leaning towards rolling back the feature and closing the ticket at this point, unless someone else wants to take over and can ensure that no function call overhead is added to compiler.py. For the change I made it wasn't a big deal. But it's now inconveniencing me and I really don't need this behavior in the 1st place, I don't buy the actual use case as very strong either.

  8. jek reporter

    not letting the rdbms reuse non deterministic computed values in where and order_by isn't a strong use case?

    i think the assertion that branching on 'if label in the select list' is a untenable burden is pretty weak and makes the api needlessly inhumane.

  9. Mike Bayer repo owner

    Replying to jek:

    not letting the rdbms reuse non deterministic computed values in where and order_by isn't a strong use case?

    its not a use case for Oracle, for example

  10. Mike Bayer repo owner

    just a few thoughts here:

    • the work to be done here is to do a check if a label used within ORDER BY is present in the select() construct being rendered.
    • Postgres has a more strict policy regarding labels in ORDER BY - you can't embed the label into an expression of its own, it has to stand by itself with an optional ASC or DESC and that's it.
    • sqlite and mysql allow the label to be further calculated in an expression
    • I didn't check, but I have a feeling that labels cannot be used in the WHERE clause with postgres. I'm also fairly certain (again didn't check) that Oracle never allows labels in WHERE or ORDER BY.
    • I'm gathering that MySQL allows the label to be used anywhere.
    • The compiler will probably limit its search for the label to the columns clause of the select() construct that is currently at the top of the "select stack".
    • A lot of effort is needed with testing here (the tests I did are obviously not enough).
    • The whole feature is strictly a more natural and convenient way of issuing an optimization that is available in any case, by forcing usage of a label name with column('name').
  11. Mike Bayer repo owner
    • Fixed regression introduced in 0.9 where new "ORDER BY <labelname>" feature from 🎫1068 would not apply quoting rules to the label name as rendered in the ORDER BY. fix #3020, re: #1068

    → <<cset fcda519452cf>>

  12. Log in to comment