ColumnElement with label() is incorrectly referenced in group_by/order_by producing invalid query
https://gist.github.com/SantjagoCorkez/db207a7b533d1d6f05ae
When providing a ColumnElement itself into .group_by()/.order_by() the query compiler does not automatically quote the reference to that producing a query that in some circumstances becomes invalid (for example in case there are columns with the same name as the ColumnElement's label within the query).
This became an issue at 1.0.0 (and not resolved in 1.0.1)
Comments (19)
-
repo owner -
repo owner a simpler form of this query works on PG without fail, using a table that has a column called "state":
SELECT count(t.foob) AS count_1, max(t.state) AS x, CASE WHEN (t.state = %(state_1)s) THEN %(param_1)s WHEN (t.state = %(state_2)s) THEN %(param_2)s END AS state FROM t WHERE t.state > %(state_3)s GROUP BY state
What postgresql version is this?
-
reporter The request rendered is in a docstring in the gist provided. Or you need a fully compiled request with params substituted?
-
reporter The PG is 9.4.1
-
repo owner I've taken the time to unwrap your example into a proper test case where this now reproduces. Here is a test case:
from sqlalchemy import * e = create_engine("postgresql://scott:tiger@localhost/test", echo=True) m = MetaData() order_logs = Table( 'order_logs', m, Column('id', Integer, primary_key=True), Column('state', Integer), Column('order_id', Integer) ) orders = Table( 'orders', m, Column('id', Integer, primary_key=True), Column('state', Integer), ) expr = case( [ (order_logs.c.state.in_([1, 2]), 'x'), (order_logs.c.state.in_([3, 4]), 'y'), ], else_='z' ).label('state') expr2 = func.sum( case( [ (orders.c.state.in_([1, 2]), 1), (orders.c.state.in_([3, 4]), 2), ], else_=3 ) ).label("key") m.create_all(e) stmt = select( [ func.count(order_logs.c.id).label("count"), expr, expr2 ] ).where(order_logs.c.order_id == orders.c.id)\ .group_by(expr) e.execute(stmt)
-
repo owner I got the SQL, didn't see the fact that it was side-scrolled all on one line.
-
repo owner Postgresql is definitely being weird on this one. for this one (notice I spelled "state" as "statee"):
SELECT count(order_logs.id) AS count, CASE WHEN (order_logs.statee IN (%(state_1)s, %(state_2)s)) THEN %(param_1)s WHEN (order_logs.statee IN (%(state_3)s, %(state_4)s)) THEN %(param_2)s ELSE %(param_3)s END AS state FROM order_logs, orders WHERE order_logs.order_id = orders.id GROUP BY state
the error is:
(psycopg2.ProgrammingError) column "order_logs.statee" must appear in the GROUP BY clause or be used in an aggregate function
.But! If you take away the "orders" table:
SELECT count(order_logs.id) AS count, CASE WHEN (order_logs.statee IN (%(state_1)s, %(state_2)s)) THEN %(param_1)s WHEN (order_logs.statee IN (%(state_3)s, %(state_4)s)) THEN %(param_2)s ELSE %(param_3)s END AS state FROM order_logs GROUP BY state
now it's fine! I'm not understanding that at the moment.
-
reporter This was rendered with 0.9.x
SELECT count(order_logs.id) AS count, CASE WHEN (order_logs.state IN (1003, 1202, 1203, 1200, 1201)) THEN 'refuses' WHEN (order_logs.state IN (1100)) THEN 'errors' WHEN (order_logs.state IN (1001)) THEN 'no_answer' WHEN (order_logs.state IN (1002)) THEN 'call_later' WHEN (order_logs.state IN (1999)) THEN 'success' ELSE 'unknown' END AS state, sum(CASE WHEN (orders.state IN (1003, 1202, 1203, 1200, 1201, 1100)) THEN 1 ELSE 0 END) AS refused, order_logs.ts_spawn - (order_logs.ts_spawn - -10800) %% 3600 AS raw_key, order_logs.ts_spawn - (order_logs.ts_spawn - -10800) %% 3600 AS key FROM order_logs, orders WHERE order_logs.order_id = orders.id AND order_logs.state IN (1999, 1001, 1100, 1002, 1201, 1200, 1203, 1202, 1003) AND order_logs.ts_spawn >= 1429822800 AND order_logs.ts_spawn < 1429909200 AND ((order_logs.flags & 1) > 0 OR order_logs.operator_id IN (42, 46, 120, 133)) GROUP BY CASE WHEN (order_logs.state IN (1003, 1202, 1203, 1200, 1201)) THEN 'refuses' WHEN (order_logs.state IN (1100)) THEN 'errors' WHEN (order_logs.state IN (1001)) THEN 'no_answer' WHEN (order_logs.state IN (1002)) THEN 'call_later' WHEN (order_logs.state IN (1999)) THEN 'success' ELSE 'unknown' END, "raw_key", "key"
And this is rendered with 1.0.x
SELECT count(order_logs.id) AS count, CASE WHEN (order_logs.state IN (1003, 1202, 1203, 1200, 1201)) THEN 'refuses' WHEN (order_logs.state IN (1100)) THEN 'errors' WHEN (order_logs.state IN (1001)) THEN 'no_answer' WHEN (order_logs.state IN (1002)) THEN 'call_later' WHEN (order_logs.state IN (1999)) THEN 'success' ELSE 'unknown' END AS state, sum(CASE WHEN (orders.state IN (1003, 1202, 1203, 1200, 1201, 1100)) THEN 1 ELSE 0 END) AS refused, order_logs.ts_spawn - (order_logs.ts_spawn - -10800) %% 3600 AS raw_key, order_logs.ts_spawn - (ord er_logs.ts_spawn - -10800) %% 3600 AS key FROM order_logs, orders WHERE order_logs.order_id = orders.id AND order_logs.state IN (1999, 1001, 1100, 1002, 1201, 1200, 1203, 1202, 1003) AND order_logs.ts_spawn >= 1429736400 AND order_logs.ts_spawn < 1429822800 AND ((order_logs.flags & 1) > 0 OR order_logs.operator_id IN (42, 46, 120, 133)) GROUP BY state, "raw_key", "key"
-
repo owner However. In this case, while this behavior is definitely different, I think the current behavior is better. Keeping in mind that your issue goes away if you just group_by() the original element, and not the label, e.g.
group_by(state_column.element)
for now, but I will add a new methodunlabeled()
to all column elements to make this easier, let's review the feature.The change is simply that if you say group_by() and pass an object that is actually a Label object, it is intuitive that this is referring to that label as expressed in the columns clause:
from sqlalchemy import table, column, select, func t1 = table('t', column('a'), column('b')) expr = func.foo(t.c.a) labeled_expr = expr.label('foobar') # SELECT b, foo(a) AS foobar FROM t GROUP BY foo(a) stmt1 = select([t1.c.b, labeled_expr]).group_by(expr) # SELECT b, foo(a) AS foobar FROM t GROUP BY foobar stmt2 = select([t1.c.b, labeled_expr]).group_by(labeled_expr)
So workaround for 1.0.1 is to group_by(state_column.element), and I will add a convenience method
.unlabeled()
to force a labeled expression to render without any label. Let me know if that works for you. -
repo owner you know what nevermind, I think I'm just going to disable this feature again for now. The migration notes don't make it clear that this applies to label() objects also.
-
reporter Yes. state_column.element works as in previous versions.
-
reporter Wow. Maybe it's better to update docs? I've just updated the code at around the mentioned at the gist to make the label do not confuse PG's query compiler (renamed label to 'state_mark') and that provided a more clear and, I guess more lightweight for Query Compiler and Planner, query:
SELECT count(order_logs.id) AS count, CASE WHEN (order_logs.state IN (1003, 1202, 1203, 1200, 1201)) THEN 'refuses' WHEN (order_logs.state IN (1100)) THEN 'errors' WHEN (order_logs.state IN (1001)) THEN 'no_answer' WHEN (order_logs.state IN (1002)) THEN 'call_later' WHEN (order_logs.state IN (1999)) THEN 'success' ELSE 'unknown' END AS state_mark, sum(CASE WHEN (orders.state IN (1003, 1202, 1203, 1200, 1201, 1100)) THEN 1 ELSE 0 END) AS refused, order_logs.ts_spawn - (order_logs.ts_spawn - -10800) %% 3600 AS raw_key, order_logs.ts_spawn - (order_logs.ts_spawn - -10800) %% 3600 AS key FROM order_logs, orders WHERE order_logs.order_id = orders.id AND order_logs.state IN (1999, 1001, 1100, 1002, 1201, 1200, 1203, 1202, 1003) AND order_logs.ts_spawn >= 1429736400 AND order_logs.ts_spawn < 1429822800 AND ((order_logs.flags & 1) > 0 OR order_logs.operator_id IN (42, 46, 120, 133)) GROUP BY state_mark, raw_key, key
-
repo owner I'm really torn. I'm looking into undoing the feature, and I put a lot of work into getting these labels to render. But I already had to adjust for SQL server, and now PG. I dunno :). just this one spot is impacting you right?
-
repo owner also, its more GROUP BY that is problematic, rather than ORDER BY, but making it one way in ORDER BY and another in GROUP BY seems like its getting even more confusing.
-
repo owner OK. So I think, first off it's easier to keep this working by default the way it is in 0.9, because then ill get less of these regressions. Also, what I don't like about it right now is that it takes place for PG, MySQL, not for MSSQL, etc., so there's implicit changing stuff going on here.
So now my thinking is, make it an option, for when people want to order by / group by the label, but don't want to type out text("some label"):
This would be new features on Label:
expr = expression.label("foo") stmt = select([expr]).group_by(expr.asref()) expr = expression.label("foo", asref=True) stmt = select([expr]).group_by(expr)
-
repo owner wow. OK. 0.9 already does the label thing for ORDER BY fully, just not GROUP BY. So, we already do it differently for ORDER BY, even though the implementation in 1.0 is totally different, which is why I thought this was new. OK. Well, until my next realization, seems like I just will switch off this behavior in GROUP BY and we're done.
-
repo owner - changed status to duplicate
Duplicate of
#3338. -
repo owner - changed status to resolved
- Fixed a regression that was incorrectly fixed in 1.0.0b4
(hence becoming two regressions); reports that
SELECT statements would GROUP BY a label name and fail was misconstrued
that certain backends such as SQL Server should not be emitting
ORDER BY or GROUP BY on a simple label name at all; when in fact,
we had forgotten that 0.9 was already emitting ORDER BY on a simple
label name for all backends, as described in :ref:
migration_1068
, as 1.0 had rewritten this logic as part of2992
.
In 1.0.2, the bug is fixed both that SQL Server, Firebird and others will again emit ORDER BY on a simple label name when passed a :class:
.Label
construct that is expressed in the columns clause, and no backend will emit GROUP BY on a simple label name in this case, as even Postgresql can't reliably do GROUP BY on a simple name in every case. fixes#3338, fixes#3385→ <<cset f9275198c304>>
-
repo owner Sorry for my impatience! thank you for reporting this issue.
- Log in to comment
Wow....can you please provide the full SQL being rendered? at least?
The change here is http://docs.sqlalchemy.org/en/rel_1_0/changelog/migration_10.html#order-by-and-group-by-are-special-cases. It is by design, so the "state" column being expressed as a single identifier is expected, I also see something about a column called "Order.state" in this fragment of code, but I'm hoping you can appreciate those of us who fix bugs need to have full context, thanks.