consider rolling back / refining automatic columns clause logic with orm query.distinct().order_by()
When a distinct query is ordered, the columns on which are sorted are added to the SELECT statement, see example code:
from sqlalchemy.orm.session import sessionmaker
from sqlalchemy.dialects import postgresql
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer
Base = declarative_base()
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
att1 = Column(Integer)
att2 = Column(Integer)
Session = sessionmaker()
sess = Session()
q = sess.query(User.att2).distinct(User.att1)
print(str(q.statement.compile(dialect=postgresql.dialect())))
as expected, only user.att2 will be returned:
SELECT DISTINCT ON ("user".att1) "user".att2
FROM "user"
Now with an order_by statement:
print(str(q.order_by(User.att1, User.id).statement.compile(
dialect=postgresql.dialect())))
not as expected, both user.att2 and user.id will be returned
SELECT DISTINCT ON ("user".att1) "user".att2, "user".att1, "user".id
FROM "user" ORDER BY "user".att1, "user".id
The intended result is:
SELECT DISTINCT ON ("user".att1) "user".att2
FROM "user" ORDER BY "user".att1, "user".id
Comments (9)
-
repo owner -
reporter - edited description
- Edited mistake, I meant to order by "user".att1 instead of "user".att2.
- Added intended result
-
repo owner So this logic is a "safety" so that when someone does add things to the ORDER BY that arent accounted for with the DISTINCT, they are added to the SELECT list. You'll note that the column that's being added is not actually returned in the result:
from sqlalchemy.orm.session import Session from sqlalchemy.ext.declarative import declarative_base from sqlalchemy import Column, Integer, create_engine Base = declarative_base() class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) att1 = Column(Integer) att2 = Column(Integer) e = create_engine("postgresql://scott:tiger@localhost/test", echo=True) sess = Session(e) sess.add(User(id=1, att1=2, att2=3)) q = sess.query(User.att2).distinct(User.att1).order_by(User.att1, User.id) Base.metadata.drop_all(e) Base.metadata.create_all(e) row = q.all()[0] assert row.keys() == ['att2']
that would be why this was never noticed before. We can turn down the rules for ordering here to try to mimic PG some more, or take the logic out entirely for non-entity queries, but if you are looking for very specific SQL of an exact string form, Core is more appropriate for that.
-
repo owner - add a note about DISTINCT in query w/ ORDER BY, references
#3518
→ <<cset 6d0c0994e9a0>>
- add a note about DISTINCT in query w/ ORDER BY, references
-
repo owner - add a note about DISTINCT in query w/ ORDER BY, references
#3518
(cherry picked from commit 6d0c0994e9a08cf5d149af0314970d5f6e25b159)
→ <<cset 2d48b9de5ad5>>
- add a note about DISTINCT in query w/ ORDER BY, references
-
repo owner - add a note about DISTINCT in query w/ ORDER BY, references
#3518
(cherry picked from commit 6d0c0994e9a08cf5d149af0314970d5f6e25b159)
→ <<cset 5c8d1f33cd10>>
- add a note about DISTINCT in query w/ ORDER BY, references
-
repo owner - marked as major
- changed title to consider rolling back / refining automatic columns clause logic with orm query.distinct().order_by()
- changed milestone to 1.2
- changed component to orm
- marked as proposal
I'm not a fan of this behavior in any case. The use case we test for is where there's a JOIN to a related entity. it's this:
q = sess.query(User).join('addresses').distinct(). \ order_by(desc(Address.email_address)) assert [User(id=7), User(id=9), User(id=8)] == q.all()
But that query above doesn't even make any sense. We want the users to be distinct from the join, OK, but then we want to ORDER BY the email address in the join, so of course we have to put that in the SELECT list, which means, we're no longer DISTINCT on the User. We only get back unique User objects because of the de-duping at the entity level. This whole behavior has legacy 0.3 written all over it :).
Consider deprecating it but this would be a hard jump for any app relying upon it. 1.2 for now but this might keep getting pushed.
-
repo owner - changed status to duplicate
Duplicate of
#3641. -
repo owner this is resolved w
#3641which is for exactly this case, where PG's "DISTINCT ON" is involved. original test in the description works as expected now. - Log in to comment
so the logic making that happen is here:
If I comment it out as above, here is the test that fails (only on PG):
If I run your actual query against Postgresql, with or without the above logic, I get this error:
With the behavior intact:
with the behavior commented out:
So, the query you're looking for is not valid SQL in any case.
What we need to know is - what is the valid SQL that you are looking for? Let's work on making that happen before we decide what's broken here.