DISTINCT is used in select() but not in count()

Mike Bayer repo owner

marked as major
changed status to invalid

OK, this is a little subtle. the statement you have produces this (I converted list(statement.execute())[0](0)[0](0) into the equivalent statement.scalar()):

res = table1.count(and_(table1.c.id==table2.c.t1id,table2.c.t1id==1), distinct=True).scalar()

SELECT DISTINCT count(Table1.id) AS tbl_row_count 
FROM Table1, Table2 
WHERE Table1.id = Table2.t1id AND Table2.t1id = ?

you can see the aggregate count() function result is what will be run into the DISTINCT call (so DISTINCT is being propigated from SQLAlchemy). While this may seem pointless since count() returns only one row, that is not necessarily the case, such as if you had a GROUP BY in there which returned multiple rows containing a count; you may in some case want to get a list of counts against some criterion and determine the set of all distinct count values.

What you really want to apply DISTINCT to is the rowset from the table, and then you want to "count" that. So you do it this way:

res = table1.select(and_(table1.c.id==table2.c.t1id,table2.c.t1id==1), distinct=True).count().scalar()

SELECT count(id) AS tbl_row_count 
FROM (SELECT DISTINCT Table1.id AS id 
FROM Table1, Table2 
WHERE Table1.id = Table2.t1id AND Table2.t1id = ?)

So the SELECT DISTINCT creates the matrix of results that you care about; then the "count" query wraps that result.

2006-08-26T16:12:24+00:00

Comments (1)

Mike Bayer repo owner
- marked as major
- changed status to invalid
OK, this is a little subtle. the statement you have produces this (I converted list(statement.execute())[0](0)[0](0) into the equivalent statement.scalar()):
```
res = table1.count(and_(table1.c.id==table2.c.t1id,table2.c.t1id==1), distinct=True).scalar()

SELECT DISTINCT count(Table1.id) AS tbl_row_count 
FROM Table1, Table2 
WHERE Table1.id = Table2.t1id AND Table2.t1id = ?
```
you can see the aggregate count() function result is what will be run into the DISTINCT call (so DISTINCT is being propigated from SQLAlchemy). While this may seem pointless since count() returns only one row, that is not necessarily the case, such as if you had a GROUP BY in there which returned multiple rows containing a count; you may in some case want to get a list of counts against some criterion and determine the set of all distinct count values.

What you really want to apply DISTINCT to is the rowset from the table, and then you want to "count" that. So you do it this way:
```
res = table1.select(and_(table1.c.id==table2.c.t1id,table2.c.t1id==1), distinct=True).count().scalar()

SELECT count(id) AS tbl_row_count 
FROM (SELECT DISTINCT Table1.id AS id 
FROM Table1, Table2 
WHERE Table1.id = Table2.t1id AND Table2.t1id = ?)
```
So the SELECT DISTINCT creates the matrix of results that you care about; then the "count" query wraps that result.
- 2006-08-26T16:12:24+00:00
Log in to comment