DISTINCT is used in select() but not in count()

Issue #285 resolved
Former user created an issue

When you use distinct=True in a select() statement on a table, a SELECT DISTINCT statemet is generated. But if you use the same clause and distinct=True in a count() statement, the distinct=True parameter is ignored and the count is wrong.

Attached is a small test progam which exercises the bug.

Raul Garcia.

Comments (1)

  1. Mike Bayer repo owner

    OK, this is a little subtle. the statement you have produces this (I converted list(statement.execute())[0](0)[0](0) into the equivalent statement.scalar()):

    res = table1.count(and_(table1.c.id==table2.c.t1id,table2.c.t1id==1), distinct=True).scalar()
    
    SELECT DISTINCT count(Table1.id) AS tbl_row_count 
    FROM Table1, Table2 
    WHERE Table1.id = Table2.t1id AND Table2.t1id = ?
    

    you can see the aggregate count() function result is what will be run into the DISTINCT call (so DISTINCT is being propigated from SQLAlchemy). While this may seem pointless since count() returns only one row, that is not necessarily the case, such as if you had a GROUP BY in there which returned multiple rows containing a count; you may in some case want to get a list of counts against some criterion and determine the set of all distinct count values.

    What you really want to apply DISTINCT to is the rowset from the table, and then you want to "count" that. So you do it this way:

    res = table1.select(and_(table1.c.id==table2.c.t1id,table2.c.t1id==1), distinct=True).count().scalar()
    
    SELECT count(id) AS tbl_row_count 
    FROM (SELECT DISTINCT Table1.id AS id 
    FROM Table1, Table2 
    WHERE Table1.id = Table2.t1id AND Table2.t1id = ?)
    

    So the SELECT DISTINCT creates the matrix of results that you care about; then the "count" query wraps that result.

  2. Log in to comment