func.count is not optimized

Issue #2559 resolved
Sławek G created an issue

query.count() generate: SELECT count(*) AS count_1 FROM (<query>) AS anon_1; nested query(select) here can be expensive and hurt count because first goes real select (ie: 50 heavy columns from k tables x N rows)

prefered way:

query.with_entities(func.count(<Table.primary_key>).scalar() avoid additional select and is much faster

but I (SQLa user) do not want remember that count is not good and each time write own better version...

my <dummy> workaround (auto detecting primary key as param for func.count): <query>.with_entities(func.count( getattr( <query>.column_descriptions0'expr', <query>.column_descriptions0'expr'].table.primary_key.columns.keys()0) )).scalar()

and this is fast, but do not work for query likes: session.query(model1, model2) with no join condition -> cartesian product and some others cases (InstrumentedAttributes as select column) - easy updated

however update is really easy - just modify query select part without from and filter parts. and worth implemmentation even if will handle only the most common select (single or joined tables)

I only test it with mysql with 2 dialects.

Comments (1)

  1. Mike Bayer repo owner
    • changed status to wontfix
    • changed component to sql

    The "really easy" implementation you refer to existed in SQLAlchemy up through version 0.6, which is also referred to in the documentation: http://docs.sqlalchemy.org/en/rel_0_7/orm/query.html?highlight=query.count#sqlalchemy.orm.query.Query.count

    Suffice to say that the approach is anything but easy, for a generic API method that seeks to work in all cases. It was extremely complex and unreliable. Please see #2093 for rationale.

    and worth implemmentation even if will handle only the most common select (single or joined tables)

    not really, as it needs to detect exactly when it can't do a count reliably, otherwise the end user has no way of knowing if a particular query will succeed to do a count() or not. reliable in all cases, with explicit options to render "exact SQL", is the best approach.

    Also, this is how to do a count:

    query(func.count(MyEntity.id)).scalar()
    

    I would recommend sticking to simple practices like the above, or assign to your classes a "count" hybrid if you're concerned about so-called "auto detection":

    class MyClass(Base):
       # ...
    
       @hybrid_property
       def id_count(self):
            raise NotImplementedError()
    
       @hybrid_property.expression
       def id_count(cls):
           return func.count(self.id)
    
    
    count = session.query(MyClass.id_count).scalar()
    
  2. Log in to comment