Better in filter handling

Issue #1835 resolved
Former user created an issue

I noticed that the "in_" filter generates a SQL in clause with as many bind parameters as the number of items. This is inefficient and would not take the best advantage of the server caches. There are alternatives such as using queries with fixed number of parameter counts (with NULL's to pad) and generating a temp table on the fly and joining back to it. For details see this nice discussion on stackoverflow:

http://stackoverflow.com/questions/337704/parameterizing-a-sql-in-clause

Comments (3)

  1. Mike Bayer repo owner
    • changed status to wontfix
    • changed component to sql

    Don't you think it would be a surprise, to say the least, if someone said in_(), and suddenly a new "CREATE TEMP TABLE" and a bunch of INSERT statements were generated ? How come the database's own IN operator doesn't do this ?

    The various workarounds for large INs mentioned in that article and others are perfectly fine for users to implement themselves, but none are appropriate in any way to be embedded in our own in_() construct.

    It's not a given in any case that using plain IN is inefficient. An application that uses it for small batches of parameters will have no issues. The database's caching of query plans will simply cache a handful of statements corresponding to each size of IN rather than one. Its up to the developer to decide if he or she wants to replace the usage of IN with a different scheme, not SQLAlchemy's.

  2. Former user Account Deleted

    Regarding why databases won't do this, it is about flexibility. Besides, they are definitely getting smarted, e.g., as one post mentioned in that article, newer versions of sql server are able to automatically parameterize a dynamic sql, so that using dynamic sql would still get you the same benefits as static sql.

    I am fine doing it in the app as long as there is a way to express the join to the temp table in a straight-forward way, I will do some more research into this.

  3. Mike Bayer repo owner

    If the databases themselves are providing optimized IN-like operators, then by all means, SQLAlchemy will allow their use.

    JOINs are tricky to use in a generic way since they highly modify the structure of a SELECT statement, and if for example a statement contained many INs inside of subqueries and such, the conversion of all those INs into additional joins would be both syntactically troublesome and also burdensome on the query planner.

  4. Log in to comment