Oracle has a hard limit of 1000 entries in a WHERE ... IN clause. If you try to select something using the in_() operator with a list of more than 1000 entries, you're bound to get an ORA-01795 ("maximum number of expressions in a list is 1000") error. We used to have the following code for the 0.4 series to monkey-patch around this:
import sqlalchemy
import sqlalchemy.sql.compiler
import sqlalchemy.databases.oracle

def new_oracle_visit_binary(self, expr):
    """Oracle does not support more than 1000 items in an IN expression.

    To overcome this limitation we change the visitor for binary expressions
    in the Oracle compiler so that IN expressions with more than 1000 items
    are automatically broken up into several IN expressions that are ORed
    together.
    """
    if expr.operator != sqlalchemy.sql.operators.in_op:
        # *jedi mind trick* This is not the operator you are looking for.
        return sqlalchemy.sql.compiler.DefaultCompiler.visit_binary(self, expr)
    # This is an in_() clause.
    chunk = 1000
    if len(expr.right.elem.clauses) <= chunk:
        # With no more than `chunk` items, keep going.
        return sqlalchemy.sql.compiler.DefaultCompiler.visit_binary(self, expr)
    # We need to split this expression up. Build sub-expressions of at most
    # `chunk` items each.
    subexpr = []
    for i in range(0, len(expr.right.elem.clauses), chunk):
        subexpr.append(sqlalchemy.sql.expression._BinaryExpression(
            expr.left,
            sqlalchemy.sql.expression.ClauseList(
                *expr.right.elem.clauses[i:i + chunk]),
            sqlalchemy.sql.operators.in_op))
    # We have at least two sub-expressions. Start OR-ing from the left.
    combexpr = subexpr[0]
    for s in subexpr[1:]:
        combexpr = sqlalchemy.sql.expression._BinaryExpression(
            combexpr, s, sqlalchemy.sql.operators.or_)
    return sqlalchemy.sql.compiler.DefaultCompiler.visit_binary(self, combexpr)

# Install the new Oracle binary expression visitor.
sqlalchemy.databases.oracle.dialect.statement_compiler.visit_binary = \
    new_oracle_visit_binary
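The chunking at the heart of this monkey-patch can be sketched independently of any SQLAlchemy version; a minimal illustration (the helper name `chunked` is my own, not from the patch):

```python
def chunked(values, size=1000):
    """Yield successive slices of at most `size` items from `values`."""
    for i in range(0, len(values), size):
        yield values[i:i + size]

# 2500 ids split into chunks of at most 1000: 1000 + 1000 + 500.
ids = list(range(2500))
parts = list(chunked(ids))
assert [len(p) for p in parts] == [1000, 1000, 500]
```

Each slice then becomes one IN list, and the slices are ORed together, which is exactly what the visitor above does at the expression-tree level.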
One of these days I'll probably come up with something for 0.5.
Comments (5)

- repo owner: In my own work, regardless of backend, I issue individual select() constructs for each batch of N records - and it is my work with Oracle that first made me aware of this kind of limitation. Sending over a statement that is unbounded in size seems like a bad idea to me. At the very least, turning an IN into any number of OR statements without the user asking - instead of allowing Oracle to say, "hey, we really aren't built to handle this large of a statement" - seems like a huge surprise to me. So at the moment the approach above seems better to me as a new operator you'd build yourself, like expanding_or() or something like that.

- Account Deleted: It's true in general that unbounded statements are a bad idea, but on the other hand, I kind of expect SQLAlchemy to do the right thing ''if'' the code ever comes across one, intended or not. Introducing another operator for this special case isn't much of an issue for me personally; to be honest, we wrote the given code based on just such an operator before we switched to SA. But on the other hand - and of course you are the one to decide on this - it seems to me that SA should not carry over database-driver-specific limitations such as this one into its API. SA does such a good job abstracting away all the other differences, why keep this one?

- repo owner: I think the "right thing" for in_() to do is to render an "IN". Anything else is a surprise. Especially on Oracle, I think it's important that a user be aware of that limitation and make an explicit decision about how they'd like to handle it.

- Account Deleted: Yeah, I do agree with what you're saying. Looking at the code again, it seems we could just as easily do with something like this:

chunk = 1000
subexprs = [dataset_table.c.id.in_(cases[i:i+chunk])
            for i in range(0, len(cases), chunk)]
query = query.where(or_(*subexprs))

Feel free to close this ticket.

- repo owner changed status to wontfix
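The application-level approach from the last comment - OR-ing together chunked IN lists explicitly rather than patching the compiler - can also be illustrated at the level of generated SQL. A sketch with a hypothetical helper (`oracle_safe_in` is my naming; real code should of course use bind parameters rather than interpolating values into the string):

```python
def oracle_safe_in(column, values, chunk=1000):
    """Return a SQL predicate equivalent to `column IN (values)` that never
    puts more than `chunk` items in any single IN list."""
    parts = []
    for i in range(0, len(values), chunk):
        # Render one IN list per chunk of at most `chunk` values.
        placeholders = ", ".join(str(v) for v in values[i:i + chunk])
        parts.append("%s IN (%s)" % (column, placeholders))
    if len(parts) == 1:
        return parts[0]
    # OR the chunked IN lists together, parenthesized as a single predicate.
    return "(" + " OR ".join(parts) + ")"

# With chunk=2 for readability:
# oracle_safe_in("id", [1, 2, 3], chunk=2) -> "(id IN (1, 2) OR id IN (3))"
```

This keeps the splitting visible at the call site, which matches the repo owner's point: the user decides explicitly how to handle the limitation instead of the dialect silently rewriting the statement.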