warn when in_() used with no args / attempt to use "False" when available
I know we have our "correct" interpretation of col.in_([])
as col != col
, but I have now been "bitten" with this as when its on a large table, the query (at least on PG) is horrendously inefficient, of course silently since I didn't realize the list being passed is empty, so I'd never want to be calling in_()
with an empty list. So I propose a warning generated when this occurs, similarly to the "non-unicode passed to a unicode" warning.
Comments (16)
-
Account Deleted -
Account Deleted Oh, and please add
alex@brasetvik.com
to CC when replying. I don't have permission to do that as guest. :-)- AlexB
-
reporter - changed watchers to alex@brasetvik.com
Replying to guest:
Why is it rewritten to
col != col
? That's an expression that can never be true ---because "col in ()" can also never be true
unless perhaps you use it on a weird type which overloads the
!=
-operator, and even in that case I would not expectcol.in_([])
to result in those."col in ()" is not valid SQL - it will result in a SQL error otherwise. See
#545.Yet, it is not something a query planner will attempt to prove is a contradiction. However, if it had been rewritten to simply
False
, that's something a query planner can easily reason is a contradiction, and thus never execute the sub-plan.Oracle (and possibly others) does not support "false". As in
#545, for many versions we went with a complicated CASE construct to provide a "false", but the simplest such expression is "col != col" which was adjusted later on.I propose to change the implementation to return a
False
-expression instead ofcol != col
--- with or without warning.Would only work for those platforms which support "false" in the 1st place - but on Oracle, you'd get surprise terrible performance. I'd prefer people actively avoid using in_() with no arguments.
-
Account Deleted Replying to zzzeek:
Replying to guest:
Why is it rewritten to
col != col
? That's an expression that can never be true --- because "col in ()" can also never be trueOf course --- and false will never be true either. ;-) I just wondered why exactly that contradiction ...
Oracle (and possibly others) does not support "false". As in
#545, for many versions we went with a complicated CASE construct to provide a "false", but the simplest such expression is "col != col" which was adjusted later on.... which explains it. (I have zero Oracle-experience. :)
I propose to change the implementation to return a
False
-expression instead ofcol != col
--- with or without warning.Would only work for those platforms which support "false" in the 1st place - but on Oracle, you'd get surprise terrible performance. I'd prefer people actively avoid using in_() with no arguments.
In addition to a warning, perhaps we could compile
col != col
to a False-clause (the converse is probably harder) in the dialects that support them, to let the planner have a chance at handling it. I'd prefer warning + fast execution over just a warning. :-)– AlexB
-
reporter yeah we might be able to do an "in_" operator here and override it for Oracle. just more work.
-
reporter mmm, nah this is harder than that:
sqlite> create table foo(id int, data int); sqlite> insert into foo values(1, 1); sqlite> insert into foo values(2, 2); sqlite> insert into foo values(3, null); sqlite> select * from foo where (data != data) = 0; 1|1 2|2 sqlite> select * from foo where (1 != 1) = 0; 1|1 2|2 3| sqlite>
different -
col != col
handles NULLs the way they would be with an empty IN as well. so I really think the warning is what needs to be done here while maintaining correctness. -
reporter I notice django is doing
1=0
. Their logic will work for lots of cases but will not handle NULLs correctly. I added some exposition on the whole issue to http://www.sqlalchemy.org/trac/wiki/FAQ#WhydoesINProducecolcolandawarningin0.6thisisinefficientwhynot10likesomeotherORMdoes .Perhaps we can use the CASE construct in all cases except oracle. Though I am loathe to abandon the simplicity of the current approach.
-
reporter and....perhaps if they'd like to do away with the warning, we can provide this:
column.in_([], empty_false=True)
Warning: empty list passed to IN expression against
colwill produce
col != col. To render an efficient "false" expression that does not account for null values, specify
empty_false=Trueto the
in_()construct
-
Account Deleted - attached in_false.diff
Simple patch for Postgres
-
Account Deleted Attached patch works with Postgres and SQLite, although I simplified one of the tests that also runs with SQLite.
However, perhaps instead of relying on
col != col
, it can return aContradiction
or something? Then it's a bit clearer what's going on.How do they use the
0=1
-approach? That sounds pretty much like justfalse
to me, unless they're doing acol=(1=0)
, which obviously fails for nulls.– AlexB
-
Account Deleted Replying to zzzeek:
col != col
handles NULLs the way they would be with an empty IN as well. so I really think the warning is what needs to be done here while maintaining correctness.col != col
is fine for keeping it correct on the databases wherefalse
is not an expression. However, the false-as-the-entire-expression works on both Postgres, SQLite and MySQL, so providing the optimization there seems worthwhile:sqlite> select 1 where 0; sqlite> mysql> select 42 from ab where false; Empty set (0.00 sec) alex=# select 42 where false; ?column? ---------- (0 rows)
– AlexB
-
Account Deleted Per FAQ-update: The CASE-approach kind of misses the point. It is equal to col != col wrt. to the planner being unable to prove that it's a contradiction. If it's a simple
false
or an expression AND-ed withfalse
, it's trivially a contradiction. If not, you have k-sat at your hands, so no optimizer even tries.. :-) -
reporter there are tests in test/sql/test_query.py that will illustrate how only the CASE and the
col != col
approaches work completely. Basically ifcol
is null,col != col
returns NULL as well;1=0
does not - it always returns false. The other issue is negation;col = col
returns true unlesscol
is null, then it returns null which evaluates tofalse
. so again not the same as1!=0
.another route we could go here would be
1=0
forcol IN ()
, andcol IS NOT NULL
forcol NOT IN ()
. it still is not 100% correct for the non-negated case though. run the unit tests and you'll see. -
Account Deleted Ah, yes, negating and/or other combinations in expressions where nulls are a possibility is a good point. (Color me dense, I've always had NOT NULL-constraints on the stuff I've used IN on. ;-)
Although the tests that rely on "IN ()" being rewritten and then compare that expression again seem a bit artificial, to get the possibility for the quick plan and have correctness in all cases (which we should, of course), we'd have to reason on a columns nullability as well. With a
_Contradiction
-clause, that could still be feasible, but for a corner-case like this it doesn't seem worth it.+1 on just a warning for now.
– AlexB
-
reporter - changed status to resolved
glad we could work that out. d732e7bf261504f34c9b9d5049a9875633c3e4b3
-
reporter - removed milestone
Removing milestone: 0.6.0 (automated comment)
- Log in to comment
Why is it rewritten to
col != col
? That's an expression that can never be true --- unless perhaps you use it on a weird type which overloads the!=
-operator, and even in that case I would not expectcol.in_([])
to result in those. The comment in _in_impl indicates it really is expected that it won't ever return anything.Yet, it is not something a query planner will attempt to prove is a contradiction. However, if it had been rewritten to simply
False
, that's something a query planner can easily reason is a contradiction, and thus never execute the sub-plan.Example:
I propose to change the implementation to return a
False
-expression instead ofcol != col
--- with or without warning.– AlexB