warn when in_() used with no args / attempt to use "False" when available

Issue #1628 resolved

Mike Bayer repo owner created an issue 2009-11-27

I know we have our "correct" interpretation of col.in_([]) as col != col, but I have now been "bitten" with this as when its on a large table, the query (at least on PG) is horrendously inefficient, of course silently since I didn't realize the list being passed is empty, so I'd never want to be calling in_() with an empty list. So I propose a warning generated when this occurs, similarly to the "non-unicode passed to a unicode" warning.

in_false.diff

Comments (16)

Former user Account Deleted

Why is it rewritten to col != col? That's an expression that can never be true --- unless perhaps you use it on a weird type which overloads the !=-operator, and even in that case I would not expect col.in_([]) to result in those. The comment in _in_impl indicates it really is expected that it won't ever return anything.

Yet, it is not something a query planner will attempt to prove is a contradiction. However, if it had been rewritten to simply False, that's something a query planner can easily reason is a contradiction, and thus never execute the sub-plan.

Example:

alex => explain analyze select * from (select generate_series(0,1000000)) as i where i != i;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Subquery Scan i  (cost=0.00..0.03 rows=1 width=4) (actual time=1493.582..1493.582 rows=0 loops=1)
   Filter: (i.* <> i.*)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.010..294.954 rows=1000001 loops=1)
 Total runtime: 1493.619 ms
(4 rows)

alex=> explain analyze select * from (select generate_series(0,1000000)) as i where false;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.02 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1)
   One-Time Filter: false
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (never executed)
 Total runtime: 0.022 ms
(4 rows)

I propose to change the implementation to return a False-expression instead of col != col --- with or without warning.

– AlexB

2009-11-30T01:45:56+00:00

Former user Account Deleted
Oh, and please add alex@brasetvik.com to CC when replying. I don't have permission to do that as guest. :-)
- AlexB
- 2009-11-30T01:47:50+00:00
Mike Bayer reporter
- changed watchers to alex@brasetvik.com
Replying to guest:

Why is it rewritten to col != col? That's an expression that can never be true ---

because "col in ()" can also never be true

unless perhaps you use it on a weird type which overloads the !=-operator, and even in that case I would not expect col.in_([]) to result in those.

"col in ()" is not valid SQL - it will result in a SQL error otherwise. See ~~#545~~.

Yet, it is not something a query planner will attempt to prove is a contradiction. However, if it had been rewritten to simply False, that's something a query planner can easily reason is a contradiction, and thus never execute the sub-plan.

Oracle (and possibly others) does not support "false". As in ~~#545~~, for many versions we went with a complicated CASE construct to provide a "false", but the simplest such expression is "col != col" which was adjusted later on.

I propose to change the implementation to return a False-expression instead of col != col --- with or without warning.

Would only work for those platforms which support "false" in the 1st place - but on Oracle, you'd get surprise terrible performance. I'd prefer people actively avoid using in_() with no arguments.
- 2009-11-30T06:06:15+00:00
Former user Account Deleted
Replying to zzzeek:

Replying to guest:

Why is it rewritten to col != col? That's an expression that can never be true --- because "col in ()" can also never be true

Of course --- and false will never be true either. ;-) I just wondered why exactly that contradiction ...

Oracle (and possibly others) does not support "false". As in ~~#545~~, for many versions we went with a complicated CASE construct to provide a "false", but the simplest such expression is "col != col" which was adjusted later on.

... which explains it. (I have zero Oracle-experience. :)

I propose to change the implementation to return a False-expression instead of col != col --- with or without warning.

Would only work for those platforms which support "false" in the 1st place - but on Oracle, you'd get surprise terrible performance. I'd prefer people actively avoid using in_() with no arguments.

In addition to a warning, perhaps we could compile col != col to a False-clause (the converse is probably harder) in the dialects that support them, to let the planner have a chance at handling it. I'd prefer warning + fast execution over just a warning. :-)

– AlexB
- 2009-11-30T06:24:42+00:00
Mike Bayer reporter
- changed title to warn when in_() used with no args / attempt to use "False" when available
yeah we might be able to do an "in_" operator here and override it for Oracle. just more work.
- 2009-11-30T10:12:19+00:00

Mike Bayer reporter

mmm, nah this is harder than that:

sqlite> create table foo(id int, data int);
sqlite> insert into foo values(1, 1);
sqlite> insert into foo values(2, 2);
sqlite> insert into foo values(3, null);
sqlite> select * from foo where (data != data) = 0;
1|1
2|2
sqlite> select * from foo where (1 != 1) = 0;
1|1
2|2
3|
sqlite>

different - col != col handles NULLs the way they would be with an empty IN as well. so I really think the warning is what needs to be done here while maintaining correctness.

2009-11-30T11:27:52+00:00

Mike Bayer reporter
I notice django is doing 1=0. Their logic will work for lots of cases but will not handle NULLs correctly. I added some exposition on the whole issue to http://www.sqlalchemy.org/trac/wiki/FAQ#WhydoesINProducecolcolandawarningin0.6thisisinefficientwhynot10likesomeotherORMdoes .

Perhaps we can use the CASE construct in all cases except oracle. Though I am loathe to abandon the simplicity of the current approach.
- 2009-11-30T12:28:23+00:00
Mike Bayer reporter
and....perhaps if they'd like to do away with the warning, we can provide this:
```
column.in_([], empty_false=True)
```
Warning: empty list passed to IN expression againstcolwill producecol != col. To render an efficient "false" expression that does not account for null values, specifyempty_false=Trueto thein_()construct
- 2009-11-30T12:34:38+00:00
Former user Account Deleted
- attached in_false.diff
Simple patch for Postgres
- 2009-11-30T13:01:46+00:00
Former user Account Deleted
Attached patch works with Postgres and SQLite, although I simplified one of the tests that also runs with SQLite.

However, perhaps instead of relying on col != col, it can return a Contradiction or something? Then it's a bit clearer what's going on.

How do they use the 0=1-approach? That sounds pretty much like just false to me, unless they're doing a col=(1=0), which obviously fails for nulls.

– AlexB
- 2009-11-30T13:08:29+00:00
Former user Account Deleted
Replying to zzzeek:

col != col handles NULLs the way they would be with an empty IN as well. so I really think the warning is what needs to be done here while maintaining correctness.

col != col is fine for keeping it correct on the databases where false is not an expression. However, the false-as-the-entire-expression works on both Postgres, SQLite and MySQL, so providing the optimization there seems worthwhile:
```
sqlite> select 1 where 0;
sqlite>

mysql> select 42 from ab where false;
Empty set (0.00 sec)

alex=# select 42 where false;
 ?column? 
----------
(0 rows)
```
– AlexB
- 2009-11-30T13:15:30+00:00
Former user Account Deleted
Per FAQ-update: The CASE-approach kind of misses the point. It is equal to col != col wrt. to the planner being unable to prove that it's a contradiction. If it's a simple false or an expression AND-ed with false, it's trivially a contradiction. If not, you have k-sat at your hands, so no optimizer even tries.. :-)
- 2009-11-30T13:34:31+00:00
Mike Bayer reporter
there are tests in test/sql/test_query.py that will illustrate how only the CASE and the col != col approaches work completely. Basically if col is null, col != col returns NULL as well; 1=0 does not - it always returns false. The other issue is negation; col = col returns true unless col is null, then it returns null which evaluates to false. so again not the same as 1!=0.

another route we could go here would be 1=0 for col IN (), and col IS NOT NULL for col NOT IN (). it still is not 100% correct for the non-negated case though. run the unit tests and you'll see.
- 2009-11-30T16:47:39+00:00
Former user Account Deleted
Ah, yes, negating and/or other combinations in expressions where nulls are a possibility is a good point. (Color me dense, I've always had NOT NULL-constraints on the stuff I've used IN on. ;-)

Although the tests that rely on "IN ()" being rewritten and then compare that expression again seem a bit artificial, to get the possibility for the quick plan and have correctness in all cases (which we should, of course), we'd have to reason on a columns nullability as well. With a _Contradiction-clause, that could still be feasible, but for a corner-case like this it doesn't seem worth it.

+1 on just a warning for now.

– AlexB
- 2009-12-01T01:39:22+00:00
Mike Bayer reporter
- changed status to resolved
glad we could work that out. d732e7bf261504f34c9b9d5049a9875633c3e4b3
- 2009-12-29T18:20:58+00:00
Mike Bayer reporter
- removed milestone
Removing milestone: 0.6.0 (automated comment)
- 2014-02-23T18:14:38+00:00
Log in to comment

Assignee: Mike Bayer

Type: enhancement

Priority: major

Status: resolved

Component: sql

Milestone: –

Votes: 0

Watchers: 0