warn when in_() used with no args / attempt to use "False" when available

Issue #1628 resolved
Mike Bayer repo owner created an issue

I know we have our "correct" interpretation of col.in_([]) as col != col, but I have now been "bitten" with this as when its on a large table, the query (at least on PG) is horrendously inefficient, of course silently since I didn't realize the list being passed is empty, so I'd never want to be calling in_() with an empty list. So I propose a warning generated when this occurs, similarly to the "non-unicode passed to a unicode" warning.

Comments (16)

  1. Former user Account Deleted

    Why is it rewritten to col != col? That's an expression that can never be true --- unless perhaps you use it on a weird type which overloads the !=-operator, and even in that case I would not expect col.in_([]) to result in those. The comment in _in_impl indicates it really is expected that it won't ever return anything.

    Yet, it is not something a query planner will attempt to prove is a contradiction. However, if it had been rewritten to simply False, that's something a query planner can easily reason is a contradiction, and thus never execute the sub-plan.

    Example:

    alex => explain analyze select * from (select generate_series(0,1000000)) as i where i != i;
                                                QUERY PLAN                                             
    ---------------------------------------------------------------------------------------------------
     Subquery Scan i  (cost=0.00..0.03 rows=1 width=4) (actual time=1493.582..1493.582 rows=0 loops=1)
       Filter: (i.* <> i.*)
       ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.010..294.954 rows=1000001 loops=1)
     Total runtime: 1493.619 ms
    (4 rows)
    
    alex=> explain analyze select * from (select generate_series(0,1000000)) as i where false;
                                         QUERY PLAN                                     
    ------------------------------------------------------------------------------------
     Result  (cost=0.00..0.02 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1)
       One-Time Filter: false
       ->  Result  (cost=0.00..0.01 rows=1 width=0) (never executed)
     Total runtime: 0.022 ms
    (4 rows)
    

    I propose to change the implementation to return a False-expression instead of col != col --- with or without warning.

    – AlexB

  2. Former user Account Deleted

    Oh, and please add alex@brasetvik.com to CC when replying. I don't have permission to do that as guest. :-)

    • AlexB
  3. Mike Bayer reporter

    Replying to guest:

    Why is it rewritten to col != col? That's an expression that can never be true ---

    because "col in ()" can also never be true

    unless perhaps you use it on a weird type which overloads the !=-operator, and even in that case I would not expect col.in_([]) to result in those.

    "col in ()" is not valid SQL - it will result in a SQL error otherwise. See #545.

    Yet, it is not something a query planner will attempt to prove is a contradiction. However, if it had been rewritten to simply False, that's something a query planner can easily reason is a contradiction, and thus never execute the sub-plan.

    Oracle (and possibly others) does not support "false". As in #545, for many versions we went with a complicated CASE construct to provide a "false", but the simplest such expression is "col != col" which was adjusted later on.

    I propose to change the implementation to return a False-expression instead of col != col --- with or without warning.

    Would only work for those platforms which support "false" in the 1st place - but on Oracle, you'd get surprise terrible performance. I'd prefer people actively avoid using in_() with no arguments.

  4. Former user Account Deleted

    Replying to zzzeek:

    Replying to guest:

    Why is it rewritten to col != col? That's an expression that can never be true --- because "col in ()" can also never be true

    Of course --- and false will never be true either. ;-) I just wondered why exactly that contradiction ...

    Oracle (and possibly others) does not support "false". As in #545, for many versions we went with a complicated CASE construct to provide a "false", but the simplest such expression is "col != col" which was adjusted later on.

    ... which explains it. (I have zero Oracle-experience. :)

    I propose to change the implementation to return a False-expression instead of col != col --- with or without warning.

    Would only work for those platforms which support "false" in the 1st place - but on Oracle, you'd get surprise terrible performance. I'd prefer people actively avoid using in_() with no arguments.

    In addition to a warning, perhaps we could compile col != col to a False-clause (the converse is probably harder) in the dialects that support them, to let the planner have a chance at handling it. I'd prefer warning + fast execution over just a warning. :-)

    – AlexB

  5. Mike Bayer reporter

    mmm, nah this is harder than that:

    sqlite> create table foo(id int, data int);
    sqlite> insert into foo values(1, 1);
    sqlite> insert into foo values(2, 2);
    sqlite> insert into foo values(3, null);
    sqlite> select * from foo where (data != data) = 0;
    1|1
    2|2
    sqlite> select * from foo where (1 != 1) = 0;
    1|1
    2|2
    3|
    sqlite>
    

    different - col != col handles NULLs the way they would be with an empty IN as well. so I really think the warning is what needs to be done here while maintaining correctness.

  6. Mike Bayer reporter

    and....perhaps if they'd like to do away with the warning, we can provide this:

    column.in_([], empty_false=True)
    

    Warning: empty list passed to IN expression againstcolwill producecol != col. To render an efficient "false" expression that does not account for null values, specifyempty_false=Trueto thein_()construct

  7. Former user Account Deleted

    Attached patch works with Postgres and SQLite, although I simplified one of the tests that also runs with SQLite.

    However, perhaps instead of relying on col != col, it can return a Contradiction or something? Then it's a bit clearer what's going on.

    How do they use the 0=1-approach? That sounds pretty much like just false to me, unless they're doing a col=(1=0), which obviously fails for nulls.

    – AlexB

  8. Former user Account Deleted

    Replying to zzzeek:

    col != col handles NULLs the way they would be with an empty IN as well. so I really think the warning is what needs to be done here while maintaining correctness.

    col != col is fine for keeping it correct on the databases where false is not an expression. However, the false-as-the-entire-expression works on both Postgres, SQLite and MySQL, so providing the optimization there seems worthwhile:

    sqlite> select 1 where 0;
    sqlite>
    
    mysql> select 42 from ab where false;
    Empty set (0.00 sec)
    
    alex=# select 42 where false;
     ?column? 
    ----------
    (0 rows)
    

    – AlexB

  9. Former user Account Deleted

    Per FAQ-update: The CASE-approach kind of misses the point. It is equal to col != col wrt. to the planner being unable to prove that it's a contradiction. If it's a simple false or an expression AND-ed with false, it's trivially a contradiction. If not, you have k-sat at your hands, so no optimizer even tries.. :-)

  10. Mike Bayer reporter

    there are tests in test/sql/test_query.py that will illustrate how only the CASE and the col != col approaches work completely. Basically if col is null, col != col returns NULL as well; 1=0 does not - it always returns false. The other issue is negation; col = col returns true unless col is null, then it returns null which evaluates to false. so again not the same as 1!=0.

    another route we could go here would be 1=0 for col IN (), and col IS NOT NULL for col NOT IN (). it still is not 100% correct for the non-negated case though. run the unit tests and you'll see.

  11. Former user Account Deleted

    Ah, yes, negating and/or other combinations in expressions where nulls are a possibility is a good point. (Color me dense, I've always had NOT NULL-constraints on the stuff I've used IN on. ;-)

    Although the tests that rely on "IN ()" being rewritten and then compare that expression again seem a bit artificial, to get the possibility for the quick plan and have correctness in all cases (which we should, of course), we'd have to reason on a columns nullability as well. With a _Contradiction-clause, that could still be feasible, but for a corner-case like this it doesn't seem worth it.

    +1 on just a warning for now.

    – AlexB

  12. Log in to comment