propose null is not in the empty set

Issue #3907 resolved
Mike Bayer repo owner created an issue

e.g. revisit gerrit at https://gerrit.sqlalchemy.org/#/c/103/ .

Postgresql and other databases won't give us direct "x IN ()" access, but we can get the same effect using a SELECT for no rows:

test=# select null in (select 1 where 1=0);
 ?column? 
----------
 f
(1 row)

test=# select (null in (select 1 where 1=0)) is null;
 ?column? 
----------
 f
(1 row)

test=# select (null in (1, 2, 3)) is null;
 ?column? 
----------
 t
(1 row)

negation:

test=# select null not in (select 1 where 1=0);
 ?column? 
----------
 t
(1 row)

test=# select null not in (1, 2, 3);
 ?column? 
----------

(1 row)

test=# select null not in (1, 2, 3) is null;
 ?column? 
----------
 t
(1 row)

test=# select null not in (select 1 where 1=0) is null;
 ?column? 
----------
 f
(1 row)

therefore, "null IN ()" can safely return False, which means "anything IN ()" can safely be turned into 1 != 0.

However, we'd like backwards compatibility to work. For that, we'd like an engine level flag. So for that, the "x in empty set" needs to be received at the compiler level. We should be able to pass the empty IN straight to the compiler where "def visit_in_op()" / "def visit_notin_op()" can look on the right side of the expression and make the right decision. When the "IN" behavior was first done, I don't think we had "def visit_xyz_op" and perhaps not even "notin".

Comments (8)

  1. Mike Bayer reporter

    also, consider for documentation using modern set notation as this might look a little more natural to people:

    column('x').in_({1, 2, 3})
    

    someone told me the other day sets don't work...but they do ? so do tuples.

  2. Mike Bayer reporter

    surprise, Mariadb thinks the answer is NULL:

    MariaDB [(none)]> select null not in (select 1 where 1=0);
    +----------------------------------+
    | null not in (select 1 where 1=0) |
    +----------------------------------+
    |                             NULL |
    +----------------------------------+
    1 row in set (0.00 sec)
    
    MariaDB [(none)]> select null in (select 1 where 1=0);
    +------------------------------+
    | null in (select 1 where 1=0) |
    +------------------------------+
    |                         NULL |
    +------------------------------+
    1 row in set (0.00 sec)
    
    MariaDB [(none)]> select 5 not in (select 1 where 1=0);
    +-------------------------------+
    | 5 not in (select 1 where 1=0) |
    +-------------------------------+
    |                             1 |
    +-------------------------------+
    1 row in set (0.00 sec)
    
    MariaDB [(none)]> select 5 in (select 1 where 1=0);
    +---------------------------+
    | 5 in (select 1 where 1=0) |
    +---------------------------+
    |                         0 |
    +---------------------------+
    1 row in set (0.00 sec)
    

    I was considering maybe we could even let the DB make the decision here by having "empty set" just render as "select 1 where 1=0", but...this changes that :). FAQ will have to be revised a lot to point out we're still just guessing here.

  3. erebus1

    But, why we can't really delegate this decision to db? We can change x in (empty set) on x in (select 1 where 1=0) And then Postgres will return False, when Mariadb will return null for null. How previous post contadict this?

    The performance seems to be the same, as for change x in (empty set) on 1=0

  4. Mike Bayer reporter

    there's a world where I get hit regularly for, "SQLAlchemy is not abstracting differences between the databases!!". If we're going to say Postgresql and Mariadb return a totally different answer for "x in ()", it's true that the "null" evaluates to False in a WHERE clause but still, it's a different answer.

    i think the problem would be resolved best if we just give in and do the "1 = 0" thing to force a False and be done with it; I just want this to be something that can be altered at the compiler level so people can have it do what they want (and nobody will ever care).

  5. Mike Bayer reporter

    Add "empty in" strategies; default to "static"

    The longstanding behavior of the :meth:.Operators.in_ and :meth:.Operators.not_in_ operators emitting a warning when the right-hand condition is an empty sequence has been revised; a new flag :paramref:.create_engine.empty_in_strategy allows an empty "IN" expression to generate a simple boolean expression, or to invoke the previous behavior of dis-equating the expression to itself, with or without a warning. The default behavior is now to emit the simple boolean expression, allowing an empty IN to be evaulated without any performance penalty.

    Change-Id: I65cc37f2d7cf65a59bf217136c42fee446929352 Fixes: #3907

    → <<cset f3b6f4f8da52>>

  6. Log in to comment