Different behaviour for collections of nans from CPython

Issue #1974 resolved
Reported by David MacIver:

All of the following asserts pass on CPython (I've tried 2.7, 3.3 and 3.4), but some of them fail on PyPy (I've tried pypy-2.5.0 and pypy3-2.4.0).

```python
n1 = float('nan')
n2 = float('nan')
x = [n1]
assert n1 in x
assert n2 not in x
assert x.index(n1) == 0
try:
    x.index(n2)
    assert False
except ValueError:
    pass
```

In other words, CPython's collection methods seem to have a shortcut that treats `x` and `y` as equal if `x is y`, even when `x == y` is False.
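The Python language reference describes `x in y` for built-in containers as equivalent to `any(x is e or x == e for e in y)`. The following is a minimal sketch of that identity-then-equality rule in plain Python. `same_or_equal`, `contains` and `index_of` are hypothetical helpers written for illustration, not CPython internals.

```python
def same_or_equal(a, b):
    """Identity-or-equality test used for container lookups."""
    return a is b or a == b


def contains(seq, item):
    # Mirrors `item in seq`: an element matches if it is the very same
    # object as `item`, or if it compares equal to it.
    return any(same_or_equal(x, item) for x in seq)


def index_of(seq, item):
    # Mirrors `seq.index(item)`: return the first matching position,
    # otherwise raise ValueError as list.index does.
    for i, x in enumerate(seq):
        if same_or_equal(x, item):
            return i
    raise ValueError(f"{item!r} is not in list")


n1 = float('nan')
n2 = float('nan')
x = [n1]
print(contains(x, n1))  # True: n1 is the same object
print(contains(x, n2))  # False on CPython: distinct object, and nan != nan
print(index_of(x, n1))  # 0
```

Because the identity check runs first, a nan is found when you search for the very object that is stored, even though `nan == nan` is False.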

Sets behave differently too. For set-like collections, CPython treats nans that are the same object as equal and distinct nan objects as different, just as above, but PyPy seems to collapse all nans into a single value.

```python
n1 = float('nan')
n2 = float('nan')
n3 = float('nan')
x = {n1, n2}
assert len(x) == 2
assert n1 in x
assert n2 in x
assert n3 not in x

t = {n1}

assert t == t
assert t == {n1}
assert t != {n2}
```

It's hard to fix. I'm not dismissing the problem as "won't fix" straight away, but it is a variant of this strange CPython behavior:

```
>>> int('5') is int('5')
True
>>> int('500') is int('500')
False

>>> def f():
...     return 5.0
...
>>> f() is f()
True
>>> f() is 5.0
False
```

There are basically no reasonable rules in CPython for when two floats are the same object; there are only individual cases where it goes one way or the other. Equality of two nan floats inside containers is then defined on top of this lack of a rule, giving results that are very hard to reproduce in PyPy (particularly because of the JIT, which unboxes floats for performance).
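To make the interaction concrete, this short snippet contrasts a direct nan comparison with comparisons inside containers. The commented results are for CPython; as discussed in this thread, PyPy differs on the lines involving two separately created nans.

```python
nan = float('nan')

# Direct comparison follows IEEE 754: nan is not equal to anything, itself included.
print(nan == nan)        # False

# Container comparisons check identity before calling ==,
# so the same nan object "matches" itself.
print([nan] == [nan])    # True
print(nan in [nan])      # True
print((nan,) == (nan,))  # True

# Two separately created nans are different objects on CPython,
# so the identity shortcut no longer applies.
other = float('nan')
print(nan is other)      # False on CPython (True on PyPy)
print([nan] == [other])  # False on CPython
```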

The current rules in PyPy are at least consistent: `float('nan') is float('nan')` always returns True, and `float('nan') == float('nan')` always returns False (or should, at least).
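PyPy's "differences from CPython" documentation describes float identity in terms of the bit pattern: two floats with the same 64-bit representation are `is`-identical, so `0.0 is -0.0` is False. Below is a sketch that models that rule in plain Python. `float_bits` and `pypy_float_is` are hypothetical helpers. The sketch also assumes every `float('nan')` call yields the same quiet-NaN bit pattern, which holds on common platforms.

```python
import struct


def float_bits(x: float) -> int:
    # Reinterpret the IEEE 754 double as a 64-bit unsigned integer.
    return struct.unpack('<Q', struct.pack('<d', x))[0]


def pypy_float_is(a: float, b: float) -> bool:
    # Hypothetical model of PyPy's rule: two floats count as
    # "the same object" exactly when their bit patterns match.
    return float_bits(a) == float_bits(b)


nan_a = float('nan')
nan_b = float('nan')
print(pypy_float_is(nan_a, nan_b))  # True: same quiet-NaN bit pattern
print(nan_a == nan_b)               # False: IEEE 754 equality still fails
print(pypy_float_is(0.0, -0.0))     # False: the sign bit differs
print(0.0 == -0.0)                  # True
```

Under this model, `is` and `==` disagree in opposite directions for nan and for signed zero, but each answer is the same every time it is asked.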

(With sets and dictionary keys, we use the same rule as CPython: the key you're looking for must be either `is` or `==` to the one already in the set or dict.)
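The lookup rule for set and dict keys can be sketched with a toy hash set. A stored key matches a probe if it is the same object, or if the hashes agree and the keys compare equal. `TinyHashSet` is a hypothetical teaching model with a fixed number of buckets and no resizing. It is not CPython's or PyPy's actual dict/set implementation.

```python
class TinyHashSet:
    """Toy set: a stored key matches if it `is` the probe, or if the
    hashes agree and the keys compare equal."""

    def __init__(self, items=()):
        self._buckets = [[] for _ in range(8)]
        for item in items:
            self.add(item)

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    @staticmethod
    def _matches(stored, key):
        # Identity is checked first, so a nan object matches itself
        # even though nan != nan.
        return stored is key or (hash(stored) == hash(key) and stored == key)

    def add(self, key):
        if key not in self:
            self._bucket(key).append(key)

    def __contains__(self, key):
        return any(self._matches(stored, key) for stored in self._bucket(key))

    def __len__(self):
        return sum(len(bucket) for bucket in self._buckets)


n1, n2, n3 = float('nan'), float('nan'), float('nan')
s = TinyHashSet([n1, n2])
print(len(s))   # 2 on CPython: n1 and n2 are distinct objects
print(n1 in s)  # True
print(n3 in s)  # False on CPython
```

With identical rules on both interpreters, the only remaining difference is whether `float('nan')` returns a new object each time (CPython) or always the same one (PyPy).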

1. reporter

Yep. The CPython behaviour is weird and inconsistent, and the PyPy behaviour is actually a lot nicer (although I'm not a huge fan of the difference in PyPy's behaviour between sets and lists).

TBH this doesn't affect me personally that much, and I don't really mind if you close it as wontfix. It's one more behaviour difference I have to worry about when supporting both PyPy and CPython, but I doubt it will crop up very often.

I mostly opened this bug because I couldn't find this difference in behaviour documented anywhere and wanted to make sure you were aware of it. If you are, and don't think it's plausible to fix, that's fine by me.

2. reporter

Ah, right! So the reason for the dict/set difference is just that `float('nan') is float('nan')` is True on PyPy and False on CPython. That makes sense.

It also suggests that the fix here is simpler: CPython uses that rule for `in` and `index()` on lists, but PyPy does not.

I'm not sure what you mean by your last sentence. Both CPython and PyPy should be using the following rule, both for `list.index()` and for dict/set keys: if `x is y or x == y`, the two items are considered equal.

The only difference between PyPy and CPython is that in PyPy there is only ever one `float('nan')` object, as reported by `is` checks. Comparing `x == x` directly returns False for this object (on both PyPy and CPython). The difference shows up only with `[x] == [y]`, or with sets `{x, y}`, in the case where you get two different nan objects on CPython, which you can't get on PyPy.

3. reporter

No, PyPy isn't using that rule for `index()` or list `in`: `[float('nan')].index(float('nan'))` raises a ValueError, and `float('nan') in [float('nan')]` returns False.

Ah, indeed, my mistake. Thanks for reporting it then :-)

Ah, it's even more subtle. The issue shows up with our lists-of-floats-only optimization:

```
>>>> x = [float('nan'), None]
>>>> x.index(float('nan'))
0
>>>> x = [float('nan')]
>>>> x.index(float('nan'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: nan is not in list
```
4. Marked as resolved with the bug fix above and extra documentation that explains the remaining issue in more detail.
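The lists-of-floats-only problem described in this thread can be modelled with a hypothetical list that stores raw float values. Raw storage keeps no object identity, so a lookup that uses only `==` can never find a nan. One possible fix is to use the bit-pattern identity rule PyPy applies to floats, in addition to `==`. `FloatOnlyList`, `index_eq_only` and `_bits` are invented names for illustration. They do not reflect PyPy's RPython sources or the actual fix that was committed.

```python
import struct


def _bits(x: float) -> int:
    # 64-bit pattern of an IEEE 754 double.
    return struct.unpack('<Q', struct.pack('<d', x))[0]


class FloatOnlyList:
    """Hypothetical model of a list specialised to hold only float values."""

    def __init__(self, values):
        self.storage = [float(v) for v in values]

    def index_eq_only(self, item: float) -> int:
        # Buggy variant: plain == comparison, so a nan is never found.
        for i, v in enumerate(self.storage):
            if v == item:
                return i
        raise ValueError(f"{item!r} is not in list")

    def index(self, item: float) -> int:
        # Fixed variant: emulate "x is y or x == y", where "is" for floats
        # means identical bit patterns.
        for i, v in enumerate(self.storage):
            if _bits(v) == _bits(item) or v == item:
                return i
        raise ValueError(f"{item!r} is not in list")


lst = FloatOnlyList([float('nan')])
print(lst.index(float('nan')))  # 0
try:
    lst.index_eq_only(float('nan'))
except ValueError as exc:
    print(exc)  # nan is not in list
```

The mixed list `[float('nan'), None]` in the session above does not hit this path, because a list containing `None` cannot use a float-only representation and falls back to the generic object comparison.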