in operator, float("NaN") and np.nan

I used to believe that the in operator in Python checks whether an element is present in a collection using equality (==), so that element in some_list is roughly equivalent to any(x == element for x in some_list). For example:

True in [1, 2, 3]
# True because True == 1

1 in [1., 2., 3.]
# also True because 1 == 1.

However, it is well known that NaN is not equal to itself. So I expected float("NaN") in [float("NaN")] to be False, and indeed it is.

However, if we use numpy.nan instead of float("NaN"), the situation is quite different:

import numpy as np
np.nan in [np.nan, 1, 2]
# True

But np.nan == np.nan still gives False!

How is this possible? What is the difference between np.nan and float("NaN")? How does in handle np.nan?

2 Answers

#1

To check if the item is in the list, Python tests for object identity first, and then tests for equality only if the objects are different.¹

float("NaN") in [float("NaN")] is False because two different NaN objects are involved in the comparison. The test for identity therefore returns False, and then the test for equality also returns False since NaN != NaN.

np.nan in [np.nan, 1, 2] however is True because the same NaN object is involved in the comparison. The test for object identity returns True and so Python immediately recognises the item as being in the list.
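
The two cases can be reproduced without numpy. This is only a small sketch: it assumes math.nan can stand in for np.nan, since both are a single float object stored as a module attribute, and the variable names are made up for illustration.

```python
import math

nan_a = float("nan")   # a fresh float object
nan_b = float("nan")   # another, distinct float object
shared = math.nan      # one module-level object, reused on every access (stand-in for np.nan)

fresh_same_object = nan_a is nan_b           # two separate objects
fresh_in_list = nan_a in [nan_b]             # identity fails, then == fails too
same_in_list = nan_a in [nan_a]              # identity succeeds, == is never needed
shared_in_list = shared in [math.nan, 1, 2]  # same module-level object on both sides

print(fresh_same_object, fresh_in_list, same_in_list, shared_in_list)
# False False True True
```

So the difference is not in the value (both are ordinary float NaNs) but in whether the same object ends up on both sides of in.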

The __contains__ method (invoked using in) for many of Python's other builtin Container types, such as tuples and sets, is implemented using the same check.
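
A quick way to see this for other containers is to try the same NaN object, and then a different one, against a tuple, a set and the keys of a dict. The labels and variable names below are only illustrative.

```python
nan = float("nan")        # the object stored in each container
other_nan = float("nan")  # a different NaN object

results = {
    "tuple, same object": nan in (nan, 1),
    "set, same object": nan in {nan, 1},
    "dict key, same object": nan in {nan: "value"},
    "tuple, different object": other_nan in (nan, 1),
    "set, different object": other_nan in {nan, 1},
    "dict key, different object": other_nan in {nan: "value"},
}

for label, found in results.items():
    print(f"{label}: {found}")
# The three "same object" lines print True, the three "different object" lines print False.
```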

¹ At least this is true in CPython. Object identity here means that the objects are found at the same memory address: the contains method for lists is performed using PyObject_RichCompareBool which quickly compares object pointers before a potentially more complicated object comparison. Other Python implementations may differ.
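
In pure Python the check behaves roughly like the function below. This is a sketch of the observable behaviour, not the actual C code; contains and NeverEqual are hypothetical names used only for this illustration.

```python
def contains(items, target):
    """Rough pure-Python equivalent of `target in items`: identity first, then ==."""
    return any(target is item or target == item for item in items)


class NeverEqual:
    """Hypothetical helper that compares unequal to everything, itself included."""

    def __eq__(self, other):
        return False


obj = NeverEqual()

print(obj == obj)                      # False: __eq__ always says no
print(obj in [obj])                    # True: the identity shortcut finds it anyway
print(contains([obj], obj))            # True: the sketch agrees with the built-in
print(contains([NeverEqual()], obj))   # False: different object, and == fails
```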

#2

One thing worth mentioning is that numpy arrays do behave as expected:

a = np.array((np.nan,))
a[0] in a
# False
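
A possible way to picture this, assuming that membership on an array boils down to an element-wise == followed by any(), so that no identity shortcut is involved. If the goal is to find out whether an array holds a NaN at all, np.isnan is the explicit tool. The variable names are only illustrative.

```python
import numpy as np

a = np.array([np.nan, 1.0])

elementwise = a == a[0]            # NaN != NaN, so every entry is False
member = a[0] in a                 # False, consistent with elementwise.any()
has_nan = bool(np.isnan(a).any())  # explicit NaN check on the values

print(elementwise.tolist(), member, has_nan)
# [False, False] False True
```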

Variations of the theme:

[np.nan]==[np.nan]
# True
[float('nan')]==[float('nan')]
# False
{np.nan: 0}[np.nan]
# 0
{float('nan'): 0}[float('nan')]
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# KeyError: nan

Everything else is covered in @AlexRiley's excellent answer.
