bpo-43475: Fix worst case collision behavior for NaN instances by rhettinger · Pull Request #25493 · python/cpython
Is my understanding correct that this PR would break the following code?
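```python
import math


class A:
    def __init__(self, a):
        self.a = a

    def __hash__(self):
        # Delegates to float.__hash__, which returned 0 for every NaN before this PR.
        return hash(self.a)

    def __eq__(self, other):
        if not isinstance(other, A):
            return NotImplemented
        # Treat any two NaNs as equal, unlike float's own ==.
        if math.isnan(self.a) and math.isnan(other.a):
            return True
        return self.a == other.a

    def __repr__(self):
        return str(self.a)


set([A(float("nan")), A(float("nan"))])  # result before this PR: {nan}
```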
That is, the case where someone wraps `float` and overrides `__eq__` so that all float NaNs compare equal?
With this PR, the result will very likely be `{nan, nan}`, because the two objects will have different hashes, so the set never even calls `__eq__` on them.
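One way to keep such a wrapper working regardless of what `float.__hash__` does for NaN is to make its `__hash__` agree with its own `__eq__`: if all NaNs are equal, they must all hash the same. The sketch below assumes that is the goal; the class name `NaNEqualFloat` and the constant `_NAN_HASH` are hypothetical, chosen for illustration only.

```python
import math


class NaNEqualFloat:
    """Hypothetical wrapper: every NaN compares equal to every other NaN."""

    # Any fixed int works; 0 is what hash(float("nan")) returned before Python 3.10.
    _NAN_HASH = 0

    def __init__(self, a: float) -> None:
        self.a = a

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, NaNEqualFloat):
            return NotImplemented
        if math.isnan(self.a) and math.isnan(other.a):
            return True
        return self.a == other.a

    def __hash__(self) -> int:
        # Do not delegate to hash(self.a) for NaN: since Python 3.10 a float
        # NaN's hash depends on object identity, so two distinct NaNs would
        # usually get different hashes and never be compared with __eq__.
        if math.isnan(self.a):
            return self._NAN_HASH
        return hash(self.a)

    def __repr__(self) -> str:
        return repr(self.a)


items = {NaNEqualFloat(float("nan")), NaNEqualFloat(float("nan"))}
print(items)  # {nan}
```

Because equal NaN wrappers collapse into a single set entry, the shared hash does not bring back the pathological collisions the PR addresses: a set can hold at most one such NaN.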
Until now, the rule was clear: don't put NaNs into a set or dict, because the default `==` relation for floats isn't an equivalence relation (it is not even reflexive, since `nan == nan` is false). People worked around this by redefining `==`, but they didn't redefine the hash function, because until now it was guaranteed that if `a` and `b` are both NaNs, then `hash(a) == hash(b)`.
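A small demonstration of these points, assuming CPython. The version check reflects that the identity-based NaN hash from bpo-43475 shipped in Python 3.10; the variable names are illustrative only.

```python
import sys

x = float("nan")
y = float("nan")

# == on floats is not reflexive for NaN, so it is not an equivalence relation.
print(x == x)  # False

# Containers check identity before ==, so the *same* NaN object is found,
# but a *different* NaN object is not.
print(x in [x])  # True
print(y in [x])  # False

if sys.version_info < (3, 10):
    # The old guarantee: every float NaN hashed to 0.
    print(hash(x) == hash(y) == 0)  # True
else:
    # Since 3.10 the NaN hash is derived from object identity, so two
    # distinct NaN objects are expected to hash differently.
    print(hash(x) == hash(y))
```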
I think the intuitive result of `set([float("nan"), float("nan")])` is `{nan}`, not `{nan, nan}`. Given how `Py_EQ` is defined for floats, this is not possible. Maybe there is a need for a new `Py_EQ_FOR_HASH` comparator, which would be used by set and dict lookups and would behave essentially like `Py_EQ`, except that it would return true for `nan == nan`.
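Since `set` and `dict` do not accept a custom comparator, the closest pure-Python approximation of the proposed `Py_EQ_FOR_HASH` is to canonicalize keys before storing them, mapping every float NaN to one shared key. This is a sketch of that substitute technique, not of the proposal itself; `hash_key`, `unique`, and `_NAN_KEY` are hypothetical names.

```python
import math
from collections.abc import Hashable, Iterable

# Hypothetical sentinel that stands in for "any NaN" inside a set or dict.
_NAN_KEY = object()


def hash_key(value: Hashable) -> Hashable:
    """Map every float NaN to one shared key; leave other values unchanged.

    Approximates a comparison that behaves like == except that nan == nan.
    """
    if isinstance(value, float) and math.isnan(value):
        return _NAN_KEY
    return value


def unique(values: Iterable[Hashable]) -> list:
    """Deduplicate values, treating all float NaNs as a single value."""
    seen = {}
    for v in values:
        # setdefault keeps the first value seen for each canonical key.
        seen.setdefault(hash_key(v), v)
    return list(seen.values())


data = [float("nan"), float("nan"), 1.0, 1]
print(len(set(data)))     # 3: the plain set keeps both NaNs (1.0 == 1 merge)
print(len(unique(data)))  # 2: one NaN and one 1.0
```

Only top-level floats are canonicalized here; a NaN nested inside a tuple key would still behave as in a plain set.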