bpo-43475: Fix worst case collision behavior for NaN instances by rhettinger · Pull Request #25493 · python/cpython

rhettinger

Is my understanding right that this PR would break the following code?

import math

class A:
    def __init__(self, a):
        self.a = a
    def __hash__(self):
        return hash(self.a)
    def __eq__(self, other):
        # Treat all float NaNs as equal to each other.
        if math.isnan(self.a) and math.isnan(other.a):
            return True
        return self.a == other.a
    def __repr__(self):
        return str(self.a)

set([A(float("nan")), A(float("nan"))])  # result before this PR: {nan}

I.e., when somebody wraps float and changes __eq__ so that all float NaNs compare equal?

With this PR, chances are high that the result will be {nan, nan}, because the hashes of the two objects will differ.
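A minimal sketch of how such a wrapper could avoid depending on how float NaNs are hashed: make `__hash__` NaN-aware so that it agrees with the custom `__eq__`. The class name `NanEqual` and the constant `0` are illustrative choices of mine, not something taken from the PR, and the sketch assumes the wrapped value is always numeric.

```python
import math

class NanEqual:
    """Hypothetical wrapper that treats all float NaNs as equal."""

    def __init__(self, a):
        self.a = a

    def __hash__(self):
        # Equal objects must hash equally, so map every NaN to one fixed
        # value instead of relying on hash(float("nan")).
        if isinstance(self.a, float) and math.isnan(self.a):
            return 0
        return hash(self.a)

    def __eq__(self, other):
        if not isinstance(other, NanEqual):
            return NotImplemented
        if math.isnan(self.a) and math.isnan(other.a):
            return True
        return self.a == other.a

    def __repr__(self):
        return str(self.a)

unique = set([NanEqual(float("nan")), NanEqual(float("nan"))])
print(len(unique))  # 1
```

With this variant, the two wrapped NaNs collapse into a single set element regardless of what `hash(float("nan"))` returns.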

Until now, the rule was clear: don't put NaNs into a set/dict, because the default "=="-relation for floats isn't an equivalence relation. People worked around this by redefining the "=="-relation but did not do so for the hash function, because until now "a, b are NaNs => hash(a) == hash(b)" held.

I think the intuitive behavior for set([float("nan"), float("nan")]) is {nan} and not {nan, nan}. Given how Py_EQ is defined for floats, this is not possible. Maybe there is a need for a new Py_EQ_FOR_HASH comparator, which would be used by set/dict lookups and be more or less the same as Py_EQ but would yield true for nan == nan.
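To make the idea concrete, here is a rough Python sketch of the semantics I have in mind. The function `eq_for_hash` is purely hypothetical (no such comparator exists in CPython); it only illustrates how the proposed comparison would differ from `==`. The last two lines show the current set behavior: two distinct NaN objects are kept separately, while the same NaN object is deduplicated by the identity check.

```python
import math

def eq_for_hash(a, b):
    """Hypothetical stand-in for the proposed Py_EQ_FOR_HASH semantics."""
    if a is b or a == b:
        return True
    # Unlike ==, treat two float NaNs as equal.
    return (isinstance(a, float) and isinstance(b, float)
            and math.isnan(a) and math.isnan(b))

x, y = float("nan"), float("nan")
print(x == y)             # False
print(eq_for_hash(x, y))  # True
print(len({x, y}))        # 2: distinct NaN objects never compare equal
print(len({x, x}))        # 1: the same object matches by identity
```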