当空间大于32位时,如何实现GetHashCode兼容的Equals方法?

时间:2023-01-24 16:11:27

In .NET you need that Equals(object) and GetHashCode() are compatible. But sometimes you can't:

在.NET中,您需要Equals(object)和GetHashCode()兼容。但有时你不能:

public class GreaterThan32Bits
{
    public int X { get; set; }
    public int Y { get; set; }
}

Because the data density is greater than 32 bits, and GetHashCode returns an Int32, you will have 3 solutions (assuming a correctly implemented GetHashCode):

因为数据密度大于32位,并且GetHashCode返回Int32,所以您将有3个解决方案(假设正确实现了GetHashCode):

  1. Avoid code duplication discarded as incorrect

    避免将代码重复丢弃为不正确

    public override bool Equals(object other)
    {
        if(ReferenceEquals(null, other)) return false;
        if(ReferenceEquals(this, other)) return true;
        return this.GetHashCode() == other.GetHashCode();
    }
    
  2. Implement Equals separately from GetHashCode()

    与GetHashCode()分开实现Equals

    public override bool Equals(object obj)
    {
        if(ReferenceEquals(null, other)) return false;
        if(ReferenceEquals(this, other)) return true;
        var other = obj as GreaterThan32Bits;
        if(this.X == other.X) return this.Y == other.Y;
        return false;
    }
    
  3. Implement a greater precision GetHashCode64, the overrided GetHashCode (32 bits) will return (int)GetHashCode64(), and Equals will return this.GetHashCode64() == other.GetHashCode64()

    实现更高精度的GetHashCode64,重写的GetHashCode(32位)将返回(int)GetHashCode64(),而Equals将返回this.GetHashCode64()== other.GetHashCode64()

Which one would you implement?

你会实施哪一个?

The first solution is imprecise incorrect, but cleaner. The second option seems clean, but gets very complex when the class has more properties. The third option is a compromise.

第一种解决方案是不精确的不正确,但更清洁。第二个选项看起来很干净,但是当类有更多属性时会变得非常复杂。第三种选择是妥协。

4 个解决方案

#1


The requirement is as follows: if (a.Equals(b)), then a.GetHashCode() == b.GetHashCode()

要求如下:if(a.Equals(b)),然后a.GetHashCode()== b.GetHashCode()

Not the other way round.

不是相反。

You shouldn't implement Equals() in term of GetHashCode(), ever. It's perfectly valid for GetHashCode to have collisions, but Equals() must not return false positives.

你不应该在GetHashCode()方面实现Equals()。 GetHashCode碰撞完全有效,但Equals()不得返回误报。

I'd suggest this implementation:

我建议这个实现:

public override int GetHashCode()
{
    return unchecked( this.X * p1 + this.Y * p2 );
}

public override bool Equals(object obj) 
{
    var other = obj as GreaterThan32Bits;
    // you must do the null test after the cast, otherwise the
    // function crashes when obj is not a GreaterThan32Bits instance
    if (ReferenceEquals(other, null)) return false;
    return this.X == other.X && this.Y == other.Y;
}

Where p1 and p2 are large primes. This usually results in a good hash function (few hash collisions -> Dictionary becomes efficient). If the X and Y values are independent (for example, you don't expect many points on a straight line like X=Y), then even something simple like X ^ Y can be a good hash function.

其中p1和p2是大素数。这通常会产生良好的哈希函数(少数哈希冲突 - >词典变得高效)。如果X和Y值是独立的(例如,你不希望像X = Y这样的直线上有很多点),那么即使像X ^ Y这样简单的东西也可以是一个很好的散列函数。

But then again, you only need a good hash function if you actually use the class as keys in a dictionary (or other hash table).

但话说回来,如果你真的将这个类用作字典(或其他哈希表)中的键,那么你只需要一个好的哈希函数。

In fact, it's perfectly fine to always return 0 in GetHashCode() and only implement Equals(). Dictionary will still work correctly with such objects as keys, it'll just be inefficient.

实际上,总是在GetHashCode()中返回0并且只实现Equals()是完全正确的。字典仍然可以正常使用像键这样的对象,它只是效率低下。

#2


Your first implementation is not correct. The hash code of two objects might be equal even though the objects themselves are not equal: This is the essence of a hash code.

你的第一个实现是不正确的。即使对象本身不相等,两个对象的哈希码也可能相等:这是哈希码的本质。

An object hash code may be useful for determining when two objects are not equal, but to determine whether they are equal you will have to call .Equals().

对象哈希码可能对确定两个对象何时不相等很有用,但要确定它们是否相等,则必须调用.Equals()。

An implementation that always returns 0 for GetHashCode() is legal, but may not be very efficient when objects of that type are inserted int various types of containers.

始终为GetHashCode()返回0的实现是合法的,但是当将该类型的对象插入到各种类型的容器中时可能效率不高。

Your option 2 is the best choice. It's a good idea to keep the implementation of Equals() separate from GetHashCode(), because they do quite different things. Equals() must return true if and only if the two objects are equal in all respects. To do this, you will normally have to check each object property individually.

您的选择2是最佳选择。将Equals()的实现与GetHashCode()分开是一个好主意,因为它们完全不同。当且仅当两个对象在所有方面都相等时,Equals()必须返回true。为此,您通常必须单独检查每个对象属性。

#3


Strictly speaking first solution doesn't work. It is not a solution then.

严格来说,第一个解决方案不起作用。那不是解决方案。

The idea of hashing is quite different. Int32 is pretty enough for that purposes.

散列的想法是完全不同的。 Int32非常适合这种用途。

The suggested GetHashCode() is

建议的GetHashCode()是

return X ^ Y;

Simply as it is.

简直就是这样。

EDIT: Equals methods then can use GetHashCode(), but only to return false when hashes differ. Deep comparison is required anyway.

编辑:等于方法然后可以使用GetHashCode(),但只有当哈希值不同时才返回false。无论如何都需要深入比较。

#4


I think the key that you're missing is that GetHashCode() doesn't have to return unique values.

我认为你缺少的关键是GetHashCode()不必返回唯一值。

Its perfectly acceptable for two different objects to return the same GetHashCode. Say you add two objects to a HashSet that have the same HashCode, then the container will use the GetHashCode first to find where approximately in the HashSet the object is, then use the equals on all the matching objects to find your exact object.

它完全可以接受两个不同的对象返回相同的GetHashCode。假设您将两个对象添加到具有相同HashCode的HashSet,则容器将首先使用GetHashCode查找对象所在的HashSet中的大致位置,然后在所有匹配对象上使用equals来查找您的确切对象。

Obviously its better if each object has a unique hash code. If every object returned the same hashCode then performance would be horrible.

如果每个对象都有唯一的哈希码,显然会更好。如果每个对象返回相同的hashCode,那么性能将是可怕的。

#1


The requirement is as follows: if (a.Equals(b)), then a.GetHashCode() == b.GetHashCode()

要求如下:if(a.Equals(b)),然后a.GetHashCode()== b.GetHashCode()

Not the other way round.

不是相反。

You shouldn't implement Equals() in term of GetHashCode(), ever. It's perfectly valid for GetHashCode to have collisions, but Equals() must not return false positives.

你不应该在GetHashCode()方面实现Equals()。 GetHashCode碰撞完全有效,但Equals()不得返回误报。

I'd suggest this implementation:

我建议这个实现:

public override int GetHashCode()
{
    return unchecked( this.X * p1 + this.Y * p2 );
}

public override bool Equals(object obj) 
{
    var other = obj as GreaterThan32Bits;
    // you must do the null test after the cast, otherwise the
    // function crashes when obj is not a GreaterThan32Bits instance
    if (ReferenceEquals(other, null)) return false;
    return this.X == other.X && this.Y == other.Y;
}

Where p1 and p2 are large primes. This usually results in a good hash function (few hash collisions -> Dictionary becomes efficient). If the X and Y values are independent (for example, you don't expect many points on a straight line like X=Y), then even something simple like X ^ Y can be a good hash function.

其中p1和p2是大素数。这通常会产生良好的哈希函数(少数哈希冲突 - >词典变得高效)。如果X和Y值是独立的(例如,你不希望像X = Y这样的直线上有很多点),那么即使像X ^ Y这样简单的东西也可以是一个很好的散列函数。

But then again, you only need a good hash function if you actually use the class as keys in a dictionary (or other hash table).

但话说回来,如果你真的将这个类用作字典(或其他哈希表)中的键,那么你只需要一个好的哈希函数。

In fact, it's perfectly fine to always return 0 in GetHashCode() and only implement Equals(). Dictionary will still work correctly with such objects as keys, it'll just be inefficient.

实际上,总是在GetHashCode()中返回0并且只实现Equals()是完全正确的。字典仍然可以正常使用像键这样的对象,它只是效率低下。

#2


Your first implementation is not correct. The hash code of two objects might be equal even though the objects themselves are not equal: This is the essence of a hash code.

你的第一个实现是不正确的。即使对象本身不相等,两个对象的哈希码也可能相等:这是哈希码的本质。

An object hash code may be useful for determining when two objects are not equal, but to determine whether they are equal you will have to call .Equals().

对象哈希码可能对确定两个对象何时不相等很有用,但要确定它们是否相等,则必须调用.Equals()。

An implementation that always returns 0 for GetHashCode() is legal, but may not be very efficient when objects of that type are inserted int various types of containers.

始终为GetHashCode()返回0的实现是合法的,但是当将该类型的对象插入到各种类型的容器中时可能效率不高。

Your option 2 is the best choice. It's a good idea to keep the implementation of Equals() separate from GetHashCode(), because they do quite different things. Equals() must return true if and only if the two objects are equal in all respects. To do this, you will normally have to check each object property individually.

您的选择2是最佳选择。将Equals()的实现与GetHashCode()分开是一个好主意,因为它们完全不同。当且仅当两个对象在所有方面都相等时,Equals()必须返回true。为此,您通常必须单独检查每个对象属性。

#3


Strictly speaking first solution doesn't work. It is not a solution then.

严格来说,第一个解决方案不起作用。那不是解决方案。

The idea of hashing is quite different. Int32 is pretty enough for that purposes.

散列的想法是完全不同的。 Int32非常适合这种用途。

The suggested GetHashCode() is

建议的GetHashCode()是

return X ^ Y;

Simply as it is.

简直就是这样。

EDIT: Equals methods then can use GetHashCode(), but only to return false when hashes differ. Deep comparison is required anyway.

编辑:等于方法然后可以使用GetHashCode(),但只有当哈希值不同时才返回false。无论如何都需要深入比较。

#4


I think the key that you're missing is that GetHashCode() doesn't have to return unique values.

我认为你缺少的关键是GetHashCode()不必返回唯一值。

Its perfectly acceptable for two different objects to return the same GetHashCode. Say you add two objects to a HashSet that have the same HashCode, then the container will use the GetHashCode first to find where approximately in the HashSet the object is, then use the equals on all the matching objects to find your exact object.

它完全可以接受两个不同的对象返回相同的GetHashCode。假设您将两个对象添加到具有相同HashCode的HashSet,则容器将首先使用GetHashCode查找对象所在的HashSet中的大致位置,然后在所有匹配对象上使用equals来查找您的确切对象。

Obviously its better if each object has a unique hash code. If every object returned the same hashCode then performance would be horrible.

如果每个对象都有唯一的哈希码,显然会更好。如果每个对象返回相同的hashCode,那么性能将是可怕的。