
Time: 2023-01-24 16:11:15

According to MSDN, a hash function must have the following properties:


  1. If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.


  2. The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.


  3. For the best performance, a hash function must generate a random distribution for all input.


I keep finding myself in the following scenario: I have created a class, implemented IEquatable<T> and overridden object.Equals(object). MSDN states that:


Types that override Equals must also override GetHashCode; otherwise, Hashtable might not work correctly.


And this is where I usually get stuck: how do you properly override object.GetHashCode()? I never really know where to start, and there seem to be a lot of pitfalls.


Here at *, there are quite a few questions related to overriding GetHashCode, but most of them seem to concern quite particular cases and specific issues. I would therefore like to put together a good compilation here: an overview with general advice and guidelines. What to do, what not to do, common pitfalls, where to start, etc.


I would like it to be aimed especially at C#, but I would think it works much the same way for other .NET languages as well(?).


I think maybe the best way is to create one answer per topic, with a quick and short answer first (close to a one-liner if at all possible), then maybe some more information, ending with related questions, discussions, blog posts, etc., if there are any. I can then create one post as the accepted answer (to get it on top) with just a "table of contents". Try to keep it short and concise. And don't just link to other questions and blog posts. Try to capture their essence and then link to the source (especially since the source could disappear). Also, please try to edit and improve existing answers instead of creating lots of very similar ones.


I am not a very good technical writer, but I will at least try to format answers so they look alike, create the table of contents, etc. I will also try to search for some of the related questions here at SO that answer parts of these and maybe pull out the essence of the ones I can manage. But since I am not on very firm ground with this topic, I will try to stay away for the most part :p


10 solutions



Table of contents

Things that I would like to be covered, but haven't been yet:


  • How to create the integer (how to "convert" an object into an int wasn't very obvious to me anyway).
  • What fields to base the hash code upon.
    • If it should only be based on immutable fields, what if there are only mutable ones?
  • How to generate a good random distribution. (MSDN Property #3)
    • Part of this seems to be choosing a good magic prime number (I have seen 17, 23 and 397 used), but how do you choose it, and what exactly is it for?
  • How to make sure the hash code stays the same throughout the object's lifetime. (MSDN Property #2)
    • Especially when the equality is based upon mutable fields. (MSDN Property #1)
  • How to deal with fields that are complex types (not among the built-in C# types).
    • Complex objects and structs, arrays, collections, lists, dictionaries, generic types, etc.
    • For example, even though the list or dictionary might be readonly, that doesn't mean its contents are.
  • How to deal with inherited classes.
    • Should you somehow incorporate base.GetHashCode() into your hash code?
  • Could you technically just be lazy and return 0? That would badly break MSDN guideline #3, but would at least make sure #1 and #2 were always true :P
  • Common pitfalls and gotchas.



What are those magic numbers often seen in GetHashCode implementations?

They are prime numbers. Prime numbers are used for creating hash codes because they maximize the usage of the hash code space.


Specifically, start with the small prime number 3, and consider only the low-order three bits of the results:


  • 3 * 1 = 3 = 3 (mod 8) = 011
  • 3 * 2 = 6 = 6 (mod 8) = 110
  • 3 * 3 = 9 = 1 (mod 8) = 001
  • 3 * 4 = 12 = 4 (mod 8) = 100
  • 3 * 5 = 15 = 7 (mod 8) = 111
  • 3 * 6 = 18 = 2 (mod 8) = 010
  • 3 * 7 = 21 = 5 (mod 8) = 101
  • 3 * 8 = 24 = 0 (mod 8) = 000
  • 3 * 9 = 27 = 3 (mod 8) = 011

And we start over. But you'll notice that successive multiples of our prime generated every possible 3-bit pattern before starting to repeat. We can get the same effect with any odd prime and any number of bits (what matters is that the multiplier shares no factor with the power-of-two modulus), which is why primes are a popular choice for generating near-random hash codes. The reason we usually see larger primes instead of small primes like 3 in the example above is that, for greater numbers of bits in our hash code, the results obtained from using a small prime are not even pseudo-random: they're simply an increasing sequence until an overflow is encountered. For the best distribution, use a prime that overflows for fairly small coefficients, unless you can guarantee that your coefficients will not be small.
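The coverage claim is easy to check for small bit widths. The sketch below counts how many distinct low-order bit patterns the multiples of a given multiplier produce; the class and method names are made up for this illustration.

```csharp
using System.Collections.Generic;

public static class MultiplierDemo
{
    // Counts the distinct values of (multiplier * k) mod 2^bits for k = 1 .. 2^bits.
    // Intended for small inputs only, so the product never overflows an int.
    public static int DistinctResidues(int multiplier, int bits)
    {
        int modulus = 1 << bits;
        var seen = new HashSet<int>();
        for (int k = 1; k <= modulus; k++)
        {
            seen.Add((multiplier * k) % modulus);
        }
        return seen.Count;
    }
}
```

With a multiplier of 3 and 3 bits, all 8 patterns appear. With a multiplier of 2 only 4 of them do, because 2 shares a factor with the modulus.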






Check out Guidelines and rules for GetHashCode by Eric Lippert




You should override it whenever you have a meaningful measure of equality for objects of that type (i.e. you override Equals). If you knew the object wasn't going to be hashed for any reason you could leave it, but it's unlikely you could know this in advance.


The hash should be based only on the properties of the object that are used to define equality since two objects that are considered equal should have the same hash code. In general you would usually do something like:


public override int GetHashCode()
{
    int mc = /* magic constant, usually some prime */;
    return mc * prop1.GetHashCode() * prop2.GetHashCode() * ... * propN.GetHashCode();
}

I usually assume multiplying the values together will produce a fairly uniform distribution, assuming each property's hash code function does the same, although this may well be wrong. Using this method, if the object's equality-defining properties change, then the hash code is also likely to change, which is acceptable given definition #2 in your question. It also deals with all types in a uniform way.
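One caveat with a pure product is that a single property whose hash code is 0 forces the whole result to 0. A commonly seen alternative, sketched here, multiplies a running total by a prime and adds each field's hash code. The type `LabeledPoint` and the constants 17 and 23 are illustrative assumptions, not something the answer above prescribes.

```csharp
using System;

// Hypothetical value type used only for illustration.
public sealed class LabeledPoint : IEquatable<LabeledPoint>
{
    public int X { get; }
    public int Y { get; }
    public string Label { get; }

    public LabeledPoint(int x, int y, string label)
    {
        X = x;
        Y = y;
        Label = label;
    }

    // Equality is defined by X, Y and Label, so the hash uses exactly those.
    public bool Equals(LabeledPoint other) =>
        other != null && X == other.X && Y == other.Y && Label == other.Label;

    public override bool Equals(object obj) => Equals(obj as LabeledPoint);

    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 23 + X.GetHashCode();
            hash = hash * 23 + Y.GetHashCode();
            hash = hash * 23 + (Label?.GetHashCode() ?? 0); // null label contributes 0
            return hash;
        }
    }
}
```

On newer .NET versions, `System.HashCode.Combine(X, Y, Label)` should give a similar result with less code, if it is available on your target framework.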


You could return the same value for all instances, although this will make any algorithms that use hashing (such as dictionaries) very slow: essentially all instances will be hashed to the same bucket, and lookup will then become O(n) instead of the expected O(1). This of course negates any benefit of using such structures for lookup.
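To illustrate that a constant hash code is slow but still correct, here is a small sketch. `LazyKey` and `LazyKeyDemo` are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose hash code is deliberately constant.
public sealed class LazyKey : IEquatable<LazyKey>
{
    public int Id { get; }

    public LazyKey(int id) { Id = id; }

    public bool Equals(LazyKey other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as LazyKey);

    // Legal, but every instance lands in the same bucket.
    public override int GetHashCode() => 0;
}

public static class LazyKeyDemo
{
    // Adds ids 0..count-1 twice and returns how many distinct keys the set kept.
    public static int CountDistinct(int count)
    {
        var set = new HashSet<LazyKey>();
        for (int pass = 0; pass < 2; pass++)
        {
            for (int i = 0; i < count; i++)
            {
                set.Add(new LazyKey(i)); // each Add has to compare against the whole bucket
            }
        }
        return set.Count;
    }
}
```

The set still rejects the duplicates, so the result is right. The cost is that every insertion and lookup degenerates into a linear scan using Equals.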




Why do I have to override object.GetHashCode()?

Overriding this method is important because the following property must always remain true:


If two objects compare as equal, the GetHashCode method for each object must return the same value.


The reason, as stated by JaredPar in a blog post on implementing equality, is that


Many classes use the hash code to classify an object. In particular hash tables and dictionaries tend to place objects in buckets based on their hash code. When checking if an object is already in the hash table it will first look for it in a bucket. If two objects are equal but have different hash codes they may be put into different buckets and the dictionary would fail to lookup the object.
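A rough sketch of the situation described in the quote, using two hypothetical types: `Money` overrides both methods, while `BrokenMoney` overrides only Equals. A dictionary lookup with a separate but equal `Money` instance finds the entry. The same lookup with `BrokenMoney` will typically miss, because the default hash code is not derived from the fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that overrides Equals and GetHashCode consistently.
public sealed class Money : IEquatable<Money>
{
    public string Currency { get; }
    public decimal Amount { get; }

    public Money(string currency, decimal amount)
    {
        Currency = currency;
        Amount = amount;
    }

    public bool Equals(Money other) =>
        other != null && Currency == other.Currency && Amount == other.Amount;

    public override bool Equals(object obj) => Equals(obj as Money);

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Currency?.GetHashCode() ?? 0) * 397) ^ Amount.GetHashCode();
        }
    }
}

#pragma warning disable CS0659 // deliberately overrides Equals without GetHashCode
// Hypothetical counter-example: equal instances usually get different hash codes.
public sealed class BrokenMoney
{
    public string Currency { get; }
    public decimal Amount { get; }

    public BrokenMoney(string currency, decimal amount)
    {
        Currency = currency;
        Amount = amount;
    }

    public override bool Equals(object obj) =>
        obj is BrokenMoney other && Currency == other.Currency && Amount == other.Amount;
}
#pragma warning restore CS0659

public static class MoneyDemo
{
    // Looks up a freshly constructed key that is equal to the stored one.
    public static bool LookupWorks()
    {
        var prices = new Dictionary<Money, string> { { new Money("USD", 5m), "coffee" } };
        return prices.ContainsKey(new Money("USD", 5m));
    }
}
```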





A) You must override both Equals and GetHashCode if you want to employ value equality instead of the default reference equality. With the latter, two object references compare as equal if they both refer to the same object instance. With the former, they compare as equal if their value is the same, even if they refer to different objects. For example, you probably want to employ value equality for Date, Money, and Point objects.


B) In order to implement value equality you must override Equals and GetHashCode. Both should depend on the fields of the object that encapsulate the value. For example, Date.Year, Date.Month and Date.Day; or Money.Currency and Money.Amount; or Point.X, Point.Y and Point.Z. You should also consider overriding operator ==, operator !=, operator <, and operator >.


C) The hash code doesn't have to stay constant throughout the object's lifetime. However, it must not change while the object participates as a key in a hash table. From the MSDN documentation for Dictionary: "As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value." If you must change the value of a key, remove the entry from the dictionary, change the key value, and add the entry back.


D) IMO, you will simplify your life if your value objects are themselves immutable.
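The remove, change, re-add sequence from point C can be sketched as follows. `Tag` is a hypothetical mutable key type, and `TagDemo.Rename` is a helper invented for this example. An immutable key, as suggested in point D, would make the helper unnecessary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key: equality and hash code both depend on Name.
public sealed class Tag : IEquatable<Tag>
{
    public string Name { get; set; }

    public Tag(string name) { Name = name; }

    public bool Equals(Tag other) => other != null && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as Tag);

    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

public static class TagDemo
{
    // Renames a key safely: remove the entry, change the key, add it back.
    public static void Rename(Dictionary<Tag, int> counts, Tag key, string newName)
    {
        int value = counts[key];
        counts.Remove(key);     // remove while the hash code still matches the stored entry
        key.Name = newName;     // the key is outside the dictionary, so it may change freely
        counts.Add(key, value); // re-added under its new hash code
    }
}
```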




When do I override object.GetHashCode()?

As MSDN states:


Types that override Equals must also override GetHashCode; otherwise, Hashtable might not work correctly.






What fields to base the hash code upon? If it should only be on immutable fields, what if there are only mutable ones?

It doesn't need to be based only on immutable fields. I would base it on the fields that determine the outcome of the Equals method.




How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2) Especially when the equality is based upon mutable fields. (MSDN Property #1)

You seem to misunderstand Property #2. The hash code doesn't need to stay the same throughout the object's lifetime. It just needs to stay the same as long as the values that determine the outcome of the Equals method are not changed. So logically, you base the hash code on those values only. Then there shouldn't be a problem.




public override int GetHashCode()
{
    return IntProp1 ^ IntProp2 ^ StrProp3.GetHashCode() ^ StrProp4.GetHashCode() ^ CustomClassProp.GetHashCode();
}

Do the same in the custom class's GetHashCode method. Works like a charm.
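One thing to watch for with plain XOR, as used in the snippet above, is that it ignores order and cancels equal values: (1, 2) and (2, 1) hash alike, and any pair of identical values hashes to 0. The sketch below contrasts it with an order-sensitive multiply-and-add mix. The helper names and the constants 17 and 23 are assumptions made for this example.

```csharp
public static class XorHashDemo
{
    // Plain XOR of two int fields.
    public static int XorHash(int a, int b) => a ^ b;

    // Order-sensitive alternative: multiply-and-add with a prime.
    public static int MixedHash(int a, int b)
    {
        unchecked // wrap around on overflow rather than throw
        {
            return (17 * 23 + a) * 23 + b;
        }
    }
}
```

Whether these extra collisions matter depends on the data. If swapped or repeated field values are common, the mixed form is likely the safer default.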



