Two instances with the same hashCode that are not equal

Time: 2022-04-21 14:16:24

I was reading the paragraph quoted below from an article entitled "Java theory and practice: Hashing it out - Defining hashCode() and equals() effectively and correctly":

Defining equality

The Object class has two methods for making inferences about an object's identity: equals() and hashCode(). In general, if you override one of these methods, you must override both, as there are important relationships between them that must be maintained. In particular, if two objects are equal according to the equals() method, they must have the same hashCode() value (although the reverse is not generally true). [emphasis added by me]
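A minimal sketch of a class that follows this rule, assuming a hypothetical Point value class: equals() and hashCode() are derived from the same fields, so equal points always share a hash code. The last lines also answer the question directly: Objects.hash(x, y) works out to 31 * (31 + x) + y, so Point(1, 2) and Point(0, 33) both hash to 994 without being equal.

```java
import java.util.Objects;

// Hypothetical value class that overrides equals() and hashCode() together,
// deriving both from the same fields so equal points get equal hash codes.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals() compares: 31 * (31 + x) + y.
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Point c = new Point(0, 33);
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true
        System.out.println(a.hashCode());                 // prints 994
        System.out.println(a.hashCode() == c.hashCode()); // prints true: same hash...
        System.out.println(a.equals(c));                  // prints false: ...but not equal
    }
}
```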

My question relates to the latter bit of the paragraph "although the reverse is not generally true". How is it possible for two different instances of a class to have the same hashCode but not be equal?

8 answers

#1


16  

In simple terms, hashCode() is a function that generates a hash using some formula, so collisions can occur: two different values can turn out to have the same hash code.

If I simply calculate the hash code by taking the value mod 6, then two different values (for example, 1 and 7) will have the same hash code.
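A minimal sketch of that "mod 6" hash, assuming a hypothetical ModHashDemo class (this is not how any JDK class computes hashCode()): only six codes exist, so 1 and 7, which leave the same remainder, collide.

```java
// Hypothetical tiny hash: maps any int to one of six codes (0..5).
// Illustrates the "mod 6" idea only; real hashCode() implementations differ.
final class ModHashDemo {
    static int hash(int value) {
        // Math.floorMod keeps the result in 0..5 even for negative inputs.
        return Math.floorMod(value, 6);
    }

    public static void main(String[] args) {
        int a = 1;
        int b = 7;
        System.out.println(hash(a) == hash(b)); // prints true: same code
        System.out.println(a == b);             // prints false: different values
    }
}
```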

#2


4  

You can think of hash codes as buckets.

  • If two objects are equal, they will go into the same bucket (have the same hash code).
  • But if two objects go into the same bucket (have the same hash code), that doesn't mean they must be equal.
  • Also note that if two objects are not equal, they can still have the same hash code. This follows from the two points above.
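A concrete case of "same bucket, not equal" using only the JDK: "Aa" and "BB" are two distinct strings whose String.hashCode() values are both 2112. The class name StringCollisionDemo and the helper collidingMap are hypothetical; the sketch shows that a HashMap still keeps both keys apart because it falls back to equals().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: two distinct keys with identical hash codes.
final class StringCollisionDemo {
    // Puts both colliding keys into one HashMap.
    static Map<String, Integer> collidingMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        // String.hashCode() is s[0]*31^(n-1) + ... + s[n-1]:
        // "Aa" -> 65*31 + 97 = 2112, "BB" -> 66*31 + 66 = 2112.
        System.out.println("Aa".hashCode());   // prints 2112
        System.out.println("BB".hashCode());   // prints 2112
        System.out.println("Aa".equals("BB")); // prints false

        // Same hash, so same bucket, yet HashMap keeps both entries because
        // it also checks equals() before treating two keys as the same.
        Map<String, Integer> map = collidingMap();
        System.out.println(map.size());        // prints 2
        System.out.println(map.get("BB"));     // prints 2
    }
}
```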

So a hash code is nothing but the hash value of that bucket. Any number of objects can have the same hash code, depending on the algorithm used to calculate it.

An ideal algorithm generates different hash codes for different objects, so that there is ideally one object per bucket. Of course, this is the perfect case, which might not be possible.

A bucket may, of course, contain several objects, based on some property.
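To make the bucket picture concrete, here is a minimal sketch of a hash set with separate chaining. The class ChainedBuckets and its methods are hypothetical and much simpler than java.util.HashSet: the hash code only chooses a bucket, and equals() decides membership inside it. It uses "Aa" and "BB", two distinct strings with the same String.hashCode(), to put two unequal objects in one bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal hash set with separate chaining (not java.util.HashSet's code).
final class ChainedBuckets<T> {
    private final List<List<T>> buckets = new ArrayList<>();

    ChainedBuckets(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // The hash code only selects which bucket to look in.
    private int indexFor(Object o) {
        return Math.floorMod(o.hashCode(), buckets.size());
    }

    boolean add(T item) {
        List<T> bucket = buckets.get(indexFor(item));
        for (T existing : bucket) {
            if (existing.equals(item)) {
                return false; // an equal object is already stored
            }
        }
        bucket.add(item);
        return true;
    }

    boolean contains(Object item) {
        // equals() decides membership among the objects in the chosen bucket.
        for (T existing : buckets.get(indexFor(item))) {
            if (existing.equals(item)) {
                return true;
            }
        }
        return false;
    }

    int bucketSize(Object item) {
        return buckets.get(indexFor(item)).size();
    }

    public static void main(String[] args) {
        ChainedBuckets<String> set = new ChainedBuckets<>(16);
        set.add("Aa");
        set.add("BB"); // same hashCode as "Aa", so same bucket
        System.out.println(set.bucketSize("Aa")); // prints 2
        System.out.println(set.contains("BB"));   // prints true
        System.out.println(set.contains("CC"));   // prints false
    }
}
```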

#3


4  

Think of the hash code as something that merely reduces the effort of checking equality. If two objects are equal, they will definitely have the same hash code. However, if two objects have the same hash code, they might be very similar and still not be the same. Just to build intuition: think of comparing a duck to an elephant in a zoo. They are highly dissimilar and would have different "abstract hash codes", so you don't have to bother comparing their legs, wings, etc. to check whether they are the same. However, if you compare a duck and a swan, they are highly similar and might share the same abstract hash code, so now you are down to comparing very minute features of each animal to check for equality. The more alike the two things being compared are, the more specific the abstract hash code has to be to tell them apart: telling ducks from swans needs a more specific hash code than telling ducks from elephants, telling breeds of duck apart needs an even more specific one, and comparing the DNA of two ducks of the same breed needs a more specific one still. This answer is only meant to create a mindset for understanding the concept of a hash code; after reading it, don't carry this loose meaning of the word "hash code" beyond the analogy.
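The "cheap check first" idea can be sketched in a few lines, assuming a hypothetical helper equalsWithHashPrecheck: different hash codes prove inequality immediately, while matching hash codes still require a full equals() call. This relies on the contract that equal objects have equal hash codes, and it assumes non-null arguments.

```java
// Hypothetical helper: compare hash codes first, call equals() only on a match.
// Assumes non-null arguments and a correct equals()/hashCode() contract.
final class HashPrecheck {
    static boolean equalsWithHashPrecheck(Object a, Object b) {
        if (a.hashCode() != b.hashCode()) {
            return false; // different hashes: definitely not equal
        }
        return a.equals(b); // same hash: still have to compare in full
    }

    public static void main(String[] args) {
        // "Aa" and "BB" share a hash code, so equals() has to decide.
        System.out.println(equalsWithHashPrecheck("Aa", "BB"));                 // prints false
        System.out.println(equalsWithHashPrecheck("duck", new String("duck"))); // prints true
    }
}
```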

#4


3  

I think the reverse would actually be:

If two objects are NOT equal according to the equals() method, they must have a DIFFERENT hashCode() value

which clearly does not hold, since generating unique hashes is not possible in the general case: you are usually trying to map a set of values onto a set of hash codes of lower cardinality.
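A concrete example of that lower cardinality from the JDK itself: a long has 2^64 possible values, but hashCode() returns a 32-bit int, and Long.hashCode(v) is specified as (int) (v ^ (v >>> 32)). So distinct Long values must share codes. The class LongCollisionDemo and its helper collide are hypothetical names for this sketch.

```java
// Hypothetical demo: unequal long values that share a Long hash code.
final class LongCollisionDemo {
    // True when two different long values have the same hash code.
    static boolean collide(long a, long b) {
        return a != b && Long.hashCode(a) == Long.hashCode(b);
    }

    public static void main(String[] args) {
        // Long.hashCode(v) == (int) (v ^ (v >>> 32))
        System.out.println(Long.hashCode(0L));     // prints 0
        System.out.println(Long.hashCode(-1L));    // prints 0
        System.out.println(collide(0L, -1L));      // prints true
        System.out.println(collide(1L, 1L << 32)); // prints true
    }
}
```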

#5


2  

I will explain it using an example. Let's say that the hashCode() of a string were based on its length. In that case, the hash codes of "foo" and "bar" would be equal, but "foo" itself is not equal to "bar".

This is because a hash code is computed by a kind of formula: you can determine the hash code of each object, but you cannot restore the object from its hash code. There can be several objects with the same hash code.
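A sketch of that thought experiment, assuming a hypothetical LengthHashed wrapper (real java.lang.String does NOT hash by length). The hash is still legal, because equal texts always have equal lengths, but every string of the same length collides.

```java
import java.util.Objects;

// Hypothetical wrapper whose hashCode() uses only the string length.
final class LengthHashed {
    private final String text;

    LengthHashed(String text) {
        this.text = Objects.requireNonNull(text);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LengthHashed && text.equals(((LengthHashed) o).text);
    }

    @Override
    public int hashCode() {
        return text.length(); // legal: equal texts always have equal lengths
    }

    public static void main(String[] args) {
        LengthHashed foo = new LengthHashed("foo");
        LengthHashed bar = new LengthHashed("bar");
        System.out.println(foo.hashCode() == bar.hashCode()); // prints true
        System.out.println(foo.equals(bar));                   // prints false
    }
}
```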

#6


1  

You can define your hashCode() implementation to always return 1, for example. This is perfectly valid: different instances (which are not equal) can have the same hashCode. But the runtime performance of looking up these objects in HashMaps, HashSets, or other hash-based collections will be very poor, because they all land in the same bucket internally: lookup performance degrades from O(1) to O(n), since you need to traverse the list of objects in that bucket.
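A sketch of such a class, assuming a hypothetical ConstantHash type with an int id. It satisfies the contract, so a HashSet still gives correct answers; the cost is that every lookup may have to compare against many colliding entries with equals().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class with a constant hashCode(): valid, but every instance
// lands in the same bucket of a hash-based collection.
final class ConstantHash {
    private final int id;

    ConstantHash(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHash && ((ConstantHash) o).id == id;
    }

    @Override
    public int hashCode() {
        return 1; // satisfies the contract, but degenerate
    }

    static Set<ConstantHash> build(int n) {
        Set<ConstantHash> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new ConstantHash(i));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<ConstantHash> set = build(1_000);
        // Still correct, just slow: lookups fall back to equals() on colliding entries.
        System.out.println(set.size());                           // prints 1000
        System.out.println(set.contains(new ConstantHash(999)));  // prints true
        System.out.println(set.contains(new ConstantHash(5000))); // prints false
    }
}
```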

Also consider taking a look at how HashMaps work in Java.
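As a starting point, here is a sketch of how OpenJDK's java.util.HashMap (Java 8 and later) turns a key's hashCode() into a bucket index: it XORs the high 16 bits into the low bits, then masks with (table length - 1), since the table length is a power of two. This mirrors internal, non-public code that may change between versions; BucketIndexSketch, spread, and bucketIndex are hypothetical names.

```java
// Hypothetical re-creation of OpenJDK HashMap's bucket selection (internal detail).
final class BucketIndexSketch {
    // Spread the high bits into the low bits; null keys always use bucket 0.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // With a power-of-two table length, masking selects the bucket.
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        int n = 16; // HashMap's default initial capacity
        // "Aa" and "BB" have the same hashCode (2112), so they share a bucket.
        System.out.println(bucketIndex("Aa", n) == bucketIndex("BB", n)); // prints true
        System.out.println(bucketIndex("Aa", n));                         // prints 0
    }
}
```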

#7


0  

A hash code of an object is usually much smaller than the original object; producing a compact value is one purpose of a hash function. So, if you have n different objects (say, all possible states of a class), it is not possible to give each of them a unique code when only m (where m < n) smaller codes are available.
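A small pigeonhole demonstration, assuming a hypothetical 8-bit hash (the low 8 bits of String.hashCode(), so only 256 possible codes) applied to all 676 two-letter lowercase strings. With more objects than codes, some strings must share a code. PigeonholeDemo and tinyHash are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical pigeonhole demo: 676 strings squeezed into at most 256 codes.
final class PigeonholeDemo {
    static int tinyHash(String s) {
        return s.hashCode() & 0xFF; // keep only the low 8 bits: 0..255
    }

    static int distinctTinyHashes() {
        Set<Integer> seen = new HashSet<>();
        for (char a = 'a'; a <= 'z'; a++) {
            for (char b = 'a'; b <= 'z'; b++) {
                seen.add(tinyHash("" + a + b));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int codes = distinctTinyHashes();
        // 676 distinct strings but at most 256 codes: collisions are guaranteed.
        System.out.println(codes <= 256); // prints true
    }
}
```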

#8


0  

Let me show with an example:

Suppose that the hash code of a string were computed as follows: hashCode = the sum of the ASCII codes of its characters (real hash functions are, of course, more complicated).

For example, the hash code of "abc" is calculated as 97 + 98 + 99 = 294.

Then the hash code of "acb" is 97 + 99 + 98 = 294.

And so on. As you can see, many strings have hashCode = 294 but are not equal.
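A sketch of that toy hash, assuming a hypothetical SumHash class. Because addition ignores order, every permutation of the same letters collides. The real String.hashCode() weights each position by a power of 31, so it does tell "abc" and "acb" apart.

```java
// Hypothetical toy hash: sum of character codes (not how String.hashCode() works).
final class SumHash {
    static int sumHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i); // 'a' = 97, 'b' = 98, 'c' = 99
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumHash("abc"));                       // prints 294
        System.out.println(sumHash("acb"));                       // prints 294
        System.out.println("abc".equals("acb"));                  // prints false
        System.out.println("abc".hashCode() == "acb".hashCode()); // prints false
    }
}
```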
