Generating a good hash code (GetHashCode) for a BitArray

Time: 2023-01-24 16:11:39

I need to generate a fast hash code in GetHashCode for a BitArray. I have a Dictionary where the keys are BitArrays, and all the BitArrays are of the same length.

Does anyone know of a fast way to generate a good hash from a variable number of bits, as in this scenario?

UPDATE:

The approach I originally took was to access the internal array of ints directly through reflection (speed is more important than encapsulation in this case) and then XOR those values together. The XOR approach seems to work well, i.e. my Equals method isn't called excessively when searching in the Dictionary:

    public int GetHashCode(BitArray array)
    {
        int hash = 0;
        foreach (int value in array.GetInternalValues())
        {
            hash ^= value;
        }
        return hash;
    }

However, the approach suggested by Mark Byers and seen elsewhere on * was slightly better (16570 Equals calls vs 16608 for the XOR for my test data). Note that this approach fixes a bug in the previous one where bits beyond the end of the bit array could affect the hash value. This could happen if the bit array was reduced in length.

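    public int GetHashCode(BitArray array)
    {
        UInt32 hash = 17;
        int bitsRemaining = array.Length;
        foreach (int value in array.GetInternalValues())
        {
            // Stop once every bit of the array has been hashed. Without this check
            // bitsToWipe could reach 32 or more, and C# uses only the low five bits
            // of a shift count, so the shifts below would leave stale bits in place.
            if (bitsRemaining <= 0)
            {
                break;
            }

            UInt32 cleanValue = unchecked((UInt32)value);
            if (bitsRemaining < 32)
            {
                // Clear any bits that are beyond the end of the array.
                int bitsToWipe = 32 - bitsRemaining;
                cleanValue <<= bitsToWipe;
                cleanValue >>= bitsToWipe;
            }

            // Overflow is intended here, so wrap explicitly.
            hash = unchecked(hash * 23 + cleanValue);
            bitsRemaining -= 32;
        }
        return unchecked((int)hash);
    }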
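
A detail worth checking in the masking step: for 32-bit operands C# uses only the low five bits of the shift count, so shifting a UInt32 by 32 leaves it unchanged rather than clearing it. The sketch below isolates the masking in a hypothetical helper, `MaskWord` (the name and class are illustrative, not part of the original code), and treats a word with no valid bits left as zero. Whether the internal array can actually hold such surplus words depends on the runtime, so treat that case as an assumption.

```csharp
using System;

public static class BitMasking
{
    // Hypothetical helper: keeps the lowest bitsRemaining bits of word
    // and clears everything above them.
    public static uint MaskWord(uint word, int bitsRemaining)
    {
        if (bitsRemaining <= 0)
        {
            return 0u;      // no valid bits left in this word
        }
        if (bitsRemaining >= 32)
        {
            return word;    // every bit of the word is valid
        }

        int bitsToWipe = 32 - bitsRemaining;    // always in the range 1..31 here
        return (word << bitsToWipe) >> bitsToWipe;
    }
}
```

Keeping the shift count within 1..31 means the five-bit rule never comes into play.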

The GetInternalValues extension method is implemented like this:

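    using System.Collections;
    using System.Collections.Generic;
    using System.Reflection;

    public static class BitArrayExtensions
    {
        // "m_array" is a private implementation detail of BitArray and may
        // differ between runtime versions.
        static readonly FieldInfo _internalArrayGetter = GetInternalArrayGetter();

        static FieldInfo GetInternalArrayGetter()
        {
            return typeof(BitArray).GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance);
        }

        static int[] GetInternalArray(BitArray array)
        {
            return (int[])_internalArrayGetter.GetValue(array);
        }

        // Exposes the backing ints without copying them.
        public static IEnumerable<int> GetInternalValues(this BitArray array)
        {
            return GetInternalArray(array);
        }

        // ... more extension methods
    }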

Any suggestions for improvement are welcome!

2 solutions

#1


1  

If the bit arrays are 32 bits or shorter then you just need to convert them to 32 bit integers (padding with zero bits if necessary).
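
As a rough illustration of the 32-bit case, the sketch below packs a short BitArray into a single int through the public `CopyTo` method. The class and method names (`SmallBitArrayHash`, `ToInt32`) are made up for this example, and the explicit masking of unused high bits is a precaution, since not every runtime may clear them for you.

```csharp
using System;
using System.Collections;

public static class SmallBitArrayHash
{
    // Hypothetical helper: packs a BitArray of at most 32 bits into one int.
    // Bit 0 of the array becomes the least significant bit of the result.
    public static int ToInt32(BitArray bits)
    {
        if (bits.Length > 32)
        {
            throw new ArgumentException("Expected at most 32 bits.", nameof(bits));
        }
        if (bits.Length == 0)
        {
            return 0;
        }

        int[] word = new int[1];
        bits.CopyTo(word, 0);

        if (bits.Length == 32)
        {
            return word[0];
        }

        unchecked
        {
            // Zero the padding bits above Length so they cannot affect the value.
            uint mask = (1u << bits.Length) - 1u;
            return (int)((uint)word[0] & mask);
        }
    }
}
```

Because distinct bit patterns map to distinct ints, the result can be returned from GetHashCode directly. Note that this version allocates a one-element array on every call.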

If they can be longer then you can either convert them to a series of 32-bit integers and XOR them, or better: use the algorithm described in Effective Java.

    public int GetHashCode()
    {
        int hash = 17;
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }

Taken from here. The fields field1, field2, etc. correspond to the first 32 bits, the second 32 bits, and so on.
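
To make the difference between the two options concrete, here is a small sketch that applies both to an array of 32-bit words. `WordHash`, `Xor` and `Combine` are illustrative names only. Since `int.GetHashCode()` returns the value itself, each word is added directly. The point of the comparison is that XOR ignores the order of the words, while the multiply-and-add form does not.

```csharp
using System;

public static class WordHash
{
    // Plain XOR of all words: cheap, but insensitive to word order.
    public static int Xor(int[] words)
    {
        int hash = 0;
        foreach (int word in words)
        {
            hash ^= word;
        }
        return hash;
    }

    // Multiply-and-add combination in the style shown above,
    // written as a loop over any number of words.
    public static int Combine(int[] words)
    {
        unchecked   // overflow is expected and harmless for hashing
        {
            int hash = 17;
            foreach (int word in words)
            {
                hash = hash * 23 + word;
            }
            return hash;
        }
    }
}
```

For example, the word sequences {1, 2} and {2, 1} collide under `Xor` but produce different values under `Combine`.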

#2


3  

It is a terrible class to act as a key in a Dictionary. The only reasonable way to implement GetHashCode() is by using its CopyTo() method to copy the bits into a byte[]. That's not great, it creates a ton of garbage.
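
For reference, a CopyTo-based version might look like the sketch below, written as an `IEqualityComparer<BitArray>` so it can be handed to a Dictionary. `BitArrayComparer` and `ToBytes` are hypothetical names, and the masking of the last byte is a defensive assumption rather than something every runtime necessarily requires. It shows the cost described above: every hash or comparison allocates a fresh byte[].

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class BitArrayComparer : IEqualityComparer<BitArray>
{
    // Copies the bits into a new byte[] using only the public API.
    private static byte[] ToBytes(BitArray bits)
    {
        byte[] bytes = new byte[(bits.Length + 7) / 8];
        if (bytes.Length == 0)
        {
            return bytes;
        }

        bits.CopyTo(bytes, 0);

        int extraBits = bits.Length % 8;
        if (extraBits != 0)
        {
            // Clear padding bits in the last byte so they cannot affect the result.
            bytes[bytes.Length - 1] &= (byte)((1 << extraBits) - 1);
        }
        return bytes;
    }

    public bool Equals(BitArray x, BitArray y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;

        byte[] a = ToBytes(x);
        byte[] b = ToBytes(y);
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public int GetHashCode(BitArray bits)
    {
        unchecked   // overflow is expected and harmless for hashing
        {
            int hash = 17;
            foreach (byte b in ToBytes(bits))
            {
                hash = hash * 23 + b;
            }
            return hash;
        }
    }
}
```

It would be used as `new Dictionary<BitArray, string>(new BitArrayComparer())`.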

Beg, steal or borrow to use a BitVector32 instead. It has a good implementation for GetHashCode(). If you've got more than 32 bits then consider spinning your own class so you can get to the underlying array without having to copy.
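
One possible shape for such a hand-rolled class is sketched below. `FixedBits` and its members are invented for illustration. The class owns its word array, so Equals and GetHashCode can read the words directly without reflection or copying.

```csharp
using System;

public sealed class FixedBits : IEquatable<FixedBits>
{
    private readonly uint[] _words;     // 32 bits per element

    public int Length { get; }

    public FixedBits(int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        _words = new uint[(length + 31) / 32];
    }

    public bool this[int index]
    {
        get
        {
            CheckIndex(index);
            return (_words[index >> 5] & (1u << (index & 31))) != 0;
        }
        set
        {
            CheckIndex(index);
            uint mask = 1u << (index & 31);
            if (value) _words[index >> 5] |= mask;
            else _words[index >> 5] &= ~mask;
        }
    }

    private void CheckIndex(int index)
    {
        if (index < 0 || index >= Length) throw new ArgumentOutOfRangeException(nameof(index));
    }

    public bool Equals(FixedBits other)
    {
        if (other == null || other.Length != Length) return false;
        for (int i = 0; i < _words.Length; i++)
        {
            if (_words[i] != other._words[i]) return false;
        }
        return true;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as FixedBits);
    }

    public override int GetHashCode()
    {
        unchecked   // overflow is expected and harmless for hashing
        {
            uint hash = 17;
            foreach (uint word in _words)
            {
                hash = hash * 23 + word;
            }
            return (int)hash;
        }
    }
}
```

Bits past Length can never be set through the indexer, so no masking is needed when hashing. As with any mutable key, changing an instance after it has been added to a Dictionary would make it unfindable, so instances used as keys should be treated as frozen.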
