Python与数据结构[4] -> 散列表[0] -> 散列表与散列函数的 Python 实现

散列表 / Hash Table

散列表与散列函数

散列表是一种将关键字映射到特定数组位置的一种数据结构，而将关键字映射到0至TableSize-1过程的函数，即为散列函数。

Hash Table:

        [0] -> A

        [1] -> B

        [2] -> C

        [3] -> D

        [4] -> E

下面以一个简单的散列函数 Hash(Key)=Key mod TableSize为例，完成一个散列表的实现。

Note: 为方便起见，这里选用了一个非素数作为TableSize，适宜的TableSize应为一个素数。

完整代码

 from collections import Iterable

 class CollisionError(Exception):

     pass

 class HashTable:

     """

     Hash Table:

         [0] -> A

         [1] -> B

         [2] -> C

         [3] -> D

         [4] -> E

     """

     def __init__(self, size, fn):

         self._array = [None for i in range(size)]

         self._hashing = fn

     def __str__(self):

         return '\n'.join('[%d] %s' % (index, item) for index, item in enumerate(self._array))

     def find(self, item):

         hash_code = self._hashing(item)

         value = self._array[hash_code]

         return value if value == item else None, hash_code

     def insert(self, *args):

         for i in args:

             if isinstance(i, Iterable):

                 for j in i:

                     self._insert(j)

             else:

                 self._insert(i)

     def _insert(self, item):

         if item is None:

             return

         hash_code = self._hashing(item)

         value = self._array[hash_code]

         if value is not None and value != item:      # Handle value 0 and value existed situation.

             raise CollisionError('Hashing value collided!')

         self._array[hash_code] = item

     def delete(self, item):

         hash_code = self._hashing(item)

         if self._array[hash_code] != item:

             raise KeyError('Key error with %s' % item)

         self._array[hash_code] = None

     def show(self):

         print(self)

     @property

     def size(self):

         return len(self._array)

     @property

     def load_factor(self):

         element_num = sum(map(lambda x: 0 if x is None else 1, self._array))

         return element_num/self.size

     def make_empty(self):

         self._array = [None for i in range(self.size)]

 def kmt_hashing(size):

     # Key = Key mod TableSize

     return lambda x: x % size

 def test(h):

     print('\nShow hash table:')

     h.show()

     print('\nInsert values:')

     h.insert(7, 8, 9)

     h.insert(range(7))

     h.show()

     print('\nInsert values (existed):')

     h.insert(1)

     h.show()

     print('\nInsert value (collided):')

     try:

         h.insert(11)

     except CollisionError as e:

         print(e)

     print('\nFind value:')

     print(h.find(7))

     print('\nFind value (not existed):')

     print(h.find(77))

     print('\nDelete value:')

     h.delete(7)

     h.show()

     print('\nDelete value (not existed):')

     try:

         h.delete(111)

     except KeyError as e:

         print(e)

     print('\nLoad factor is:', h.load_factor)

     print('\nClear hash table:')

     h.make_empty()

     h.show()

 if __name__ == '__main__':

     test(HashTable(10, kmt_hashing(10)))

分段解释

首先导入一个可迭代类，用于判断参数类型时使用，并定义一个散列冲突异常类

 from collections import Iterable

 class CollisionError(Exception):

     pass

接着定义一个散列表类，构造函数接收两个参数，一个用于设置散列表的大小，一个用于设置散列函数，

Note: 由于Python的列表无法像C语言中的数组一样提前声明大小，因此这里的列表需要先用None进行填充。

 class HashTable:

     """

     Hash Table:

         [0] -> A

         [1] -> B

         [2] -> C

         [3] -> D

         [4] -> E

     """

     def __init__(self, size, fn):

         self._array = [None for i in range(size)]

         self._hashing = fn

再重载__str__方法，用于更加清晰的显示散列表，

     def __str__(self):

         return '\n'.join('[%d] %s' % (index, item) for index, item in enumerate(self._array))

定义散列表的find方法，find方法的时间复杂度为O(1)，查找时仅需根据键值计算哈希值，再从散列表中获取元素即可。返回查找到的结果和对应哈希值，若未找到元素则返回None和最后查找的位置。

Note: O(1)的前提是散列函数足够简单快速

     def find(self, item):

         hash_code = self._hashing(item)

         value = self._array[hash_code]

         return value if value == item else None, hash_code

定义散列表的insert方法，首先对传入的参数进行判断，若为可迭代对象则迭代插入，否则直接插入。私有的插入方法将利用散列函数对插入值进行散列计算，然后插入对应位置，若对应位置已被占有，则引发一个冲突异常。

     def insert(self, *args):

         for i in args:

             if isinstance(i, Iterable):

                 for j in i:

                     self._insert(j)

             else:

                 self._insert(i)

     def _insert(self, item):

         if item is None:

             return

         hash_code = self._hashing(item)

         value = self._array[hash_code]

         if value is not None and value != item:      # Handle value 0 and value existed situation.

             raise CollisionError('Hashing value collided!')

         self._array[hash_code] = item

定义散列表的delete方法，当需要删除某个值时，同样先进行散列计算，找到对应散列位置，若该位置的值与删除值不同，则引发一个键错误异常，若相同或为None，则直接删除该元素。

     def delete(self, item):

         hash_code = self._hashing(item)

         if self._array[hash_code] != item:

             raise KeyError('Key error with %s' % item)

         self._array[hash_code] = None

接着定义散列表几个基本方法，包括显示散列表，获取散列表大小，计算装填因子和清空散列表。

     def show(self):

         print(self)

     @property

     def size(self):

         return len(self._array)

     @property

     def load_factor(self):

         element_num = sum(map(lambda x: 0 if x is None else 1, self._array))

         return element_num/self.size

     def make_empty(self):

         self._array = [None for i in range(self.size)]

最后，定义一个简单的散列函数Hash(Key)=Key mode TableSize。

 def kmt_hashing(size):

     # Key = Key mod TableSize

     return lambda x: x % size

以及一个测试函数，对散列表进行测试。

首先显示一个初始的散列表，

 def test(h):

     print('\nShow hash table:')

     h.show()

得到结果

Show hash table:

[0] None

[1] None

[2] None

[3] None

[4] None

[5] None

[6] None

[7] None

[8] None

[9] None

接着测试插入方法，向散列表中插入元素

     print('\nInsert values:')

     h.insert(7, 8, 9)

     h.insert(range(7))

     h.show()

得到结果

Insert values:

[0] 0

[1] 1

[2] 2

[3] 3

[4] 4

[5] 5

[6] 6

[7] 7

[8] 8

[9] 9

尝试插入已存在的元素，则没有影响，而尝试插入一个冲突元素，则会引发一个冲突异常

     print('\nInsert values (existed):')

     h.insert(1)

     h.show()

     print('\nInsert value (collided):')

     try:

         h.insert(11)

     except CollisionError as e:

         print(e)

显示结果

Insert values (existed):

[0] 0

[1] 1

[2] 2

[3] 3

[4] 4

[5] 5

[6] 6

[7] 7

[8] 8

[9] 9

Insert value (collided):

Hashing value collided!

尝试查找一个存在的元素和一个不存在的元素

     print('\nFind value:')

     print(h.find(7))

     print('\nFind value (not existed):')

     print(h.find(77))

得到结果

Find value:

(7, 7)

Find value (not existed):

(None, 7)

尝试删除一个存在元素和一个不存在的元素

     print('\nDelete value:')

     h.delete(7)

     h.show()

     print('\nDelete value (not existed):')

     try:

         h.delete(111)

     except KeyError as e:

         print(e)

得到结果

Delete value:

[0] 0

[1] 1

[2] 2

[3] 3

[4] 4

[5] 5

[6] 6

[7] None

[8] 8

[9] 9

Delete value (not existed):

'Key error with 111'

查看装载因子，最后清空散列表

     print('\nLoad factor is:', h.load_factor)

     print('\nClear hash table:')

     h.make_empty()

     h.show()

得到结果

Load factor is: 0.9

Clear hash table:

[0] None

[1] None

[2] None

[3] None

[4] None

[5] None

[6] None

[7] None

[8] None

[9] None

一个基本的散列表基本建立完成，但还存在一个插入冲突的问题没有解决，对于插入冲突现象，解决的方式主要有分离链接法和开放定址法，具体内容可参考相关阅读。

相关阅读

1. 分离链接法

2. 开放定址法