Hash table: why size should be prime?

时间:2023-03-08 15:54:03

Question:

Possible Duplicate:
Why should hash functions use a prime number modulus?

Why is it necessary for a hash table's (the data structure) size to be a prime?

From what I understand, it assures a more even distribution but is there any other reason?

Answer:

The only reason is to avoid clustering of values into a small number of buckets (yes, distribution). A more even distributed hashtable will perform more consistently.

from http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html

If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial
to verify/derive this). Now you can do one of the following to avoid clustering

  1. Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions of entries.

  2. Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length is a prime number.