Will a Bloom filter return false positives under certain conditions?

Date: 2020-12-30 04:06:23

Assume a Bloom filter API with two parameters: (1) the number of bits in the Bloom filter (n) and (2) the expected number of insertions (m).

Question:

Will m > n always lead to complete false positives? By "complete" I mean: once m > n, will every call to the 'contains(element)' method return true?

1 answer

#1

A Bloom filter answers yes to every query not when m > n, but when all n of its bits are 1: every query then checks h positions (where h is the number of hash functions) and finds h 1s. Inserting more items than there are bits does not by itself guarantee this, because each insertion sets at most h bits and hash collisions often land on bits that are already 1. Still, the typical setup that optimizes the space vs. false-positive-rate tradeoff is the one where the probability of any given bit being set is 1/2. The analysis is given in the Wikipedia article on Bloom filters: http://en.wikipedia.org/wiki/Bloom_filter
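
The standard approximation behind these claims can be written in the question's notation (n bits, m insertions, h hash functions). Note that the Wikipedia article uses the reverse letters (m bits, n elements, k hash functions). This sketch assumes each hash function picks a bit uniformly and independently of the others. The expected number of bits still 0 is about n e^{-hm/n}, so every query returns true only once that falls well below 1, roughly when hm exceeds n ln n, not simply when m > n.

```latex
% Probability that a given bit is still 0 after m insertions
\Pr[\text{bit} = 0] = \left(1 - \tfrac{1}{n}\right)^{hm} \approx e^{-hm/n}

% False-positive probability: all h probed bits happen to be 1
\Pr[\text{false positive}] \approx \left(1 - e^{-hm/n}\right)^{h}

% Minimized at h = (n/m) ln 2, where exactly half the bits are expected to be set
h_{\text{opt}} = \tfrac{n}{m}\ln 2 \;\Rightarrow\; e^{-h_{\text{opt}}m/n} = e^{-\ln 2} = \tfrac{1}{2}

% Example with m > n: m = 2n, h = 1
e^{-2} \approx 0.135 \text{ of the bits stay } 0, \quad \Pr[\text{false positive}] \approx 0.865
```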

布隆过滤器总是回答是,而不是当你的m> n,但是当它的所有n个位都是1时 - 那么h位置的每个查询(其中h是散列函数的数量)将产生h 1s。尽管如此,优化空间与误报率权衡的典型设置是当任何比特被设置的概率为1/2时。该分析显示在Bloom过滤器*文章中:http://en.wikipedia.org/wiki/Bloom_filter

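
The answer's point can be checked empirically with a small simulation. This is a minimal Python sketch, not the API from the question. The class `BloomFilter`, its methods, and `false_positive_rate` are hypothetical names, and salted SHA-256 stands in for h independent hash functions. It inserts m > n items and counts how many bits remain 0 and how often absent items are reported as present.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter with n_bits bits and h hash functions."""

    def __init__(self, n_bits: int, h: int) -> None:
        self.n = n_bits
        self.h = h
        self.bits = bytearray(n_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Assumption: salting SHA-256 with the function index simulates h hashes.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def contains(self, item: str) -> bool:
        # True when all h probed bits are 1; for absent items this is a false positive.
        return all(self.bits[pos] for pos in self._positions(item))

    def zero_bits(self) -> int:
        return self.bits.count(0)


def false_positive_rate(bf: BloomFilter, trials: int = 10_000) -> float:
    # Query keys that were never inserted ("absent-j" never collides with "item-j").
    hits = sum(bf.contains(f"absent-{j}") for j in range(trials))
    return hits / trials


if __name__ == "__main__":
    n = 1000  # bits in the filter
    for h, m in [(1, 2000), (3, 2000), (1, 20000)]:
        bf = BloomFilter(n, h)
        for j in range(m):
            bf.add(f"item-{j}")
        print(f"h={h} m={m}: zero bits={bf.zero_bits()}, "
              f"false-positive rate ~ {false_positive_rate(bf):.3f}")
```

With h = 1 and m = 2n, the approximation predicts that roughly 13.5% of the bits stay 0, so 'contains(element)' still returns false for some absent items even though m > n. Only once m is large enough to set every bit does every query return true.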