在大小为n*k+b的数组中查找发生b次的元素

Description

描述

Given an Array of size (n*k+b) where n elements occur k times and one element occurs b times, in other words there are n+1 distinct Elements. Given that 0 < b < k find the element occurring b times.

给定一个大小为(n*k+b)的数组，其中n个元素出现k次，一个元素出现b次，换句话说，有n+1个不同的元素。假设0 < b < k，找到发生b次的元素。

My Attempted solutions

我试图解决方案

Obvious solution will be using hashing but it will not work if the numbers are very large. Complexity is O(n)

显然的解决方案是使用散列，但是如果数字很大，它就不能工作。复杂度是O(n)
Using map to store the frequencies of each element and then traversing map to find the element occurring b times.As Map's are implemented as height balanced trees Complexity will be O(nlogn).

使用map来存储每个元素的频率，然后遍历map以查找发生b次的元素。当Map被实现为高度平衡树时，复杂度将是O(nlogn)。

Both of my solution were accepted but the interviewer wanted a linear solution without using hashing and hint he gave was make the height of tree constant in tree in which you are storing frequencies, but I am not able to figure out the correct solution yet.

我的两个解决方案都被接受了，但是面试官想要一个线性的解决方案而没有使用散列和暗示，他给出的是使树的树的高度恒定，在树中存储频率，但是我还不能找到正确的解决方案。

I want to know how to solve this problem in linear time without hashing?

我想知道如何在线性时间内不哈希地解决这个问题?

EDIT:

编辑:

Sample:

示例:

Input: n=2 b=2 k=3

输入:n = 2 b = 2 k = 3

Aarray: 2 2 2 3 3 3 1 1

数组:2 2 2 2 3 3 3 1 1

Output: 1

输出:1

5 个解决方案

#1

An idea using cyclic groups.

使用循环群的想法。

To guess i-th bit of answer, follow this procedure:

要猜测第i位答案，请按照以下步骤:

Count how many numbers in array has i-th bit set, store as cnt
数数数组中有多少个数字设置为第i位，存储为cnt
If cnt % k is non-zero, then i-th bit of answer is set. Otherwise it is clear.
如果cnt % k是非零的，那么就设置了i-th比特，否则就很清楚了。

To guess whole number, repeat the above for every bit.

要猜出整个数字，请对每一比特重复以上内容。

This solution is technically O((n*k+b)*log max N), where max N is maximal value in the table, but because number of bits is usually constant, this solution is linear in array size.

这个解在技术上是O((n*k+b)*log max n)，其中max n是表中的最大值，但是因为位的个数通常是常数，所以这个解在数组大小上是线性的。

No hashing, memory usage is O(log k * log max N).

没有哈希，内存使用是O(log k * log max N)。

Example implementation:

示例实现:

from random import randint, shuffle

def generate_test_data(n, k, b):
    k_rep = [randint(0, 1000) for i in xrange(n)]
    b_rep = [randint(0, 1000)]
    numbers = k_rep*k + b_rep*b
    shuffle(numbers)
    print "k_rep: ", k_rep
    print "b_rep: ", b_rep
    return numbers

def solve(data, k):
    cnts = [0]*10
    for number in data:
        bits = [number >> b & 1 for b in xrange(10)]
        cnts = [cnts[i] + bits[i] for i in xrange(10)]
    return reduce(lambda a,b:2*a+(b%k>0), reversed(cnts), 0)

print "Answer: ", solve(generate_test_data(10, 15, 13), 3)

#2

I assume:

我认为:

The elements of the array are comparable.
数组的元素是可比较的。
We know the values of n and k beforehand.
我们事先知道n和k的值。
A solution O(n*k+b) is good enough.
解O(n*k+b)就足够了。

Let the number occuring only b times be S. We are trying to find the S in an array of n*k+b size.

让只出现b的数为S，我们试图在n*k+b的数组中找到S。

Recursive Step: Find the median element of the current array slice as in Quick Sort in lineer time. Let the median element be M.

递归步骤:找到当前数组切片的中值元素，就像在lineer时间中快速排序一样。中位数是M。

After the recursive step you have an array where all elements smaller than M occur on the left of the first occurence of M. All M elements are next to each other and all element larger than M are on the right of all occurences of M.

在递归步骤之后，你会有一个数组，其中所有小于M的元素都出现在M的第一次出现的左边，所有M元素都是相邻的，所有大于M的元素都在M的所有发生的右边。

Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.

看最左边的M的指数，计算S =M。递归在左片或右片上。还是s>

So you are doing a quick sort but delving only one part of the divisions at any time. You will recurse O(logN) times but each time with 1/2, 1/4, 1/8, .. sizes of the original array, so the total time will still be O(n).

所以你在做一个快速排序，但在任何时候只进行一个部分的划分。递归O(logN)次但每次都是1/2,1/4,1/8。原始数组的大小，所以总时间仍然是O(n)

Clarification: Let's say n=20 and k = 10. Then, there are 21 distinct elements in the array, 20 of which occur 10 times and the last occur let's say 7 times. I find the medium element, let's say it is 1111. If the S<1111 than the index of the leftmost occurence of 1111 will be less than 11*10. If S>=1111 then the index will be equal to 11*10.

说明:假设n=20 k = 10。然后，数组中有21个不同的元素，其中20个发生了10次，最后一次发生了7次。找到中间元素，假设是1111。如果S<1111，则最左发生率为1111的指数将小于11*10。如果S>=1111，那么指数将等于11*10。

Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5} After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4} There are 6 elements on the left of 3. 6 is a multiple of k=3. So each element must be occuring 3 times there. So S>=3. Recurse on the right side. And so on.

完整的例子:n = 4。k = 3。数组={1、2、3、4、5、1、2、4、5、1、2、3、5}在第一个递归步骤之后，我发现中位数元素是3，数组是这样的:{1、2、1、2、2、3、3、5、5、5、4}在3的左边有6个元素。6是k=3的倍数。所以每个元素必须出现3次。所以年代> = 3。在右边递归。等等。

#3

In order to have a constant height B-tree containing n distinct elements, with height h constant, you need z=n^(1/h) children per nodes: h=log_z(n), thus h=log(n)/log(z), thus log(z)=log(n)/h, thus z=e^(log(n)/h), thus z=n^(1/h).

为了有一个恒定的高度b -树包含n个不同的元素,用高度h常数,你需要z = n ^(1 / h)孩子每节点:h = log_z(n),因此h = log(n)/ log(z),因此日志(z)= log(n)/ h,因此z = e ^(log(n)/ h),因此z = n ^(1 / h)。

Example, with n=1000000, h=10, z=3.98, that is z=4.

例如，n=1000000, h=10, z=3.98，即z=4。

The time to reach a node in that case is O(h.log(z)). Assuming h and z to be "constant" (since N=n.k, then log(z)=log(n^(1/h))=log(N/k^(1/h))=ct by properly choosing h based on k, you can then say that O(h.log(z))=O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?

在这种情况下，到达节点的时间是O(h.log(z))。假设h和z是“常数”(因为N= N。k,然后log(z)= log(n ^(1 / h))= log(n / k ^(1 / h))= ct通过正确选择基于k h,然后说O(h.log(z))= O(1)……这听起来有点牵强，但也许这正是面试官想要听到的?

#4

UPDATE: this one use hashing, so it's not a good answer :(

更新:这个使用散列，所以它不是一个好的答案:(

in python this would be linear time (set will remove the duplicates):

在python中，这是线性时间(set将删除副本):

result = (sum(set(arr))*k - sum(arr)) / (k - b)

#5

If 'k' is even and 'b' is odd, then XOR will do. :)

如果k是偶数，b是奇数，那么x就可以了。:)

#1