找到重复次数大于n/2的元素

There is an array (of size N) with an element repeated more than N/2 number of time and the rest of the element in the array can also be repeated but only one element is repeated more than N/2 times. Find the number.

有一个数组(大小为N)，元素重复次数超过N/2，数组中的其他元素也可以重复，但只有一个元素重复次数超过N/2次。找到这个号码。

I could think of few approaches:

我能想到的方法很少:

Naive, keep the count of each number in a hash map.
天真，在哈希映射中保存每个数字的计数。
Simplest, sort the array and the number at n/2+1 th index is the required number.
最简单的是，对数组进行排序，n/2+1索引处的数字是所需的数字。
Keep count of only consecutive duplicate values found. Check separately for the pattern where the values are stored alternatively.
只计算发现的连续重复值。分别检查保存值的模式。

Unable to think of a better solution, there has to be.

无法想出更好的解决方案，就必须有。

7 个解决方案

#1

There is a beautiful algorithm for solving this that works in two passes (total time O(N)) using only constant external space (O(1)). I have an implementation of this algorithm, along with comments including a correctness proof, available here

有一个很好的算法可以解决这个问题，它只需要使用常数外部空间(O(1)就可以在两次遍历(总时间O(N))中完成。我有这个算法的一个实现，以及包括正确性证明在内的注释，可以在这里找到

The intuition behind the algorithm is actually quite beautiful. Suppose that you were to have a roomful of people each holding one element of the array. Whenever two people find each other where neither is holding the same array element as the other, the two of them sit down. Eventually, at the very end, if anyone is left standing, there's a chance that they're in the majority, and you can just check that element. As long as one element occurs with frequency at least N/2, you can guarantee that this approach will always find the majority element.

这个算法背后的直觉其实很漂亮。假设有一屋子人每个人都持有数组的一个元素。当两个人发现彼此都不与对方拥有相同的数组元素时，他们俩就会坐下来。最后，最后，如果有人站着，有可能他们是多数派，你可以检查这个元素。只要一个元素出现的频率至少为N/2，就可以保证这种方法总是找到大多数元素。

To actually implement the algorithm, you make a linear scan over the array and keep track of your current guess as to what the majority element is, along with the number of times that you've seen it so far. Initially, this guess is undefined and the number of repeats is zero. As you walk across the array, if the current element matches your guess, you increment the counter. If the current element doesn't match your guess, you decrement the counter. If the counter ever hits zero, then you reset it to the next element you encounter. You can think about this implementation as a concrete realization of the above "standing around in a room" algorithm. Whenever two people meet with different elements, they cancel out (dropping the counter). Whenever two people have the same element, then they don't interact with each other.

要实现这个算法，你需要对数组进行线性扫描，跟踪当前对多数元素的猜测，以及到目前为止看到的次数。最初，这个猜测没有定义，重复的次数为零。当您遍历数组时，如果当前元素与您的猜测匹配，您将增加计数器。如果当前元素与您的猜测不匹配，您将递减计数器。如果计数器达到零，那么您将它重置为您遇到的下一个元素。您可以将此实现视为上述“站在房间中”算法的具体实现。当两个人遇到不同的元素时，他们就会抵消(放弃计数器)。当两个人有相同的元素时，他们就不会相互作用。

For a full correctness proof, citation to the original paper (by Boyer and Moore of the more famous Boyer-Moore string matching algorithm), and an implementation in C++, check out the above link.

要获得完整的正确性证明，请参考原始论文(作者是比较著名的Boyer-Moore字符串匹配算法的Boyer和Moore)，以及c++中的实现，请查看上面的链接。

#2

This is the Majority element problem. There is a single pass, constant space algorithm for this problem. Here is a brief algorithm coded in python:

这是多数要素问题。对于这个问题，有一个单一的，不变的空间算法。下面是用python编写的一个简短算法:


    import random

    items = [1, 2, 3, 4, 5, 5, 5, 5, 5 ]
    # shuffle the items
    random.shuffle(items)

    print("shuffled items: ", items)

    majority_elem = items[0]
    count = 1
    for i in range(1,len(items)):
        if items[i] == majority_elem:
            count += 1
        else: 
            count -= 1
            if count == 0:
                majority_elem = items[i]
                count = 1

    print("majority element : %d" % majority_elem )

We use a variable majority_elem to keep track of majority element and a counter (count)

我们使用变量马约公司_elem跟踪多数元素和计数器(计数)

Initially we set the first element of the array as the majority element.

首先，我们将数组的第一个元素设置为多数元素。
we navigate through the array,

我们在数组中导航，
if the current element == majority element : increment count

如果当前元素==多数元素:递增计数
else : { decrement count. if count becomes zero, set count = 1 and set majority_element = current element. }

else:{递减计数。如果count变为0，设置count = 1，设置majority_element = current元素。}

There is a variation to this problem, instead of an array, there could be a very large sequence and we do not know the length before hand. If this case, sorting or partioning is not helpful.

这个问题有一个变化，不是一个数组，而是一个非常大的序列，我们之前不知道长度。如果是这种情况，排序或分区是没有帮助的。

References:

引用:

The Art of Computer Programming, Fascicle 0: Introduction to Combinatorial Algorithms and Boolean Functions, Volume 0; Volume 4
计算机编程艺术，分集0:组合算法和布尔函数介绍，卷0;卷4

#3

If an element is repeated more than N/2 times then it must be the median. There are many algorithms that allow you to find this efficiently.

如果一个元素重复次数超过N/2，那么它一定是中位数。有很多算法可以让你有效地找到这个。

#4

Are you familiar with quicksort? It has a function called 'partition' that, given a value, divides the array into a section where all values are greater than the value (the pivot) are on one side, while all values less than the value are on the other side. Note that this is not a sort, simply a separation. The N/2 count item will be in the larger of the two sections. You can recursively apply this technique to find the element in O(n) time.

你熟悉快速排序吗?它有一个名为“partition”的函数，给定一个值，将数组分成一个部分，其中所有的值都大于值(pivot)在一边，而所有的值都小于该值在另一边。注意，这不是一种排序，只是一种分离。N/2计数项将在两个部分中较大的部分中。您可以递归地应用此技术来查找O(n)时间中的元素。

wikipedia: quicksort, or Partition-based general selection algorithm

wikipedia:快速排序，或基于分区的通用选择算法。

#5

In your second approach, you are essentially selecting the median element. Take a look at algorithms for finding the median of a list of numbers. In particular, a selection algorithm would work fine for this and compute it in O(n).

在第二种方法中，实际上是选择中值元素。看一看找到数字列表中位数的算法。特别地，选择算法可以很好地解决这个问题，并在O(n)中计算它。

Hoare's selection algorithm works very similar to quick sort, except that instead of recursing down both partitions, it only recurses down one partition (the partition that contains the kth element).

Hoare的选择算法与quick sort非常相似，除了它不会递归两个分区，它只递归一个分区(包含第kth元素的分区)。

In C++, the standard library provides a selection algorithm in the form of std::nth_element, which guarantees O(n) average complexity. You can use this find the median.

在c++中，标准库以std::nth_element的形式提供了一种选择算法，它保证了O(n)的平均复杂度。你可以用这个求中值。

int a[8] = {5, 1, 1, 1, 2, 1, 3, 1};
int median = *std::nth_element(a, a + 4, a + 8);

Note that std::nth_element will also partially sort the array in place.

注意，std::nth_element也将对数组进行部分排序。

#6

No need for sorting. You can simply use a median selection algorithm to determine the n/2-th element. Quickselect runs in O(n) expected time. Median of medians runs in O(n).

不需要排序。您可以简单地使用中值选择算法来确定n/2元素。Quickselect在O(n)预期时间内运行。中位数在O(n)中。

#7

Sort the array using any sorting algorithm. The element which was repeated more than half of the time will always be the mid element.The complexity will be nlog(n).

使用任何排序算法对数组进行排序。超过一半时间重复的元素总是中间元素。复杂度是nlog(n)

#1