找出是否有一个元素在重复n/k次

时间:2022-06-04 21:47:36

You have an array size n and a constant k (whatever)

数组大小为n,常数k(随便)

You can assume the the array is of int type (although it could be of any type)

您可以假设这个数组是int类型的(尽管它可以是任何类型的)

Describe an algorithm that finds if there is an element(s) that repeats itself at least n/k times... if there is return one. Do so in linear time (O(n))

描述一个算法,如果有一个元素至少重复n/k次……如果有返回1。用线性时间(O(n)

The catch: do this algorithm (or even pseudo-code) using constant memory and running over the array only twice

捕获:使用常量内存执行此算法(甚至伪代码),并且只在数组上运行两次

6 个解决方案

#1


6  

  1. Create a temporary array of size (k-1) to store elements and their counts (The output elements are going to be among these k-1 elements).

    创建一个临时的大小数组(k-1)来存储元素和它们的计数(输出元素将在这些k-1元素中)。

  2. Traverse through the input array and update temp[] (add/remove an element or increase/decrease count) for every traversed element. The array temp[] stores potential (k-1) candidates at every step. This step takes O(nk) time.

    遍历输入数组并更新每个遍历元素的temp[](添加/删除一个元素或增加/减少计数)。数组temp[]在每个步骤中都存储潜在的(k-1)候选者。这个步骤花费O(nk)时间。

  3. Iterate through final (k-1) potential candidates (stored in temp[]). or every element, check if it actually has count more than n/k. This step takes O(nk) time.
  4. 迭代最终(k-1)潜在候选对象(存储在temp[]中)。或者每个元素,检查它的计数是否大于n/k。这个步骤花费O(nk)时间。

The main step is step 2, how to maintain (k-1) potential candidates at every point? The steps used in step 2 are like famous game: Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (count in temporary array is incremented).

主要步骤是第二步,如何在每个点保持(k-1)潜在的候选人?第2步使用的步骤就像著名的游戏:俄罗斯方块。我们将每个数字都视为俄罗斯方块中的一块,它在临时数组temp[]中下降。我们的任务是在同一列上保持相同的数字(增加临时数组的计数)。

Consider k = 4, n = 9 
Given array: 3 1 2 2 2 1 4 3 3 

i = 0
         3 _ _
temp[] has one element, 3 with count 1

i = 1
         3 1 _
temp[] has two elements, 3 and 1 with 
counts 1 and 1 respectively

i = 2
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 1 respectively.

i = 3
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 4
         - - 2 
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 3 respectively.

i = 5
         - - 2 
         - 1 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 2 and 3 respectively. 
Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.

i = 6
         - - 2 
         - 1 2 
temp[] has two elements, 1 and 2 with
counts as 1 and 2 respectively.

i = 7
           - 2 
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 8          
         3 - 2
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 2, 1 and 2 respectively.

Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now, the counts were needed only in step 2. Now we need to check whether the actual counts of elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4. So we print 3 and 2.

最后,我们在temp[]中最多有k-1个数字。temp中的元素是{3,1,2}。注意,temp[]中的计数现在是无用的,这些计数仅在步骤2中需要。现在我们需要检查temp[]中的元素的实际计数是否大于n/k(9/4)。元素3和元素2的计数超过9/4。我们打印3和2。

Note that the algorithm doesn’t miss any output element. There can be two possibilities, many occurrences are together or spread across the array. If occurrences are together, then count will be high and won’t become 0. If occurrences are spread, then the element would come again in temp[]. Following is C++ implementation of above algorithm.

注意,该算法不会遗漏任何输出元素。可以有两种可能性,许多事件一起发生或分布在数组中。如果事件同时发生,那么计数将会很高,不会变成0。如果出现的情况是分散的,那么元素将在temp[]中再次出现。下面是上述算法的c++实现。

// A C++ program to print elements with count more than n/k
#include<iostream>
using namespace std;

// A structure to store an element and its current count
struct eleCount
{
    int e;  // Element
    int c;  // Count
};

// Prints elements with more than n/k occurrences in arr[] of
// size n. If there are no such elements, then it prints nothing.
void moreThanNdK(int arr[], int n, int k)
{
    // k must be greater than 1 to get some output
    if (k < 2)
       return;

    /* Step 1: Create a temporary array (contains element
       and count) of size k-1. Initialize count of all
       elements as 0 */
    struct eleCount temp[k-1];
    for (int i=0; i<k-1; i++)
        temp[i].c = 0;

    /* Step 2: Process all elements of input array */
    for (int i = 0; i < n; i++)
    {
        int j;

        /* If arr[i] is already present in
           the element count array, then increment its count */
        for (j=0; j<k-1; j++)
        {
            if (temp[j].e == arr[i])
            {
                 temp[j].c += 1;
                 break;
            }
        }

        /* If arr[i] is not present in temp[] */
        if (j == k-1)
        {
            int l;

            /* If there is position available in temp[], then place 
              arr[i] in the first available position and set count as 1*/
            for (l=0; l<k-1; l++)
            {
                if (temp[l].c == 0)
                {
                    temp[l].e = arr[i];
                    temp[l].c = 1;
                    break;
                }
            }

            /* If all the position in the temp[] are filled, then 
               decrease count of every element by 1 */
            if (l == k-1)
                for (l=0; l<k; l++)
                    temp[l].c -= 1;
        }
    }

    /*Step 3: Check actual counts of potential candidates in temp[]*/
    for (int i=0; i<k-1; i++)
    {
        // Calculate actual count of elements 
        int ac = 0;  // actual count
        for (int j=0; j<n; j++)
            if (arr[j] == temp[i].e)
                ac++;

        // If actual count is more than n/k, then print it
        if (ac > n/k)
           cout << "Number:" << temp[i].e
                << " Count:" << ac << endl;
    }
}

/* Driver program to test above function */
int main()
{
    cout << "First Test\n";
    int arr1[] = {4, 5, 6, 7, 8, 4, 4};
    int size = sizeof(arr1)/sizeof(arr1[0]);
    int k = 3;
    moreThanNdK(arr1, size, k);

    cout << "\nSecond Test\n";
    int arr2[] = {4, 2, 2, 7};
    size = sizeof(arr2)/sizeof(arr2[0]);
    k = 3;
    moreThanNdK(arr2, size, k);

    cout << "\nThird Test\n";
    int arr3[] = {2, 7, 2};
    size = sizeof(arr3)/sizeof(arr3[0]);
    k = 2;
    moreThanNdK(arr3, size, k);

    cout << "\nFourth Test\n";
    int arr4[] = {2, 3, 3, 2};
    size = sizeof(arr4)/sizeof(arr4[0]);
    k = 3;
    moreThanNdK(arr4, size, k);

    return 0;
}

#2


11  

I'm not 100% sure, but it sounds like you want to solve the Britney Spears problem—finding an item that makes up a certain fraction of a sample using constant memory.

我不是百分之百的确定,但听起来你想解决“小甜甜”布兰妮的问题——用恒定的记忆找出一个只占样本一定比例的物品。

Here is a statement of the problem in English, with a sketch of the solution:

以下是用英语对问题的描述,以及解决方案的草图:

… from a 2002 article by Erik D. Demaine of MIT and Alejandro López-Ortiz and J. Ian Munro of the University of Waterloo in Canada. Demaine and his colleagues have extended the algorithm to cover a more-general problem: Given a stream of length n, identify a set of size m that includes all the elements occurring with a frequency greater than n /( m +1). (In the case of m =1, this reduces to the majority problem.) The extended algorithm requires m registers for the candidate elements as well as m counters. The basic scheme of operation is analogous to that of the majority algorithm. When a stream ele­ment matches one of the candidates, the corresponding counter is incremented; when there is no match to any candidate, all of the counters are decremented; if a counter is at 0, the associated candidate is replaced by a new element from the stream.

——摘自麻省理工学院(MIT)的埃里克·d·德梅内(Erik D. Demaine)、加拿大滑铁卢大学(University of Waterloo)的亚历杭德罗·洛佩兹-奥尔蒂斯(Alejandro Lopez-Ortiz)和j·伊恩·芒罗(J. Ian Munro) 2002年的一篇文章。Demaine和他的同事已经扩展了算法来解决一个更一般的问题:给定一个长度为n的流,识别一组大小为m的元素,其中包含所有频率大于n /(m +1)的元素。(在m =1的情况下,这可以简化为多数问题。)扩展算法需要对候选元素和m计数器进行m寄存器。操作的基本方案类似于大多数算法。当一个流避署­匹配的候选人之一,相应的计数器递增;当没有匹配的候选人时,所有的计数器都是递减的;如果计数器在0处,关联的候选项将被来自流的新元素替换。

#3


3  

There are two common (theoretical) approaches to this problem in O(n)

在O(n)中有两种常见的(理论)方法

I) The first idea is the simplest

第一个想法是最简单的

Step 1) While there are more than k distinct elements, Select k distinct elements and erase them all.

步骤1)当有超过k个不同的元素时,选择k个不同的元素并删除它们。

Step 2) Test all k distinct remaining elements for it's frequency

步骤2)测试所有k个不同的剩余元素的频率

Proof of correctness: Note that While step will be executed at most n/k - 1 times. Suppose there is an element that repeats itself at least n/k times. In the worst case it could be chosen in all n/k-1 iterations and it will still be in the final array after it, after being tested it will be found.

正确性证明:请注意,尽管步骤将最多执行n/k - 1次。假设有一个元素至少重复n/k次。在最坏的情况下,它可以在所有的n/k-1迭代中被选择,并且在经过测试后仍然会在最终的数组中被找到。

Implementation: Step 1 can be implemented keeping an associative array (maps a key to a value) of size k-1 (constant), you sweep from left to right on the array, if you find an element that is already on the map, increase it's counter to 1, if the element is not on the map and the map is not full yet (less than k-1 elements), add this new element with initial counting 1, if the map is full, remove 1 from the counter of each element, if any element reaches 0, remove it from the map. In the end, the elements on this map will be the equivalent as the remaining elements you need to test. If, in the last iteration your map becomes empty you need to test all the elements before erasing to cover the case that the frequency is exactly n/k.

实现:第1步可以实现密切关联数组(地图的关键值)的大小k - 1(常数),你从左到右扫描数组,如果你发现已经在地图上的一个元素,增加它的计数器1,如果元素不是地图,地图上还没有完整的(不到k - 1元素),与初始计算1添加新的元素,如果全地图,删除1从柜台的每个元素,如果任何元素达到0,将它从地图。最后,这个映射上的元素将与您需要测试的其他元素相同。如果在最后一次迭代中,您的映射变成空的,您需要在删除之前测试所有元素,以覆盖频率正好为n/k的情况。

Complexity: Considering the worst approach to this map, O(n * k) = O(n), as k is contant.

复杂度:考虑最坏的方法,O(n * k) = O(n),因为k是不一致的。

Step 2 can be implemented by counting the frequency of all (maximum) k-1 elements Complexity: O(k*n) = O(n)

步骤2可以通过计算所有(最大)k-1元素的频率来实现:O(k*n) = O(n)

Overall complexity: O(n) + O(n) = O(n). (there's a small detail that was different from the implementation, a difference of 1 element, this happens because we want to also cover the case of frequency exactly n/k repetitions in the pseudocode, if not, we could allow one more iteration be possible with there are exactly k different elements, not necessarily more than k)

总复杂度:O(n) + O(n) = O(n)(有个小细节是不同的实现,不同的元素,这是因为我们还想要覆盖的情况下频率完全n / k伪代码重复,如果不是,我们可以让一个迭代可能有完全k不同元素,不一定比k)

II) The second algorithm uses the selection algorithm in linear time http://en.wikipedia.org/wiki/Selection_algorithm and the partition algorithm which also runs in linear time. Using them, you break your array in k-1 buckets, with the invariant that any element in the ith bucket is smaller or equal than any element in the jth bucket for j > i in O(n). But note that the elements are not sorted inside each bucket.

第二种算法使用线性时间的选择算法http://en.wikipedia.org/wiki/Selection_algorithm和同样在线性时间运行的分区算法。使用它们,你可以在k-1桶中打破你的数组,在第i个桶中,任何元素都比第j桶中的任何元素都要小或相等。但是请注意,元素不是在每个bucket中排序的。

Now, you use the fact that each bucket has n/(k-1) elements, and you're looking for an element that repeats itself at least (n/k), and (n/k) > n/(2*(k-1)). This suffices to use the majority theorem, which states that if an element is the majority (more frequent than number of elements divided by 2), then it's also the median of the array. You can get the median again using the selection algorithm.

现在,您使用了每个bucket都有n/(k-1)元素的事实,并且您正在寻找一个至少重复自己的元素(n/k)和(n/k) > n/(2*(k-1))。这足以使用多数定理,它指出如果一个元素是多数元素(比元素数量除以2更频繁),那么它也是数组的中位数。你可以使用选择算法得到中位数。

So, you just test all medians and all pivots for the partitions, which you need to test them because they may split equal values in two different buckets, there are k-1 + k values, complexity O((2*k-1)*n)) = O(n).

因此,您只需测试分区的所有中介和所有数据透视,您需要对它们进行测试,因为它们可能在两个不同的bucket中分割相等的值,有k-1 + k值,复杂度O(2*k-1)*n) = O(n)。

#4


0  

A simple O(n) algorithm would be to keep a hashed map from the number found to the number of instances found. Using a hashed map is important for maintaining O(n). The, a final pass over the map will reveal the answers. This pass is also O(n) since the worst case scenario is that every element appeared only once and hence the map is the same size as the original array.

一个简单的O(n)算法是将一个散列的映射从找到的实例数中保留下来。使用散列映射对于维护O(n)很重要。最后通过地图将会揭晓答案。这个传递也是O(n),因为最坏的情况是每个元素只出现一次,因此映射的大小与原始数组相同。

#5


0  

I don't know if you are restricted on which additional data structures you can use.

我不知道您是否受限于可以使用哪些附加数据结构。

What about creating a hashmap with 'elements' <--> count mapping. Insertion is O(log N). Lookup is O(1). For each element, lookup on hashtable, insert if does not exist with count 1. If exists, check if count < (n/k). It will stay O(n).

创建一个带有“elements”<——>计数映射的hashmap怎么样?插入是O(log N),查找是O(1)。对于每个元素,在hashtable上查找,如果不存在count 1,则插入。如果存在,检查count是否< (n/k)。它将保持O(n)。

EDIT:

编辑:

I forgot the constant memory restriction. It's preallocating hash map entries with N elements permitted?

我忘记了恒存限制。它在预分配允许有N个元素的散列映射条目?

#6


0  

This is my implementation of Jerky algorithm described above:

这是我在上面描述的Jerky算法的实现:

#include <map>
#include <vector>
#include <iostream>
#include <algorithm>

std::vector<int> repeatingElements(const std::vector<int>& a, int k)
{
    if (a.empty())
        return std::vector<int>();

    std::map<int, int> candidateMap; //value, count;

    for (int i = 0; i < a.size(); i++)
    {
        if (candidateMap.find(a[i]) != candidateMap.end())
        {
            candidateMap[a[i]]++;
        }
        else
        {
            if (candidateMap.size() < k-1)
            {
                candidateMap[a[i]] = 1;
            }
            else
            {
                for (std::map<int, int>::iterator iter = candidateMap.begin();
                     iter != candidateMap.end();)
                {
                    (iter->second)--;

                    if (iter->second == 0)
                    {
                        iter = candidateMap.erase(iter);
                    }
                    else
                    {
                        iter++;
                    }
                }   
            }
        }
    }

    std::vector<int> ret;

    for (std::map<int, int>::iterator iter = candidateMap.begin();
         iter != candidateMap.end(); iter++)
    {
        int candidate = iter->first;

        if (std::count(a.begin(), a.end(), candidate) > (a.size() / k))
        {
            ret.push_back(candidate);
        }
    }

    return ret;
}

int main()
{
    std::vector<int> a = { 1, 1, 4, 2, 2, 3, 3 };   
    int k = 4;

    std::vector<int> repeating_elements = repeatingElements(a, k);

    for (int elem : repeating_elements)
    {
        std::cout << "Repeating more than n/" << k << " : " << elem << std::endl;
    }

    return 0;
}

And the output is:

和输出是:

Repeating more than n/4 : 1

重复超过n/4: 1

Repeating more than n/4 : 2

重复超过n/4: 2

Repeating more than n/4 : 3

重复n/4: 3。

#1


6  

  1. Create a temporary array of size (k-1) to store elements and their counts (The output elements are going to be among these k-1 elements).

    创建一个临时的大小数组(k-1)来存储元素和它们的计数(输出元素将在这些k-1元素中)。

  2. Traverse through the input array and update temp[] (add/remove an element or increase/decrease count) for every traversed element. The array temp[] stores potential (k-1) candidates at every step. This step takes O(nk) time.

    遍历输入数组并更新每个遍历元素的temp[](添加/删除一个元素或增加/减少计数)。数组temp[]在每个步骤中都存储潜在的(k-1)候选者。这个步骤花费O(nk)时间。

  3. Iterate through final (k-1) potential candidates (stored in temp[]). or every element, check if it actually has count more than n/k. This step takes O(nk) time.
  4. 迭代最终(k-1)潜在候选对象(存储在temp[]中)。或者每个元素,检查它的计数是否大于n/k。这个步骤花费O(nk)时间。

The main step is step 2, how to maintain (k-1) potential candidates at every point? The steps used in step 2 are like famous game: Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (count in temporary array is incremented).

主要步骤是第二步,如何在每个点保持(k-1)潜在的候选人?第2步使用的步骤就像著名的游戏:俄罗斯方块。我们将每个数字都视为俄罗斯方块中的一块,它在临时数组temp[]中下降。我们的任务是在同一列上保持相同的数字(增加临时数组的计数)。

Consider k = 4, n = 9 
Given array: 3 1 2 2 2 1 4 3 3 

i = 0
         3 _ _
temp[] has one element, 3 with count 1

i = 1
         3 1 _
temp[] has two elements, 3 and 1 with 
counts 1 and 1 respectively

i = 2
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 1 respectively.

i = 3
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 4
         - - 2 
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 3 respectively.

i = 5
         - - 2 
         - 1 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 2 and 3 respectively. 
Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.

i = 6
         - - 2 
         - 1 2 
temp[] has two elements, 1 and 2 with
counts as 1 and 2 respectively.

i = 7
           - 2 
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 8          
         3 - 2
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 2, 1 and 2 respectively.

Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now, the counts were needed only in step 2. Now we need to check whether the actual counts of elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4. So we print 3 and 2.

最后,我们在temp[]中最多有k-1个数字。temp中的元素是{3,1,2}。注意,temp[]中的计数现在是无用的,这些计数仅在步骤2中需要。现在我们需要检查temp[]中的元素的实际计数是否大于n/k(9/4)。元素3和元素2的计数超过9/4。我们打印3和2。

Note that the algorithm doesn’t miss any output element. There can be two possibilities, many occurrences are together or spread across the array. If occurrences are together, then count will be high and won’t become 0. If occurrences are spread, then the element would come again in temp[]. Following is C++ implementation of above algorithm.

注意,该算法不会遗漏任何输出元素。可以有两种可能性,许多事件一起发生或分布在数组中。如果事件同时发生,那么计数将会很高,不会变成0。如果出现的情况是分散的,那么元素将在temp[]中再次出现。下面是上述算法的c++实现。

// A C++ program to print elements with count more than n/k
#include<iostream>
using namespace std;

// A structure to store an element and its current count
struct eleCount
{
    int e;  // Element
    int c;  // Count
};

// Prints elements with more than n/k occurrences in arr[] of
// size n. If there are no such elements, then it prints nothing.
void moreThanNdK(int arr[], int n, int k)
{
    // k must be greater than 1 to get some output
    if (k < 2)
       return;

    /* Step 1: Create a temporary array (contains element
       and count) of size k-1. Initialize count of all
       elements as 0 */
    struct eleCount temp[k-1];
    for (int i=0; i<k-1; i++)
        temp[i].c = 0;

    /* Step 2: Process all elements of input array */
    for (int i = 0; i < n; i++)
    {
        int j;

        /* If arr[i] is already present in
           the element count array, then increment its count */
        for (j=0; j<k-1; j++)
        {
            if (temp[j].e == arr[i])
            {
                 temp[j].c += 1;
                 break;
            }
        }

        /* If arr[i] is not present in temp[] */
        if (j == k-1)
        {
            int l;

            /* If there is position available in temp[], then place 
              arr[i] in the first available position and set count as 1*/
            for (l=0; l<k-1; l++)
            {
                if (temp[l].c == 0)
                {
                    temp[l].e = arr[i];
                    temp[l].c = 1;
                    break;
                }
            }

            /* If all the position in the temp[] are filled, then 
               decrease count of every element by 1 */
            if (l == k-1)
                for (l=0; l<k; l++)
                    temp[l].c -= 1;
        }
    }

    /*Step 3: Check actual counts of potential candidates in temp[]*/
    for (int i=0; i<k-1; i++)
    {
        // Calculate actual count of elements 
        int ac = 0;  // actual count
        for (int j=0; j<n; j++)
            if (arr[j] == temp[i].e)
                ac++;

        // If actual count is more than n/k, then print it
        if (ac > n/k)
           cout << "Number:" << temp[i].e
                << " Count:" << ac << endl;
    }
}

/* Driver program to test above function */
int main()
{
    cout << "First Test\n";
    int arr1[] = {4, 5, 6, 7, 8, 4, 4};
    int size = sizeof(arr1)/sizeof(arr1[0]);
    int k = 3;
    moreThanNdK(arr1, size, k);

    cout << "\nSecond Test\n";
    int arr2[] = {4, 2, 2, 7};
    size = sizeof(arr2)/sizeof(arr2[0]);
    k = 3;
    moreThanNdK(arr2, size, k);

    cout << "\nThird Test\n";
    int arr3[] = {2, 7, 2};
    size = sizeof(arr3)/sizeof(arr3[0]);
    k = 2;
    moreThanNdK(arr3, size, k);

    cout << "\nFourth Test\n";
    int arr4[] = {2, 3, 3, 2};
    size = sizeof(arr4)/sizeof(arr4[0]);
    k = 3;
    moreThanNdK(arr4, size, k);

    return 0;
}

#2


11  

I'm not 100% sure, but it sounds like you want to solve the Britney Spears problem—finding an item that makes up a certain fraction of a sample using constant memory.

我不是百分之百的确定,但听起来你想解决“小甜甜”布兰妮的问题——用恒定的记忆找出一个只占样本一定比例的物品。

Here is a statement of the problem in English, with a sketch of the solution:

以下是用英语对问题的描述,以及解决方案的草图:

… from a 2002 article by Erik D. Demaine of MIT and Alejandro López-Ortiz and J. Ian Munro of the University of Waterloo in Canada. Demaine and his colleagues have extended the algorithm to cover a more-general problem: Given a stream of length n, identify a set of size m that includes all the elements occurring with a frequency greater than n /( m +1). (In the case of m =1, this reduces to the majority problem.) The extended algorithm requires m registers for the candidate elements as well as m counters. The basic scheme of operation is analogous to that of the majority algorithm. When a stream ele­ment matches one of the candidates, the corresponding counter is incremented; when there is no match to any candidate, all of the counters are decremented; if a counter is at 0, the associated candidate is replaced by a new element from the stream.

——摘自麻省理工学院(MIT)的埃里克·d·德梅内(Erik D. Demaine)、加拿大滑铁卢大学(University of Waterloo)的亚历杭德罗·洛佩兹-奥尔蒂斯(Alejandro Lopez-Ortiz)和j·伊恩·芒罗(J. Ian Munro) 2002年的一篇文章。Demaine和他的同事已经扩展了算法来解决一个更一般的问题:给定一个长度为n的流,识别一组大小为m的元素,其中包含所有频率大于n /(m +1)的元素。(在m =1的情况下,这可以简化为多数问题。)扩展算法需要对候选元素和m计数器进行m寄存器。操作的基本方案类似于大多数算法。当一个流避署­匹配的候选人之一,相应的计数器递增;当没有匹配的候选人时,所有的计数器都是递减的;如果计数器在0处,关联的候选项将被来自流的新元素替换。

#3


3  

There are two common (theoretical) approaches to this problem in O(n)

在O(n)中有两种常见的(理论)方法

I) The first idea is the simplest

第一个想法是最简单的

Step 1) While there are more than k distinct elements, Select k distinct elements and erase them all.

步骤1)当有超过k个不同的元素时,选择k个不同的元素并删除它们。

Step 2) Test all k distinct remaining elements for it's frequency

步骤2)测试所有k个不同的剩余元素的频率

Proof of correctness: Note that While step will be executed at most n/k - 1 times. Suppose there is an element that repeats itself at least n/k times. In the worst case it could be chosen in all n/k-1 iterations and it will still be in the final array after it, after being tested it will be found.

正确性证明:请注意,尽管步骤将最多执行n/k - 1次。假设有一个元素至少重复n/k次。在最坏的情况下,它可以在所有的n/k-1迭代中被选择,并且在经过测试后仍然会在最终的数组中被找到。

Implementation: Step 1 can be implemented keeping an associative array (maps a key to a value) of size k-1 (constant), you sweep from left to right on the array, if you find an element that is already on the map, increase it's counter to 1, if the element is not on the map and the map is not full yet (less than k-1 elements), add this new element with initial counting 1, if the map is full, remove 1 from the counter of each element, if any element reaches 0, remove it from the map. In the end, the elements on this map will be the equivalent as the remaining elements you need to test. If, in the last iteration your map becomes empty you need to test all the elements before erasing to cover the case that the frequency is exactly n/k.

实现:第1步可以实现密切关联数组(地图的关键值)的大小k - 1(常数),你从左到右扫描数组,如果你发现已经在地图上的一个元素,增加它的计数器1,如果元素不是地图,地图上还没有完整的(不到k - 1元素),与初始计算1添加新的元素,如果全地图,删除1从柜台的每个元素,如果任何元素达到0,将它从地图。最后,这个映射上的元素将与您需要测试的其他元素相同。如果在最后一次迭代中,您的映射变成空的,您需要在删除之前测试所有元素,以覆盖频率正好为n/k的情况。

Complexity: Considering the worst approach to this map, O(n * k) = O(n), as k is contant.

复杂度:考虑最坏的方法,O(n * k) = O(n),因为k是不一致的。

Step 2 can be implemented by counting the frequency of all (maximum) k-1 elements Complexity: O(k*n) = O(n)

步骤2可以通过计算所有(最大)k-1元素的频率来实现:O(k*n) = O(n)

Overall complexity: O(n) + O(n) = O(n). (there's a small detail that was different from the implementation, a difference of 1 element, this happens because we want to also cover the case of frequency exactly n/k repetitions in the pseudocode, if not, we could allow one more iteration be possible with there are exactly k different elements, not necessarily more than k)

总复杂度:O(n) + O(n) = O(n)(有个小细节是不同的实现,不同的元素,这是因为我们还想要覆盖的情况下频率完全n / k伪代码重复,如果不是,我们可以让一个迭代可能有完全k不同元素,不一定比k)

II) The second algorithm uses the selection algorithm in linear time http://en.wikipedia.org/wiki/Selection_algorithm and the partition algorithm which also runs in linear time. Using them, you break your array in k-1 buckets, with the invariant that any element in the ith bucket is smaller or equal than any element in the jth bucket for j > i in O(n). But note that the elements are not sorted inside each bucket.

第二种算法使用线性时间的选择算法http://en.wikipedia.org/wiki/Selection_algorithm和同样在线性时间运行的分区算法。使用它们,你可以在k-1桶中打破你的数组,在第i个桶中,任何元素都比第j桶中的任何元素都要小或相等。但是请注意,元素不是在每个bucket中排序的。

Now, you use the fact that each bucket has n/(k-1) elements, and you're looking for an element that repeats itself at least (n/k), and (n/k) > n/(2*(k-1)). This suffices to use the majority theorem, which states that if an element is the majority (more frequent than number of elements divided by 2), then it's also the median of the array. You can get the median again using the selection algorithm.

现在,您使用了每个bucket都有n/(k-1)元素的事实,并且您正在寻找一个至少重复自己的元素(n/k)和(n/k) > n/(2*(k-1))。这足以使用多数定理,它指出如果一个元素是多数元素(比元素数量除以2更频繁),那么它也是数组的中位数。你可以使用选择算法得到中位数。

So, you just test all medians and all pivots for the partitions, which you need to test them because they may split equal values in two different buckets, there are k-1 + k values, complexity O((2*k-1)*n)) = O(n).

因此,您只需测试分区的所有中介和所有数据透视,您需要对它们进行测试,因为它们可能在两个不同的bucket中分割相等的值,有k-1 + k值,复杂度O(2*k-1)*n) = O(n)。

#4


0  

A simple O(n) algorithm would be to keep a hashed map from the number found to the number of instances found. Using a hashed map is important for maintaining O(n). The, a final pass over the map will reveal the answers. This pass is also O(n) since the worst case scenario is that every element appeared only once and hence the map is the same size as the original array.

一个简单的O(n)算法是将一个散列的映射从找到的实例数中保留下来。使用散列映射对于维护O(n)很重要。最后通过地图将会揭晓答案。这个传递也是O(n),因为最坏的情况是每个元素只出现一次,因此映射的大小与原始数组相同。

#5


0  

I don't know if you are restricted on which additional data structures you can use.

我不知道您是否受限于可以使用哪些附加数据结构。

What about creating a hashmap with 'elements' <--> count mapping. Insertion is O(log N). Lookup is O(1). For each element, lookup on hashtable, insert if does not exist with count 1. If exists, check if count < (n/k). It will stay O(n).

创建一个带有“elements”<——>计数映射的hashmap怎么样?插入是O(log N),查找是O(1)。对于每个元素,在hashtable上查找,如果不存在count 1,则插入。如果存在,检查count是否< (n/k)。它将保持O(n)。

EDIT:

编辑:

I forgot the constant memory restriction. It's preallocating hash map entries with N elements permitted?

我忘记了恒存限制。它在预分配允许有N个元素的散列映射条目?

#6


0  

This is my implementation of Jerky algorithm described above:

这是我在上面描述的Jerky算法的实现:

#include <map>
#include <vector>
#include <iostream>
#include <algorithm>

std::vector<int> repeatingElements(const std::vector<int>& a, int k)
{
    if (a.empty())
        return std::vector<int>();

    std::map<int, int> candidateMap; //value, count;

    for (int i = 0; i < a.size(); i++)
    {
        if (candidateMap.find(a[i]) != candidateMap.end())
        {
            candidateMap[a[i]]++;
        }
        else
        {
            if (candidateMap.size() < k-1)
            {
                candidateMap[a[i]] = 1;
            }
            else
            {
                for (std::map<int, int>::iterator iter = candidateMap.begin();
                     iter != candidateMap.end();)
                {
                    (iter->second)--;

                    if (iter->second == 0)
                    {
                        iter = candidateMap.erase(iter);
                    }
                    else
                    {
                        iter++;
                    }
                }   
            }
        }
    }

    std::vector<int> ret;

    for (std::map<int, int>::iterator iter = candidateMap.begin();
         iter != candidateMap.end(); iter++)
    {
        int candidate = iter->first;

        if (std::count(a.begin(), a.end(), candidate) > (a.size() / k))
        {
            ret.push_back(candidate);
        }
    }

    return ret;
}

int main()
{
    std::vector<int> a = { 1, 1, 4, 2, 2, 3, 3 };   
    int k = 4;

    std::vector<int> repeating_elements = repeatingElements(a, k);

    for (int elem : repeating_elements)
    {
        std::cout << "Repeating more than n/" << k << " : " << elem << std::endl;
    }

    return 0;
}

And the output is:

和输出是:

Repeating more than n/4 : 1

重复超过n/4: 1

Repeating more than n/4 : 2

重复超过n/4: 2

Repeating more than n/4 : 3

重复n/4: 3。