计算数组中相同元素对数的有效算法

时间:2023-02-01 21:42:37

For example,

例如,

int num[] = {1, 5, 3, 12, 5, 1, 4};
int len = 7;
int count = 0;

(Assuming there are no more than 2 of identical elements in the array)
Then I would do

(假设数组中相同的元素不超过2个)我就会这么做

for(int i=0; i<len-1; i++) {
  for(int j=i+1; j<len; j++) {
    if(num[i] == num[j]) {
      count++;
    }
  }
}  

Then count would be 2.

那么计数就是2。

But this algorithm results in the efficiency of O(N^2).
Is there a more efficient way?
Thank you in advance!

但该算法的结果在O(N ^ 2)的效率。有没有更有效的方法?提前谢谢你!

4 个解决方案

#1


4  

Faster than an O(n log n) sort, use a hash table. Its construction is linear or O(n); every time you insert, as you walk through your input array, you can test (with constant, O(1) cost) if there is that key already in your hash table — i.e., a "pair". When you find this "match" case, you increment a counter. Once you have gone through your array, your counter tells you the answer.

比O(n log n)排序快,使用哈希表。其构造为线性或O(n);每次插入时,当您遍历输入数组时,如果哈希表中已经有这个键,您就可以测试(以常量,O(1)代价)。“一对”。当您发现这个“匹配”情况时,您将增加一个计数器。一旦你通过了你的数组,你的计数器就会告诉你答案。

#2


1  

you can not use a normal hash table. I used dynamic programming and a hashtable.

您不能使用普通的散列表。我使用了动态编程和哈希表。

You should set up your hash-table like this:

你应该这样设置你的哈希表:

[value of array, Number of repetition, number of identical pair]

[数组值,重复次数,同对数]

For example, we can run my algorithm on the array A = [3, 3, 3, 3, 3].

例如,我们可以在数组A =[3, 3, 3, 3, 3]上运行我的算法。

we walk through Array. for first number in A, the new row is inserted to the hashtable

我们穿过数组。对于A中的第一个数字,新行被插入到hashtable中

  3,0,0 /* A[i], the number of repetition of 3 so far (Rep[i]),
                    the number of identical pair so far ( Iden[i]). */

then for second 3 in A:

然后是A中的第二个3:

 3,1,1 

for third 3 in A:

三分之一A:

 3,2,3 

for 4th 3 in A:

在A中的4 / 3:

 3,3,6 

6 is the number of identical pairs in this array. In general, we can calculate the number of identical pair by following formula:

6是这个数组中相同对的个数。一般来说,我们可以通过以下公式来计算同对的个数:

Iden[i] = Rep[i] + Iden[i-1]

Iden[i] = Rep[i] + Iden[i-1]

Here's a code sample in C#:

下面是c#中的代码示例:

     public static int solution(int[] A)
    {
         int identical = 0;
        Dictionary<int, KeyValuePair<int, int>> dic = new Dictionary<int, KeyValuePair<int, int>>(); /* A[i], the number of repetition of 3 so far (Rep[i]),
                    the number of identical pair so far ( Iden[i]). */
        for (int i = 0; i < A.Length; i++)
        {
            if (!dic.ContainsKey(A[i]))
                dic.Add(A[i], new KeyValuePair<int, int>(0,0));
            else
            { 
               KeyValuePair<int,int> valDic = dic[A[i]];

                KeyValuePair<int, int> newVal;
                if (valDic.Key < 1)
                    newVal = new KeyValuePair<int, int>(1, 1);
                else
                {
                    int preIdenticalPair = valDic.Value;
                    int preReptation = valDic.Key;
                    int newRepetation = ++preReptation;
                    int newIdenticalPair = preIdenticalPair + newRepetation;
                    newVal = new KeyValuePair<int, int>(newRepetation, newIdenticalPair);
                }

                dic[A[i]] = newVal;
            }
        }

        //summation of all identical pairs
        foreach (KeyValuePair<int, KeyValuePair<int, int>> pair in dic)
            identical += pair.Value.Value;
        return identical;
    }

#3


0  

you can try this way:

你可以这样尝试:

#define MAX 99999   // consider your array have largest elemnt < 99999 and >=0
    int main () {
    int num[] = {1, 5, 3, 12, 5, 1, 4,3};
    int len = 8;
    int count[MAX+1u] = {0};
    int i,pair=0;
    for( i=0;i<len;i++){
         count[num[i]]++;   // here count the frequecy of each number
    }
    for(i=0;i<MAX;i++){
     if(count[i]>1){               // if frequecy is > 1      
    printf("%d occurs %d times \n",i,count[i]);
    pair++;         // increment pair
    }

    }
    printf("%d pairs ",pair);   // print pair
       return 0;
    }

#4


0  

Basically, the core idea remains the same : you have to count the number of occurences of each element in your input array.

基本上,核心思想保持不变:您必须计算输入数组中每个元素出现的次数。

The key problem is how to implement this counting process. Your solution is perfectly valid, but as you felt it, it can be improved. You received a comment suggesting sorting the array and then perform a sweep on it to count the number of pairs : O(n.ln n).

关键问题是如何实现这个计数过程。你的解决方案是完全有效的,但正如你感觉的那样,它可以改进。您收到一条建议对数组进行排序的注释,然后对其执行一次扫描,以计数对的数量:O(n)。ln n)。

You could also use a hash table, as suggested by @AlexReynolds answer. But you must deal with collisions, as different integers could hash to the same key. To do so, you could use a bucket for each key. This bucket would store every integer that hashed to its key, plus the number of occurences of the said integer.

您还可以使用一个哈希表,如@AlexReynolds answer建议的那样。但是必须处理冲突,因为不同的整数可以哈希到相同的键。为此,您可以为每个键使用一个桶。这个桶将存储哈希到其键的每个整数,加上该整数出现的次数。

How to implement these buckets :

如何实现这些桶:

  • With Lists ? If you have a few collisions, this is good enough.
  • 列表吗?如果有一些碰撞,这就足够了。
  • With Heaps (like binary heap) : logarithmic access and insertion, but fixed size and you might need reallocation if too much collisions occur. This is good if you don't underestimate the number of collisions but increases memory usage if you overestimate it.
  • 对于堆(如二进制堆):对数访问和插入,但是大小是固定的,如果发生太多冲突,可能需要重新分配。如果您不低估冲突的数量,但是如果您过高估计内存使用量,那么这是很好的。
  • With Balanced Binary Trees : logarithmic lookup and insertion, but spatial overhead due to the pointer tracking going on "inside". But here, you avoid the estimation problem heaps induced.
  • 使用平衡的二叉树:对数查找和插入,但是由于指针跟踪在“内部”进行而造成的空间开销。但是这里,您可以避免引起的估计问题堆。

Once your hashtable is built, with the occurences of each element present in your array, you must count the pairs. But you could keep a counter while populating the data structure to avoid performing this extra sweep. This is easy. Here are the basic operations used to update the table with an element e taken from the array :

一旦构建了hashtable,并且数组中出现了每个元素,那么必须对它们进行计数。但是,您可以在填充数据结构时保留一个计数器,以避免执行这种额外的扫描。这是很容易的。下面是使用数组中的元素e更新表的基本操作:

  1. Hash e : we get a Hash key k
  2. 哈希e:我们得到哈希键k
  3. Get the bucket of the hash key k : b
  4. 拿一桶哈希键k: b
  5. Look up the bucket to see if e is in there :
    • If so : increment the element occurrence counter. If the counter is 2, increment the pair counter, if the counter is 3, decrement the pair counter.
    • 如果是:增加元素出现计数器。如果计数器是2,则增加对计数器,如果计数器是3,则减量这对计数器。
    • If not : add the element to the bucket, with an occurrence counter set to 1
    • 如果不是:将元素添加到桶中,出现计数器设置为1
  6. 查找桶,看看e是否在那里:如果是:增加元素出现计数器。如果计数器是2,增加对计数器,如果计数器是3,减少对计数器。如果不是:将元素添加到桶中,出现计数器设置为1
  7. Get the next element in the input array in e and repeat until the array has been fully sweeped.
  8. 在e中获取输入数组中的下一个元素并重复,直到数组被完全清除。

Is performing one if statement at each counter update faster than performing a simple check in the end ? I am not sure, just be aware you can test both.

在每个计数器更新时执行一个if语句是否比最后执行一个简单的检查要快?我不确定,但请注意你可以同时测试这两个。

Let k be the number of slots in the hash table. With a uniformly distributed hash function, you would get n/k elements per slot. This leads to a n²/k time complexity with lists, which is O(n²) ... But if k is close to n you get really close to a linear time.

设k是哈希表中的槽数。对于一个均匀分布的哈希函数,每个槽可以得到n/k个元素。这导致n²/ k与列表,时间复杂度是O(n²)……但是如果k接近n,就会接近一个线性时间。

The same goes for heap/tree buckets, except that in the end you get a O(n. ln n) asymptotic complexity. If k is large enough, again, you will be close to linear time.

堆/树桶也是一样,只不过最后得到O(n)ln n)渐近的复杂性。如果k足够大,就会接近线性时间。

#1


4  

Faster than an O(n log n) sort, use a hash table. Its construction is linear or O(n); every time you insert, as you walk through your input array, you can test (with constant, O(1) cost) if there is that key already in your hash table — i.e., a "pair". When you find this "match" case, you increment a counter. Once you have gone through your array, your counter tells you the answer.

比O(n log n)排序快,使用哈希表。其构造为线性或O(n);每次插入时,当您遍历输入数组时,如果哈希表中已经有这个键,您就可以测试(以常量,O(1)代价)。“一对”。当您发现这个“匹配”情况时,您将增加一个计数器。一旦你通过了你的数组,你的计数器就会告诉你答案。

#2


1  

you can not use a normal hash table. I used dynamic programming and a hashtable.

您不能使用普通的散列表。我使用了动态编程和哈希表。

You should set up your hash-table like this:

你应该这样设置你的哈希表:

[value of array, Number of repetition, number of identical pair]

[数组值,重复次数,同对数]

For example, we can run my algorithm on the array A = [3, 3, 3, 3, 3].

例如,我们可以在数组A =[3, 3, 3, 3, 3]上运行我的算法。

we walk through Array. for first number in A, the new row is inserted to the hashtable

我们穿过数组。对于A中的第一个数字,新行被插入到hashtable中

  3,0,0 /* A[i], the number of repetition of 3 so far (Rep[i]),
                    the number of identical pair so far ( Iden[i]). */

then for second 3 in A:

然后是A中的第二个3:

 3,1,1 

for third 3 in A:

三分之一A:

 3,2,3 

for 4th 3 in A:

在A中的4 / 3:

 3,3,6 

6 is the number of identical pairs in this array. In general, we can calculate the number of identical pair by following formula:

6是这个数组中相同对的个数。一般来说,我们可以通过以下公式来计算同对的个数:

Iden[i] = Rep[i] + Iden[i-1]

Iden[i] = Rep[i] + Iden[i-1]

Here's a code sample in C#:

下面是c#中的代码示例:

     public static int solution(int[] A)
    {
         int identical = 0;
        Dictionary<int, KeyValuePair<int, int>> dic = new Dictionary<int, KeyValuePair<int, int>>(); /* A[i], the number of repetition of 3 so far (Rep[i]),
                    the number of identical pair so far ( Iden[i]). */
        for (int i = 0; i < A.Length; i++)
        {
            if (!dic.ContainsKey(A[i]))
                dic.Add(A[i], new KeyValuePair<int, int>(0,0));
            else
            { 
               KeyValuePair<int,int> valDic = dic[A[i]];

                KeyValuePair<int, int> newVal;
                if (valDic.Key < 1)
                    newVal = new KeyValuePair<int, int>(1, 1);
                else
                {
                    int preIdenticalPair = valDic.Value;
                    int preReptation = valDic.Key;
                    int newRepetation = ++preReptation;
                    int newIdenticalPair = preIdenticalPair + newRepetation;
                    newVal = new KeyValuePair<int, int>(newRepetation, newIdenticalPair);
                }

                dic[A[i]] = newVal;
            }
        }

        //summation of all identical pairs
        foreach (KeyValuePair<int, KeyValuePair<int, int>> pair in dic)
            identical += pair.Value.Value;
        return identical;
    }

#3


0  

you can try this way:

你可以这样尝试:

#define MAX 99999   // consider your array have largest elemnt < 99999 and >=0
    int main () {
    int num[] = {1, 5, 3, 12, 5, 1, 4,3};
    int len = 8;
    int count[MAX+1u] = {0};
    int i,pair=0;
    for( i=0;i<len;i++){
         count[num[i]]++;   // here count the frequecy of each number
    }
    for(i=0;i<MAX;i++){
     if(count[i]>1){               // if frequecy is > 1      
    printf("%d occurs %d times \n",i,count[i]);
    pair++;         // increment pair
    }

    }
    printf("%d pairs ",pair);   // print pair
       return 0;
    }

#4


0  

Basically, the core idea remains the same : you have to count the number of occurences of each element in your input array.

基本上,核心思想保持不变:您必须计算输入数组中每个元素出现的次数。

The key problem is how to implement this counting process. Your solution is perfectly valid, but as you felt it, it can be improved. You received a comment suggesting sorting the array and then perform a sweep on it to count the number of pairs : O(n.ln n).

关键问题是如何实现这个计数过程。你的解决方案是完全有效的,但正如你感觉的那样,它可以改进。您收到一条建议对数组进行排序的注释,然后对其执行一次扫描,以计数对的数量:O(n)。ln n)。

You could also use a hash table, as suggested by @AlexReynolds answer. But you must deal with collisions, as different integers could hash to the same key. To do so, you could use a bucket for each key. This bucket would store every integer that hashed to its key, plus the number of occurences of the said integer.

您还可以使用一个哈希表,如@AlexReynolds answer建议的那样。但是必须处理冲突,因为不同的整数可以哈希到相同的键。为此,您可以为每个键使用一个桶。这个桶将存储哈希到其键的每个整数,加上该整数出现的次数。

How to implement these buckets :

如何实现这些桶:

  • With Lists ? If you have a few collisions, this is good enough.
  • 列表吗?如果有一些碰撞,这就足够了。
  • With Heaps (like binary heap) : logarithmic access and insertion, but fixed size and you might need reallocation if too much collisions occur. This is good if you don't underestimate the number of collisions but increases memory usage if you overestimate it.
  • 对于堆(如二进制堆):对数访问和插入,但是大小是固定的,如果发生太多冲突,可能需要重新分配。如果您不低估冲突的数量,但是如果您过高估计内存使用量,那么这是很好的。
  • With Balanced Binary Trees : logarithmic lookup and insertion, but spatial overhead due to the pointer tracking going on "inside". But here, you avoid the estimation problem heaps induced.
  • 使用平衡的二叉树:对数查找和插入,但是由于指针跟踪在“内部”进行而造成的空间开销。但是这里,您可以避免引起的估计问题堆。

Once your hashtable is built, with the occurences of each element present in your array, you must count the pairs. But you could keep a counter while populating the data structure to avoid performing this extra sweep. This is easy. Here are the basic operations used to update the table with an element e taken from the array :

一旦构建了hashtable,并且数组中出现了每个元素,那么必须对它们进行计数。但是,您可以在填充数据结构时保留一个计数器,以避免执行这种额外的扫描。这是很容易的。下面是使用数组中的元素e更新表的基本操作:

  1. Hash e : we get a Hash key k
  2. 哈希e:我们得到哈希键k
  3. Get the bucket of the hash key k : b
  4. 拿一桶哈希键k: b
  5. Look up the bucket to see if e is in there :
    • If so : increment the element occurrence counter. If the counter is 2, increment the pair counter, if the counter is 3, decrement the pair counter.
    • 如果是:增加元素出现计数器。如果计数器是2,则增加对计数器,如果计数器是3,则减量这对计数器。
    • If not : add the element to the bucket, with an occurrence counter set to 1
    • 如果不是:将元素添加到桶中,出现计数器设置为1
  6. 查找桶,看看e是否在那里:如果是:增加元素出现计数器。如果计数器是2,增加对计数器,如果计数器是3,减少对计数器。如果不是:将元素添加到桶中,出现计数器设置为1
  7. Get the next element in the input array in e and repeat until the array has been fully sweeped.
  8. 在e中获取输入数组中的下一个元素并重复,直到数组被完全清除。

Is performing one if statement at each counter update faster than performing a simple check in the end ? I am not sure, just be aware you can test both.

在每个计数器更新时执行一个if语句是否比最后执行一个简单的检查要快?我不确定,但请注意你可以同时测试这两个。

Let k be the number of slots in the hash table. With a uniformly distributed hash function, you would get n/k elements per slot. This leads to a n²/k time complexity with lists, which is O(n²) ... But if k is close to n you get really close to a linear time.

设k是哈希表中的槽数。对于一个均匀分布的哈希函数,每个槽可以得到n/k个元素。这导致n²/ k与列表,时间复杂度是O(n²)……但是如果k接近n,就会接近一个线性时间。

The same goes for heap/tree buckets, except that in the end you get a O(n. ln n) asymptotic complexity. If k is large enough, again, you will be close to linear time.

堆/树桶也是一样,只不过最后得到O(n)ln n)渐近的复杂性。如果k足够大,就会接近线性时间。