Removing repeated elements from an array

Date: 2022-08-26 14:14:59

What is the best way to remove the elements that are repeated from an array? For example, from the array

a = [4, 3, 3, 1, 6, 6]

I need to get

a = [4, 1]

My method works too slowly with a big number of elements.

arr = [4, 3, 3, 1, 6, 6]
puts arr.join(" ")
nouniq = []
l = arr.length
# Nested scan: collect every value that also occurs at another index.
# This is O(n^2), which is why it is slow for large arrays.
for i in 0..(l-1)
  for j in 0..(l-1)
    if (arr[j] == arr[i]) and (i != j)
      nouniq << arr[j]
    end
  end
end
arr = (arr - nouniq).compact

puts arr.join(" ")

5 Answers

#1


4  

a = [4, 3, 3, 1, 6, 6]
a.select{|b| a.count(b) == 1}
#=> [4, 1]

More complicated but faster solution (O(n) I believe :))


a = [4, 3, 3, 1, 6, 6]
ar = []
# Keep the middle of each sorted 3-window when all three values differ.
add = proc{|to, from| to << from[1] if from.uniq.size == from.size }
a.sort!.each_cons(3){|b| add.call(ar, b)}
# The first and last elements never sit in the middle of a window, so check them separately.
ar << a[0] if a[0] != a[1]; ar << a[-1] if a[-1] != a[-2]
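
Strictly speaking, the sort! makes that O(n log n) rather than O(n). A true single-pass counting version (a sketch of my own, not from the answer) avoids the sort entirely and also preserves the original element order:

counts = Hash.new(0)
a = [4, 3, 3, 1, 6, 6]
a.each { |v| counts[v] += 1 }    # first pass: count occurrences
a.select { |v| counts[v] == 1 }  # second pass: keep values seen exactly once
#=> [4, 1]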

#2


4  

arr = [4, 3, 3, 1, 6, 6]

arr.
  group_by {|e| e }.
  map {|e, es| [e, es.length] }.
  reject {|e, count| count > 1 }.
  map(&:first)
# [4, 1]
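
On Ruby 2.7 and later, Enumerable#tally does the counting in one call, so the whole chain collapses to the following shorter equivalent (my addition, not part of the original answer):

arr.tally.select { |_e, count| count == 1 }.keys
# [4, 1]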

#3


2  

Using inject, without needing a separate copy of the original array:

[4, 3, 3, 1, 6, 6].inject({}) {|s,v| s[v] ? s.merge({v=>s[v]+1}) : s.merge({v=>1})}.select {|k,v| v == 1}.keys
 => [4, 1] 
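
Note that calling merge on every element allocates a fresh hash at each step. A variant of the same idea that mutates a single accumulator (my sketch, using each_with_object with a default-0 hash) is lighter on allocations:

[4, 3, 3, 1, 6, 6].each_with_object(Hash.new(0)) { |v, counts| counts[v] += 1 }.select { |_k, count| count == 1 }.keys
 => [4, 1]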

#4


1  

I needed something like this, so I tested a few different approaches. These all return an array of the items that are duplicated in the original array:

module Enumerable
  def dups
    inject({}) {|h,v| h[v] = h[v].to_i + 1; h}.reject{|k,v| v == 1}.keys
  end

  def only_duplicates
    duplicates = []
    self.each {|each| duplicates << each if self.count(each) > 1}
    duplicates.uniq
  end

  def dups_ej
    inject(Hash.new(0)) {|h,v| h[v] += 1; h}.reject{|k,v| v == 1}.keys
  end

  def dedup
    duplicates = self.dup
    self.uniq.each { |v| duplicates[self.index(v)] = nil }
    duplicates.compact.uniq
  end
end
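
With the module loaded, the helpers can be called on any array. Since they return the values that are duplicated, getting the singletons the question asks for takes one more subtraction (my usage sketch):

a = [4, 3, 3, 1, 6, 6]
a.dups           # => [3, 6]  values occurring more than once
a.uniq - a.dups  # => [4, 1]  the non-repeated values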

Benchmark results for 100,000 iterations, first with an array of integers, then an array of strings. Performance will vary depending on the number of duplicates found, but these tests use a fixed proportion of duplicates (roughly half the array entries are duplicates):

test_benchmark_integer
                                    user     system      total        real
Enumerable.dups                 2.560000   0.040000   2.600000 (  2.596083)
Enumerable.only_duplicates      6.840000   0.020000   6.860000 (  6.879830)
Enumerable.dups_ej              2.300000   0.030000   2.330000 (  2.329113)
Enumerable.dedup                1.700000   0.020000   1.720000 (  1.724220)
test_benchmark_strings
                                    user     system      total        real
Enumerable.dups                 4.650000   0.030000   4.680000 (  4.722301)
Enumerable.only_duplicates     47.060000   0.150000  47.210000 ( 47.478509)
Enumerable.dups_ej              4.060000   0.030000   4.090000 (  4.123402)
Enumerable.dedup                3.290000   0.040000   3.330000 (  3.334401)
..
Finished in 73.190988 seconds.

So of these approaches, it seems the Enumerable#dedup algorithm is the best:

  • dup the original array so the original is not mutated
  • get the uniq array elements
  • for each unique element: nil out its first occurrence in the dup'd array
  • compact the result

If only (array - array.uniq) worked correctly! (It doesn't; it removes everything.)
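
That's because Array#- removes every occurrence of each value found in the second array, not just one match per element:

a = [4, 3, 3, 1, 6, 6]
a.uniq      # => [4, 3, 1, 6]
a - a.uniq  # => []  every element of a appears somewhere in a.uniq, so all are removed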

#5


0  

Here's my spin on a solution used by Perl programmers, using a hash to accumulate counts for each element in the array:

ary = [4, 3, 3, 1, 6, 6]

ary.inject({}) { |h,a| 
  h[a] ||= 0
  h[a] += 1
  h 
}.select{ |k,v| v == 1 }.keys # => [4, 1]

It could be on one line, if that's at all important, by judicious use of semicolons between the lines in the inject block.

A slightly different way is:

ary.inject({}) { |h,a| h[a] ||= 0; h[a] += 1; h }.map{ |k,v| k if (v==1) }.compact # => [4, 1]

It replaces the select{...}.keys with map{...}.compact, so it's not really an improvement, and, to me, it is a bit harder to understand.
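
For what it's worth, on Ruby 2.7+ Enumerable#filter_map expresses that map-then-compact pattern in a single step; a sketch of my own, not part of the original answer:

ary.inject({}) { |h,a| h[a] ||= 0; h[a] += 1; h }.filter_map { |k,v| k if v == 1 } # => [4, 1]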
