What is the best way to remove duplicates from an array in Java?

Date: 2022-11-24 16:10:06

I have an array of objects that needs the duplicates removed/filtered. I was going to just override equals & hashCode on the object elements and then stick them in a Set... but I figured I should at least poll * to see if there was another way, perhaps some clever method of some other API?

9 Answers

#1


20  

I would agree with your approach to override hashCode() and equals() and use something that implements Set.

Doing so also makes it absolutely clear to any other developers that the non-duplicate characteristic is required.

Another reason: you get to choose the implementation that best meets your needs now (HashSet, LinkedHashSet, TreeSet, and so on), and you don't have to change your code to change the implementation in the future.
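
A minimal sketch of this approach (the `Person` class and its fields are invented for illustration; they are not from the original question):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical element type: equality is defined by the (name, age) pair.
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return name + "(" + age + ")";
    }
}

public class DedupeDemo {
    public static void main(String[] args) {
        Person[] people = {
            new Person("Ann", 30), new Person("Bob", 25), new Person("Ann", 30)
        };
        // A LinkedHashSet drops duplicates while keeping insertion order.
        Set<Person> unique = new LinkedHashSet<>(Arrays.asList(people));
        System.out.println(unique); // [Ann(30), Bob(25)]
    }
}
```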

#2


8  

I found this on the web:

Here are two methods that allow you to remove duplicates from an ArrayList. removeDuplicate does not maintain the order, whereas removeDuplicateWithOrder maintains the order with some performance overhead.

  1. The removeDuplicate Method:

    /** List order not maintained **/
    public static <T> void removeDuplicate(ArrayList<T> arlList)
    {
        HashSet<T> h = new HashSet<>(arlList);
        arlList.clear();
        arlList.addAll(h);
    }
    
  2. The removeDuplicateWithOrder Method:

    /** List order maintained **/
    public static <T> void removeDuplicateWithOrder(ArrayList<T> arlList)
    {
        Set<T> set = new HashSet<>();
        List<T> newList = new ArrayList<>();
        for (T element : arlList) {
            if (set.add(element)) {
                newList.add(element);
            }
        }
        arlList.clear();
        arlList.addAll(newList);
    }
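
On Java 8 and later, the order-preserving variant can also be written with the Stream API; `distinct()` relies on `equals`/`hashCode` just like the set-based code above (shown here with Strings for brevity):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDedupe {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        // distinct() keeps the first occurrence of each element, preserving encounter order.
        List<String> unique = input.stream().distinct().collect(Collectors.toList());
        System.out.println(unique); // [a, b, c]
    }
}
```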
    

#3


3  

Overriding equals and hashCode and creating a set was my first thought too. It's good practice to have some overridden version of these methods anyway in your inheritance hierarchy.

I think that if you use a LinkedHashSet you'll even preserve order of unique elements...
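
For example (a small sketch, not from the original answer):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class LinkedHashSetOrder {
    public static void main(String[] args) {
        // Duplicates are dropped; iteration follows first-insertion order: 3, 1, 2.
        Set<Integer> unique = new LinkedHashSet<>(Arrays.asList(3, 1, 3, 2, 1));
        System.out.println(unique); // [3, 1, 2]
    }
}
```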

#4


2  

Basically, you want a LinkedHashSet<T> implementation that supports the List<T> interface for random access. Hence, this is what you need:

    public class LinkedHashSetList<T> extends LinkedHashSet<T> implements List<T> {

        // Implementations for List<T> methods here ...

    }

The implementation of the List<T> methods would access and manipulate the underlying LinkedHashSet<T>. The trick is to have this class behave correctly when someone attempts to add a duplicate via the List<T> add methods; throwing an exception or re-adding the item at a different index are both options, and you can either pick one or make the behavior configurable by users of the class.

#5


2  

Use a List distinctList to record each element the first time the iterator stumbles onto it, and return distinctList as the list with all duplicates removed:


    private <T> List<T> removeDups(List<T> list) {
        Set<T> tempSet = new HashSet<>();
        List<T> distinctList = new ArrayList<>();
        for (T next : list) {
            if (tempSet.add(next)) {
                distinctList.add(next);
            }
        }
        return distinctList;
    }

#6


1  

I'd like to reiterate the point made by Jason in the comments:

Why place yourself at that point at all?

Why use an array for a data structure that shouldn't hold duplicates at all?

Use a Set, or a SortedSet when the elements have a natural order as well, to hold the elements at all times. If you need to keep the insertion order, you can use a LinkedHashSet, as has been pointed out.
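
A quick sketch of the SortedSet option, using a TreeSet (Strings sort by their natural order):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class SortedDedupe {
    public static void main(String[] args) {
        // A TreeSet both rejects duplicates and keeps elements sorted in natural order.
        Set<String> names = new TreeSet<>(Arrays.asList("carol", "alice", "bob", "alice"));
        System.out.println(names); // [alice, bob, carol]
    }
}
```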

Having to post-process some data structure is often a hint that you should have chosen a different one to begin with.

#7


1  

Of course the original post begs the question, "How did you get that array (that might contain duplicated entries) in the first place?"

Do you need the array (with duplicates) for other purposes, or could you simply use a Set from the beginning?

Alternatively, if you need to know the number of occurrences of each value, you could use a Map<CustomObject, Integer> to track counts. Also, the Google Collections definition of the Multimap classes may be of use.
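
A sketch of the counting idea with a plain JDK Map (a LinkedHashMap keeps first-seen order here; Strings stand in for the hypothetical CustomObject):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountOccurrences {
    public static void main(String[] args) {
        List<String> values = Arrays.asList("x", "y", "x", "x", "z");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum); // add 1, or start at 1 if absent
        }
        System.out.println(counts); // {x=3, y=1, z=1}
        // The key set alone is the de-duplicated collection.
        System.out.println(counts.keySet()); // [x, y, z]
    }
}
```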

#8


0  

A Set is definitely your best bet. The only way to remove things from an array (without creating a new one) is to null them out, and then you end up with a lot of null-checks later.

#9


0  

Speaking from a general programming standpoint, you could always double-enumerate the collection and then compare source and target.

And if your inner enumeration always starts one entry after the source, it's fairly efficient (pseudocode follows):

foreach ( array as source )
{
    // keep track of where we are in the array
    place++;
    // loop over the array starting at the entry AFTER the one we are comparing against
    for ( i = place + 1; i < max(array); i++ )
    {
        if ( source === array[i] )
        {
            destroy(array[i]);
        }
    }
}

You could arguably add a break statement after the destroy, but then you only catch the first duplicate of each element; if that's all you will ever have, though, it would be a nice small optimization.
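
A rough Java rendering of the pseudocode above (an O(n²) sketch; duplicates are nulled out in place, matching the "destroy" idea, since a Java array cannot shrink):

```java
import java.util.Arrays;

public class NestedLoopDedupe {
    // Nulls out later duplicates in place; O(n^2) comparisons.
    public static void dedupeInPlace(Object[] array) {
        for (int place = 0; place < array.length; place++) {
            if (array[place] == null) continue; // already destroyed
            for (int i = place + 1; i < array.length; i++) {
                if (array[place].equals(array[i])) {
                    array[i] = null; // "destroy" the duplicate
                }
            }
        }
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "a", "c", "b"};
        dedupeInPlace(data);
        System.out.println(Arrays.toString(data)); // [a, b, null, c, null]
    }
}
```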
