如何使矢量元素唯一?(删除非相邻重复)

I have a vector containing few non-adjacent duplicates.

我有一个包含很少非相邻重复的向量。

As a simple example, consider:

举个简单的例子:

2 1 6 1 4 6 2 1 1

I am trying to make this vector unique by removing the non-adjacent duplicates and maintaining the order of elements.

我试图通过删除不相邻的重复并保持元素的顺序来使这个向量唯一。

Result would be:

结果将是:

2 1 6 4

The solutions I tried are:

我尝试的解决方法是:

Inserting into a std::set but the problem with this approach is that it will disturb the order of elements.
将插入到std::set中，但是这种方法的问题是会打乱元素的顺序。
Use the combination of std::sort and std::unique. But again same order problem.
使用std::sort和std::unique的组合。但是同样的顺序问题。

Manual duplicate elimination:

手动复制消除:

    Define a temporary vector TempVector.
    for (each element in a vector)
    {
        if (the element does not exists in TempVector)
        {
            add to TempVector;
        }
    }
    swap orginial vector with TempVector.

My question is:

我的问题是:

Is there any STL algorithm which can remove the non-adjacent duplicates from the vector ? what is its complexity?

是否有任何STL算法可以从向量中移除非相邻的重复项?它的复杂性是什么?

11 个解决方案

#1

I think you would do it like this:

我想你会这样做:

I would use two iterators on the vector :

我将在向量上使用两个迭代器:

The first of one reads the data and inserts it a temporary set.

第一个读取数据并插入一个临时集。

When the read data was not in the set you copy it from the first iterator to the second and increment it.

当读取数据不在集合中时，您将它从第一个迭代器复制到第二个迭代器，并对其进行递增。

At the end you keep only the data up to the second iterator.

最后，只保留数据到第二个迭代器。

The complexity is O( n .log( n ) ) as the lookup for duplicated elements uses the set, not the vector.

复杂性是O(n .log(n))，因为查找重复的元素使用集合，而不是向量。

#include <vector>
#include <set>
#include <iostream>

int main(int argc, char* argv[])
{
    std::vector< int > k ;

    k.push_back( 2 );
    k.push_back( 1 );
    k.push_back( 6 );
    k.push_back( 1 );
    k.push_back( 4 );
    k.push_back( 6 );
    k.push_back( 2 );
    k.push_back( 1 );
    k.push_back( 1 );

{
    std::vector< int >::iterator r , w ;

    std::set< int > tmpset ;

    for( r = k.begin() , w = k.begin() ; r != k.end() ; ++r )
    {
        if( tmpset.insert( *r ).second )
        {
            *w++ = *r ;
        }
    }

    k.erase( w , k.end() );
}


    {
        std::vector< int >::iterator r ;

        for( r = k.begin() ; r != k.end() ; ++r )
        {
            std::cout << *r << std::endl ;
        }
    }
}

#2

Without using a temporary set it's possible to do this with (possibly) some loss of performance:

如果不使用临时集，就有可能(可能)造成性能损失:

template<class Iterator>
Iterator Unique(Iterator first, Iterator last)
{
    while (first != last)
    {
        Iterator next(first);
        last = std::remove(++next, last, *first);
        first = next;
    }

    return last;
}

used as in:

用作:

vec.erase( Unique( vec.begin(), vec.end() ), vec.end() );

For smaller data sets, the implementation simplicity and lack of extra allocation required may offset the theoretical higher complexity of using an additional set. Measurement with a representative input is the only way to be sure, though.

对于较小的数据集，实现的简单性和所需的额外分配的缺乏可能会抵消使用附加数据集的理论上更高的复杂性。

#3

You can remove some of the loops in fa's answer using remove_copy_if:

您可以使用remove_copy_if删除fa答案中的一些循环:

class NotSeen : public std::unary_function <int, bool>
{
public:
  NotSeen (std::set<int> & seen) : m_seen (seen) { }

  bool operator ()(int i) const  {
    return (m_seen.insert (i).second);
  }

private:
  std::set<int> & m_seen;
};

void removeDups (std::vector<int> const & iv, std::vector<int> & ov) {
  std::set<int> seen;
  std::remove_copy_if (iv.begin ()
      , iv.end ()
      , std::back_inserter (ov)
      , NotSeen (seen));
}

This has no affect on the complexity of the algorithm (ie. as written it's also O(n log n)). You can improve upon this using unordered_set, or if the range of your values is small enough you could simply use an array or a bitarray.

这对算法的复杂性没有影响。写成O(n log n)您可以使用unordered_set对此进行改进，或者如果值的范围足够小，您可以使用数组或位数组。

#4

As the question was "is there any STL algorithm...? what is its complexity?" it makes sense to implement the function like std::unique:

问题是“有没有STL算法…?”它的复杂性是什么?

template <class FwdIterator>
inline FwdIterator stable_unique(FwdIterator first, FwdIterator last)
{
    FwdIterator result = first;
    std::unordered_set<typename FwdIterator::value_type> seen;

    for (; first != last; ++first)
        if (seen.insert(*first).second)
            *result++ = *first;
    return result;
}

So this is how std::unique is implemented plus an extra set. The unordered_set shall be faster than a regular set. All elements are removed that compare equal to the element right preceding them (the first element is kept because we cannot unify to nothing). The iterator returned points to the new end within the range [first,last).

这就是std::unique的实现方式，加上一个额外的集合，unordered_set应该比常规集合要快，所有与前面的元素比较的元素都被删除(第一个元素被保留，因为我们无法统一为无)。迭代器返回范围内的新端点[first,last]。

EDIT: The last sentence means that the container itself is NOT modified by unique. This can be confusing. The following example actually reduces the container to the unified set.

编辑:最后一句话表示容器本身没有被unique修改。这可以让人困惑。下面的示例实际上将容器减少为统一集。

1: std::vector<int> v(3, 5);
2: v.resize(std::distance(v.begin(), unique(v.begin(), v.end())));
3: assert(v.size() == 1);

Line 1 creates a vector { 5, 5, 5 }. In line 2 calling unique returns an iterator to the 2nd element, which is the first element that is not unique. Hence distance returns 1 and resize prunes the vector.

第1行创建一个向量{5,5,5}。在第2行调用unique返回一个迭代器到第二个元素，第一个元素不是唯一的。因此距离返回1并重新调整矢量的大小。

#5

There's no STL algorithm doing what you want preserving the sequence's original order.

没有STL算法做你想做的事保持序列的原始顺序。

You could create a std::set of iterators or indexes into the vector, with a comparison predicate that uses the referenced data rather than the iterators/indexes to sort stuff. Then you delete everything from the vector that isn't referenced in the set. (Of course, you could just as well use another std::vector of iterators/indexes, std::sort and std::unique that, and use this as a reference as to what to keep.)

您可以将一组迭代器或索引创建到向量中，并使用一个比较谓词，该谓词使用引用的数据而不是迭代器/索引对内容进行排序。然后从集合中没有引用的向量中删除所有内容(当然，您也可以使用另一个std:::迭代器/索引的向量，std: sort和std: unique，然后使用这个作为要保存的内容的引用)。

#6

Based on the answer of @fa. It can also get rewritten using the STL algorithm std::stable_partition:

基于@fa的答案。还可以使用STL算法std::stable_partition:

struct dupChecker_ {
    inline dupChecker_() : tmpSet() {}
    inline bool operator()(int i) {
        return tmpSet.insert(i).second;
    }
private:
    std::set<int> tmpSet;
};

k.erase(std::stable_partition(k.begin(), k.end(), dupChecker_()), k.end());

This way it is more compact and we don't need to care of the iterators.

这样它更紧凑，我们不需要关心迭代器。

It seems even not to introduce to much performance penalty. I use it in my project which needs to handle quite large vectors of complex types often and it makes no real difference.

它似乎甚至没有带来太多的性能损失。我在我的项目中经常需要处理复杂类型的大向量，它没有什么区别。

Another nice feature is, that it is possible to adjust the uniqueness by using std::set<int, myCmp_> tmpSet;. For instance, in my project to ignore certain rounding errors.

另一个不错的特性是，可以通过使用std::set tmpSet，来调整惟一性。例如，在我的项目中忽略某些舍入错误。 ,>

#7

There is a nice article by John Torjo which deals with this very question in a systematic way. The result he comes up with seems more generic and more efficient than any of the solutions suggested here so far:

约翰·托乔写了一篇很好的文章，系统地论述了这个问题。他提出的结果似乎比迄今为止提出的任何解决方案都更普遍、更有效:

http://www.builderau.com.au/program/java/soa/C-Removing-duplicates-from-a-range/0,339024620,320271583,00.htm

http://www.builderau.com.au/program/java/soa/C-Removing-duplicates-from-a-range/0,339024620,320271583,00. htm

https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1052159.html

https://web.archive.org/web/1/http:/ / articles.techrepublic % 2易康姆% 2易康姆/ 5100 - 10878 - _11 - 1052159. - html

Unfortunately, the complete code of John's solution seems to be no longer available, and John did not respond to may email. Therefore, I wrote my own code which is based on similar grounds like his, but intentionally differs in some details. Feel free to contact me (vschoech think-cell com) and discuss the details if you wish.

不幸的是，John的解决方案的完整代码似乎不再可用，John也没有回复may的邮件。因此，我写了我自己的代码，基于类似的理由，但故意在一些细节上有所不同。如果您愿意，请随时联系我(vschoech思想库com)并讨论细节。

To make the code compile for you, I added some of my own library stuff which I use regularly. Also, instead of going with plain stl, I use boost a lot to create more generic, more efficient, and more readable code.

为了编译代码，我添加了一些我自己经常使用的库内容。此外，我不使用普通的stl，而是大量使用boost来创建更通用、更高效、更可读的代码。

Have fun!

玩得开心!

#include <vector>
#include <functional>

#include <boost/bind.hpp>
#include <boost/range.hpp>
#include <boost/iterator/counting_iterator.hpp>

/////////////////////////////////////////////////////////////////////////////////////////////
// library stuff

template< class Rng, class Func >
Func for_each( Rng& rng, Func f ) {
    return std::for_each( boost::begin(rng), boost::end(rng), f );
};

template< class Rng, class Pred >
Rng& sort( Rng& rng, Pred pred ) {
    std::sort( boost::begin( rng ), boost::end( rng ), pred );
    return rng; // to allow function chaining, similar to operator+= et al.
}

template< class T >
boost::iterator_range< boost::counting_iterator<T> > make_counting_range( T const& tBegin, T const& tEnd ) {
    return boost::iterator_range< boost::counting_iterator<T> >( tBegin, tEnd );
}

template< class Func >
class compare_less_impl {
private:
    Func m_func;
public:
    typedef bool result_type;
    compare_less_impl( Func func ) 
    :   m_func( func )
    {}
    template< class T1, class T2 > bool operator()( T1 const& tLeft, T2 const& tRight ) const {
        return m_func( tLeft ) < m_func( tRight );
    }
};

template< class Func >
compare_less_impl<Func> compare_less( Func func ) {
    return compare_less_impl<Func>( func );
}


/////////////////////////////////////////////////////////////////////////////////////////////
// stable_unique

template<class forward_iterator, class predicate_type>
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd, predicate_type predLess) {
    typedef std::iterator_traits<forward_iterator>::difference_type index_type;
    struct SIteratorIndex {
        SIteratorIndex(forward_iterator itValue, index_type idx) : m_itValue(itValue), m_idx(idx) {}
        std::iterator_traits<forward_iterator>::reference Value() const {return *m_itValue;}
        index_type m_idx;
    private:
        forward_iterator m_itValue;
    };

    // {1} create array of values (represented by iterators) and indices
    std::vector<SIteratorIndex> vecitidx;
    vecitidx.reserve( std::distance(itBegin, itEnd) );
    struct FPushBackIteratorIndex {
        FPushBackIteratorIndex(std::vector<SIteratorIndex>& vecitidx) : m_vecitidx(vecitidx) {}
        void operator()(forward_iterator itValue) const {
            m_vecitidx.push_back( SIteratorIndex(itValue, m_vecitidx.size()) );
        }
    private:
        std::vector<SIteratorIndex>& m_vecitidx;
    };
    for_each( make_counting_range(itBegin, itEnd), FPushBackIteratorIndex(vecitidx) );

    // {2} sort by underlying value
    struct FStableCompareByValue {
        FStableCompareByValue(predicate_type predLess) : m_predLess(predLess) {}
        bool operator()(SIteratorIndex const& itidxA, SIteratorIndex const& itidxB) {
            return m_predLess(itidxA.Value(), itidxB.Value())
                // stable sort order, index is secondary criterion
                || !m_predLess(itidxB.Value(), itidxA.Value()) && itidxA.m_idx < itidxB.m_idx;
        }
    private:
        predicate_type m_predLess;
    };
    sort( vecitidx, FStableCompareByValue(predLess) );

    // {3} apply std::unique to the sorted vector, removing duplicate values
    vecitidx.erase(
        std::unique( vecitidx.begin(), vecitidx.end(),
            !boost::bind( predLess,
                // redundand boost::mem_fn required to compile
                boost::bind(boost::mem_fn(&SIteratorIndex::Value), _1),
                boost::bind(boost::mem_fn(&SIteratorIndex::Value), _2)
            )
        ),
        vecitidx.end()
    );

    // {4} re-sort by index to match original order
    sort( vecitidx, compare_less(boost::mem_fn(&SIteratorIndex::m_idx)) );

    // {5} keep only those values in the original range that were not removed by std::unique
    std::vector<SIteratorIndex>::iterator ititidx = vecitidx.begin();
    forward_iterator itSrc = itBegin;
    index_type idx = 0;
    for(;;) {
        if( ititidx==vecitidx.end() ) {
            // {6} return end of unique range
            return itSrc;
        }
        if( idx!=ititidx->m_idx ) {
            // original range must be modified
            break;
        }
        ++ititidx;
        ++idx;
        ++itSrc;
    }
    forward_iterator itDst = itSrc;
    do {
        ++idx;
        ++itSrc;
        // while there are still items in vecitidx, there must also be corresponding items in the original range
        if( idx==ititidx->m_idx ) {
            std::swap( *itDst, *itSrc ); // C++0x move
            ++ititidx;
            ++itDst;
        }
    } while( ititidx!=vecitidx.end() );

    // {6} return end of unique range
    return itDst;
}

template<class forward_iterator>
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd) {
    return stable_unique( itBegin, itEnd, std::less< std::iterator_traits<forward_iterator>::value_type >() );
}

void stable_unique_test() {
    std::vector<int> vecn;
    vecn.push_back(1);
    vecn.push_back(17);
    vecn.push_back(-100);
    vecn.push_back(17);
    vecn.push_back(1);
    vecn.push_back(17);
    vecn.push_back(53);
    vecn.erase( stable_unique(vecn.begin(), vecn.end()), vecn.end() );
    // result: 1, 17, -100, 53
}

#8

My question is:

我的问题是:

Is there any STL algorithm which can remove the non-adjacent duplicates from the vector ? what is its complexity?

是否有任何STL算法可以从向量中移除非相邻的重复项?它的复杂性是什么?

The STL options are the ones you mentioned: put the items in a std::set, or call std::sort, std::unique and calling erase() on the container. Neither of these fulfills your requirement of "removing the non-adjacent duplicates and maintaining the order of elements."

STL选项是您提到的:将项目放在std::set中，或者调用std::sort, std::unique并在容器中调用erase()。这两种方法都不能满足您“删除非相邻副本并保持元素的顺序”的要求。

So why doesn't the STL offer some other option? No standard library will offer everything for every user's needs. The STL's design considerations include "be fast enough for nearly all users," "be useful for nearly all users," and "provide exception safety as much as possible" (and "be small enough for the Standards Committee" as the library Stepanov originally wrote was much larger, and Stroustrup axed out something like 2/3 of it).

那么为什么STL不提供其他选择呢?没有一个标准的图书馆会为每个用户的需要提供一切。STL的设计因素包括“足够快的几乎所有用户”,“为几乎所有用户是有用的,”和“提供尽可能异常安全”(和“足够小的标准委员会”作为图书馆Stepanov最初写的是更大的,和Stroustrup取消之类的2/3)。

The simplest solution I can think of would look like this:

我能想到的最简单的解决办法是这样的:

// Note:  an STL-like method would be templatized and use iterators instead of
// hardcoding std::vector<int>
std::vector<int> stable_unique(const std::vector<int>& input)
{
    std::vector<int> result;
    result.reserve(input.size());
    for (std::vector<int>::iterator itor = input.begin();
                                    itor != input.end();
                                    ++itor)
        if (std::find(result.begin(), result.end(), *itor) == result.end())
            result.push_back(*itor);
        return result;
}

This solution should be O(M * N) where M is the number of unique elements and N is the total number of elements.

这个解应该是O(M * N)，其中M是唯一元素的数量，N是元素的总数。

#9

As far as i know there is none in stl. Look up reference.

据我所知，stl里没有。查阅参考。

#10

Based on the @Corden's answer, but uses lambda expression and removes duplicates instead of writing them in the output vector

基于@Corden的答案，但是使用lambda表达式并删除重复，而不是将它们写入输出向量中

    set<int> s;
    vector<int> nodups;
    remove_copy_if(v.begin(), v.end(), back_inserter(nodups), 
        [&s](int x){ 
            return !s.insert(x).second; //-> .second false means already exists
        } );

#11

Given your your input is in vector<int> foo you can use remove to do the leg work for you here, then if you want to shrink the vector just use erase otherwise just use last as your one-past-the-end iterator when you want the vector with duplicates removed but order preserved:

如果你的输入是向量 foo，你可以使用remove来完成这里的工作，如果你想收缩矢量，只需使用delete否则，当你想要删除重复的矢量但保持顺序时，使用last作为单遍历器:

auto last = end(foo);

for(auto first = begin(foo); first < last; ++first) last = remove(next(first), last, *first);

foo.erase(last, end(foo));

Live Example

生活的例子

As far as time complexity this will be O(nm). Where n is the number of elements and m is the number of unique elements. As far as space complexity this will use O(n) because it does the removal in place.

就时间复杂度而言，这将是O(nm)。其中n为元素个数，m为唯一元素个数。就空间复杂度而言，这将使用O(n)，因为它在适当的地方进行删除。

#1