在编译时使用c++ 11变量模板快速排序

I just implemented the quick sort algorithm by using C++11 variadic templates to evaluate it at compilation time. However, I encounter a performance issue when the data set is too large.

我刚刚实现了快速排序算法，使用c++ 11变量模板在编译时对其进行评估。但是，当数据集太大时，会遇到性能问题。

#include <iostream>

using namespace std;

template<int... vs>
struct Seq
{}; 
template<int v1, int...vs>
struct Seq<v1, vs...>{
};


template<typename newT, typename srcT>
struct PushFront{
};
template<int vadded, int...vs>
struct PushFront<Seq<vadded>, Seq<vs...>>{
  typedef Seq<vadded, vs...> ResultType;
};

template<typename T>
struct PopFront{
};
template<int v1, int...vs>
struct PopFront<Seq<v1, vs...>>{
  typedef Seq<vs...> RemaindType;
  typedef Seq<v1>    ResultType;
};

template<typename T1, typename T2>
struct CatSeq{};
template<int...v, int...us>
struct CatSeq<Seq<v...>, Seq<us...>>{
  typedef Seq< v..., us... >  ResultType;
};


template<bool c, typename NewT, typename TrueClsT, typename FalseClsT>
struct Classify{
};
template<typename NewT, typename TrueClsT, typename FalseClsT>
struct Classify<true, NewT, TrueClsT, FalseClsT>{
  typedef typename PushFront<NewT, TrueClsT>::ResultType NewTrueClsT;
  typedef FalseClsT  NewFalseClsT;
};
template<typename NewT, typename TrueClsT, typename FalseClsT>
struct Classify<false, NewT, TrueClsT, FalseClsT>{
  typedef TrueClsT  NewTrueClsT;
  typedef typename PushFront<NewT, FalseClsT>::ResultType NewFalseClsT;
};

template<typename T1, typename T2>
struct Compare{};
template<int v1, int v2>
struct Compare<Seq<v1>, Seq<v2>>{
  static const bool result=(v1>=v2); 
};


template<typename AnchorT, typename SeqT, typename GESet, typename LSet>
struct PartitionImpl{};
template<typename GESet, typename LSet, int anchorv, int v1>
struct PartitionImpl<Seq<anchorv>, Seq<v1>, GESet, LSet>{
  static const bool isge=Compare<typename PopFront<Seq<v1>>::ResultType, Seq<anchorv>>::result;
  typedef typename Classify<isge, Seq<v1>, GESet, LSet>::NewTrueClsT  RstGESet;
  typedef typename Classify<isge, Seq<v1>, GESet, LSet>::NewFalseClsT  RstLSet;  
};
template<typename GESet, typename LSet, int anchorv, int v1, int...vs>
struct PartitionImpl<Seq<anchorv>, Seq<v1, vs...>, GESet, LSet>{
  static const bool isge=Compare<typename PopFront<Seq<v1, vs...>>::ResultType, Seq<anchorv>>::result;
  typedef typename Classify<isge, Seq<v1>, GESet, LSet>::NewTrueClsT  TmpRstGESet;
  typedef typename Classify<isge, Seq<v1>, GESet, LSet>::NewFalseClsT  TmpRstLSet;

  typedef typename PartitionImpl<Seq<anchorv>, Seq<vs...>, TmpRstGESet, TmpRstLSet>::RstGESet RstGESet;
  typedef typename PartitionImpl<Seq<anchorv>, Seq<vs...>, TmpRstGESet, TmpRstLSet>::RstLSet  RstLSet;
};


template<typename T>
struct Partition{
};
template<int v1, int v2, int...vs>
struct Partition<Seq<v1, v2, vs...>>{
  typedef Seq<v1> AnchorType;
  typedef Seq<> GESet;
  typedef Seq<> LSet;
  typedef typename PartitionImpl<AnchorType, Seq<v1, v2, vs...>, GESet, LSet>::RstGESet  RstGESet;
  typedef typename PartitionImpl<AnchorType, Seq<v1, v2, vs...>, GESet, LSet>::RstLSet   RstLSet;
};

//why introduce this? refer to Sort
template<typename SrcT, typename GESet, typename LSet, template<typename > class SortOp>
struct SortSub{  
  typedef typename SortOp<GESet>::ResultType  TmpGESet2;
  typedef typename SortOp<LSet>::ResultType   TmpLSet2;
};
template<typename SrcT, typename LSet, template<typename> class SortOp>
struct SortSub<SrcT, SrcT, LSet, SortOp>{
  typedef SrcT  TmpGESet2;
  typedef typename SortOp<LSet>::ResultType   TmpLSet2;
};
template<typename SrcT, typename GESet, template<typename> class SortOp>
struct SortSub<SrcT, GESet, SrcT, SortOp>{
  typedef typename SortOp<GESet>::ResultType  TmpGESet2;
  typedef SrcT   TmpLSet2;
};

template<typename T>
struct Sort;
template<>
struct Sort<Seq<>>{
  typedef Seq<> ResultType;
};
template<int v>
struct Sort< Seq<v> >{
  typedef Seq<v> ResultType;
};
template<int v1, int...vs>
struct Sort< Seq<v1, vs...> >{
  typedef Seq<v1, vs...> SrcType;
  typedef typename Partition< Seq<v1, vs...> >::RstGESet TmpGESet;
  typedef typename Partition< Seq<v1, vs...> >::RstLSet TmpLSet;

  //to by pass the case SrcType <==> TmpGESet or  SrcType <==> TmpLSet
  typedef typename SortSub<SrcType, TmpGESet, TmpLSet, Sort>::TmpGESet2  TmpGESet2;
  typedef typename SortSub<SrcType, TmpGESet, TmpLSet, Sort>::TmpLSet2   TmpLSet2;

  typedef typename CatSeq<TmpGESet2, TmpLSet2>::ResultType ResultType;
};


void dumpSeqTypeImpl(Seq<> ){
}
template<int v1>
void dumpSeqTypeImpl(Seq<v1> ){
  cout<<v1<<" ";
}
template<int v1, int...vs>
void dumpSeqTypeImpl(Seq<v1, vs...> ){
  cout<<v1<<" ";
  dumpSeqTypeImpl( Seq<vs...>() );
}
template<int...vs>
void dumpSeqType(Seq<vs...> ){
  cout<<"Seq type < ";
  dumpSeqTypeImpl( Seq<vs...>() );
  cout<<" >"<<endl;
}

    //test data
#include "qsort_input.txt"

int main(){
  //Seq<>  s0;// aggregate ‘Seq<> s0’ has incomplete type and cannot be defined
  Seq<1> s1;
  Seq<1, 2> s2;

  typedef Seq<5, 5, 5> TestType_SAME;
  TestType_SAME same;
  dumpSeqType( same );
  typename Partition< TestType_SAME >::RstGESet _ts1;
  typename Partition< TestType_SAME >::RstLSet _ts2;
  dumpSeqType( _ts1 );
  dumpSeqType( _ts2 );

#if 1
  typedef Seq<4, 7, 3, 9, 1, 2, 5, 5, 19, 5> TestType;
  TestType s3;
  dumpSeqType( s3 );
  typename Partition< TestType >::RstGESet ts1;
  typename Partition< TestType >::RstLSet ts2;
  dumpSeqType( ts1 );
  dumpSeqType( ts2 );

  typename Sort<TestType>::ResultType so1;
  dumpSeqType( so1 );
#endif 

#if 1
  typedef Seq<TEST_DATA_100> TAdvanceType;
  typename Sort<TAdvanceType>::ResultType soadvance;
  dumpSeqType(soadvance);
#endif

  return 0;
}

When the data set is TEST_DATA_100, it takes 1.7s to compile.
When the data set is TEST_DATA_1000, the compiler seems to halt....

当数据集是TEST_DATA_100时，编译需要1.7秒。TEST_DATA_1000数据集时,编译器似乎停止....

I'm using gcc 4.6.0.

我用gcc 4.6.0。

2 个解决方案

#1

Have you also looked at its memory consumption? Note that quicksort itself is worse than linear, with a quite bad worse case runtime. This multiplies with the worse than linear runtime behaviour of certain steps of template compilation and instantiation (sometimes those are exponentional). You should maybe graph your compiletime for various datasets to observe the real complexity class for your code. Usually template metaprogramming with such large datasets is not feasible.

你是否也看过它的内存消耗?请注意，快速排序本身比线性更糟糕，运行时更糟糕。这将使模板编译和实例化的某些步骤的线性运行时行为变得更糟(有时这些步骤是指数型的)。您应该绘制不同数据集的编译时间图，以观察代码的实际复杂性类。通常使用如此大的数据集进行模板元编程是不可行的。

Edit: Out of curiosity I tried the code out and found that up until ~500 it follows roughly the formula pow(N*log(N),1.47)*0.0004+0.6 but then starts to get incredibly much slower, with 155 seconds for 700 items. Also at around that it starts eating very much ram (3GiB for 600) which leads me to the conclusion that for 1000 elements it will need more ram than most people have and will take hours to compile.

编辑:出于好奇，我尝试了一下代码，发现直到大约500秒，它大致遵循了公式pow(N*log(N)，1.47)*0.0004+0.6，但随后开始变得非常慢，700个条目的时间是155秒。同时，它开始吃大量的ram(600)，这使我得出结论:对于1000个元素，它需要的ram比大多数人都要多，需要几个小时来编译。

Further note that the code does not work when not every element is unique.

进一步注意，当不是每个元素都是唯一的时，代码就不起作用了。

#2

You are using recursive metafunctions to build your quicksort. What exactly did you expect to happen when you tried to shove 1000 recursive instantiations at the compiler?

您正在使用递归元表达式来构建快速排序。当您试图在编译器中插入1000个递归实例时，您期望会发生什么?

Just because a function can theoretically take arbitrary numbers of arguments does not mean that the compiler actually can handle arbitrary numbers of arguments. Compilers have limits.

仅仅因为一个函数理论上可以接受任意数量的参数，并不意味着编译器实际上可以处理任意数量的参数。编译器的限制。

Besides: what's the point of compile-time sorting? You could do that off-line and copy the data into the .cpp file. Or just run std::sort one time when the program starts.

此外:编译时排序的意义是什么?您可以离线执行，并将数据复制到.cpp文件中。或者只在程序启动时运行std::sort一次。

#1