C++ Implementation -- Simple Processing of Chinese Words

Now that we know how to read the input, the next step is to work out some details, such as how to store the words.

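First, a comparison function cannot work directly on char* values, because comparing two char* pointers compares their addresses rather than the text they point to. A string object, on the other hand, is compared by content, so it can be used.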
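As a small illustration of that difference, here is a minimal sketch. The three helper names are made up for this example and do not appear in the program below.

```cpp
#include <cstring>
#include <string>

// Compares the two pointers themselves, i.e. addresses, not characters.
bool same_pointer(const char* a, const char* b) { return a == b; }

// Compares the characters the two pointers refer to.
bool same_c_string(const char* a, const char* b) { return std::strcmp(a, b) == 0; }

// std::string's operator== compares contents.
bool same_string(const std::string& a, const std::string& b) { return a == b; }
```

Two separate char arrays that both hold "word" are different for same_pointer but equal for the other two helpers.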

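However, comparing two long string objects takes O(min(length)) time in the worst case, which is wasteful. So I use double hashing: two hash values, h1 and h2, stand in for each string, and the probability of a collision drops to a practically negligible level. Each hash pair is computed once per word; after that, comparing two words takes at most two integer comparisons, i.e. O(1), which cuts the comparison cost considerably.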
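The idea can be sketched as follows. This is only an illustration under stated assumptions: the two moduli are well-known primes picked for the example, not the values used in the program below, and double_hash is a hypothetical helper name. The multiplier 3001 is the Weight used later in the post.

```cpp
#include <string>
#include <utility>

// Assumed moduli: two well-known primes, NOT the values from the original code.
const long long kMod1 = 1000000007LL;
const long long kMod2 = 998244353LL;
const long long kBase = 3001;  // same multiplier as Weight in the post

// Returns (h1, h2): the polynomial hash of s modulo kMod1 and modulo kMod2.
std::pair<int, int> double_hash(const std::string& s) {
    long long h1 = 0, h2 = 0;
    for (unsigned char c : s) {  // unsigned char keeps every byte in 0..255
        h1 = (h1 * kBase + c) % kMod1;
        h2 = (h2 * kBase + c) % kMod2;
    }
    return std::make_pair((int) h1, (int) h2);
}
```

Two words are then treated as equal when both components of their pairs match. The intermediate products are kept in long long so that multiplying by the base cannot overflow.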

Next comes the choice of container. There are two common options (let p1 and p2 be the primes used as moduli for the two hashes):

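The first is a two-dimensional array, where the first dimension is indexed by h1 and the second by h2. To save space, the second dimension is stored as a vector kept sorted by h2, so finding the position for an insertion or a query costs O(log(p2)) with binary search.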

The second is simply to throw everything into a map, where insertion and lookup both cost O(log(cnt)) (cnt being the number of distinct words).

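I went straight for the second option, because it is simpler to implement and the complexity is about the same (the vector approach has a large constant factor).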
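For comparison, the first option, which the program below does not use, might look roughly like this. It is a sketch under assumptions: kP1 is a hypothetical small prime standing in for p1, and the names buckets, add_word and count_word are invented for the example.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical small prime for the first dimension; the post's p1 is not known.
const int kP1 = 10007;

// buckets[h1] holds (h2, count) pairs kept sorted by h2.
std::vector<std::vector<std::pair<int, int> > > buckets(kP1);

// Orders a stored (h2, count) entry against a bare h2 key for lower_bound.
bool entry_before_key(const std::pair<int, int>& entry, int h2) {
    return entry.first < h2;
}

// Adds one occurrence of the word with hashes (h1, h2); returns its new count.
// Precondition: 0 <= h1 < kP1.
int add_word(int h1, int h2) {
    auto& row = buckets[h1];
    auto it = std::lower_bound(row.begin(), row.end(), h2, entry_before_key);
    if (it == row.end() || it->first != h2)
        it = row.insert(it, std::make_pair(h2, 0));  // shifts the later entries
    return ++it->second;
}

// Returns how many times the word with hashes (h1, h2) has been added.
int count_word(int h1, int h2) {
    const auto& row = buckets[h1];
    auto it = std::lower_bound(row.begin(), row.end(), h2, entry_before_key);
    return (it != row.end() && it->first == h2) ? it->second : 0;
}
```

The binary search is logarithmic in the row length, but inserting into the middle of a vector also shifts the entries after it, which is one more reason the map version is the easier choice.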

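Also, reading with cin in C++ is notoriously slow, so I call "ios::sync_with_stdio(false);" to turn off the synchronization between the C++ streams and C stdio. After that, cin is about as fast as scanf.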

 #include <cstdio>
 #include <iostream>
 #include <string>
 #include <cstring>
 #include <algorithm>
 #include <map>
 #define TF second  // TF: the count stored as the map's value
 using namespace std;

 const int tot_file = ;

 const int mod1 = ;

 const int mod2 = ;

 const int bin = 1 << 8;  // 256: added to a negative char value to make it non-negative

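 struct Word {
     string st;
     int h1, h2;
     // Order words by the hash pair (h1, h2) instead of by their text.
     inline bool operator < (const Word &x) const {
         return h1 == x.h1 ? h2 < x.h2 : h1 < x.h1;
     }
     #define x (int) st[i]
     #define Weight 3001
     // Polynomial hash of st with multiplier Weight, once modulo mod1 and once modulo mod2.
     inline void calc_hash() {
         int len = st.length(), i;
         long long tmp;  // 64-bit so that tmp * Weight cannot overflow
         for (i = tmp = 0; i < len; ++i)
             ((tmp *= Weight) += (x < 0 ? x + bin : x)) %= mod1;
         h1 = (int) tmp;
         for (i = tmp = 0; i < len; ++i)
             ((tmp *= Weight) += (x < 0 ? x + bin : x)) %= mod2;
         h2 = (int) tmp;
     }
     #undef x
     #undef Weight
 };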

 typedef map <Word, int> map_for_words;
 typedef map_for_words :: iterator iter_for_words;
 map_for_words passage;  // word -> number of occurrences
 Word w;
 string st;

 void read_in() {
     ios::sync_with_stdio(false);
     // Read whitespace-separated words, hash each one and count it.
     while (cin >> w.st) {
         w.calc_hash();
         passage[w] += 1;
     }
 }

 int main() {
     freopen("test.in", "r", stdin);  // read the words from test.in
     read_in();
     iter_for_words it;
     // Print each distinct word with its count, in (h1, h2) order.
     for (it = passage.begin(); it != passage.end(); ++it)
         cout << it -> first.st << ' ' << it -> TF << endl;
     return 0;
 }

Result (it seems to work reasonably well):

Input:

[screenshot of the input]

Output:

[screenshot of the output]

(Don't ask why the interface looks so funny... that's just the terminal.)
