NLP tools: Chinese word2vec open-source projects, tutorials, and datasets

Date: 2021-06-23 00:30:32

word2vec

word2vec/GloVe/Swivel binary files trained on a Chinese corpus

word2vec: https://code.google.com/p/word2vec/

glove: http://nlp.stanford.edu/projects/glove/

swivel: https://github.com/tensorflow/models/tree/master/swivel

swivel paper: http://arxiv.org/abs/1602.02215

Open-source projects

wordvectors

Pre-trained word vectors for 30+ languages

https://github.com/Kyubyong/wordvectors

chinese-word2vec

word2vec/GloVe/Swivel binary files trained on a Chinese corpus

https://github.com/to-shimo/chinese-word2vec
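
As a rough illustration of what a word2vec binary file contains, here is a minimal pure-Python sketch of the layout written by the original word2vec C tool: a text header with the vocabulary size and vector dimension, then one record per word (the word, a space, the raw float32 values, a newline). It assumes little-endian floats and UTF-8 words, and it is only an assumption that the files in the repositories above follow this layout. The function names are made up for this example. In practice most people load such files with gensim's KeyedVectors.load_word2vec_format(path, binary=True) instead.

```python
import io
import struct


def write_word2vec_binary(stream, vectors):
    """Write {word: [float, ...]} using the word2vec C tool's binary layout."""
    dim = len(next(iter(vectors.values())))
    # Header line: "<vocab_size> <dim>\n" as plain text.
    stream.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        stream.write(word.encode("utf-8") + b" ")
        # Assumption: little-endian float32, as produced on x86 machines.
        stream.write(struct.pack(f"<{dim}f", *vec))
        stream.write(b"\n")


def read_word2vec_binary(stream):
    """Read a word2vec binary stream into {word: tuple_of_floats}."""
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        # The word runs up to the first space; skip the newline left over
        # from the previous record.
        word_bytes = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("unexpected end of file while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":
                word_bytes += ch
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("unexpected end of file while reading a vector")
        vectors[word_bytes.decode("utf-8")] = struct.unpack(f"<{dim}f", raw)
    return vectors


# Round trip through an in-memory buffer with two made-up 2-dimensional vectors.
buffer = io.BytesIO()
write_word2vec_binary(buffer, {"中国": [0.5, -1.0], "北京": [0.25, 2.0]})
buffer.seek(0)
vectors = read_word2vec_binary(buffer)
```

For a real file, open it with open(path, "rb") and pass the file object to read_word2vec_binary. Large vocabularies are better handled by a library that reads the vectors into a single array.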

Tutorials

Exploring word similarity in the * corpus

http://www.52nlp.cn/tag/gensim
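
Word-similarity exploration with word2vec usually comes down to ranking words by the cosine similarity of their vectors, which is what gensim's most_similar does. The sketch below shows that ranking in pure Python. The three 2-dimensional vectors are made up for illustration and do not come from any trained model, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Return up to topn (word, similarity) pairs, most similar first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vectors, invented for this example only.
toy_vectors = {
    "北京": [1.0, 0.0],
    "上海": [0.9, 0.1],
    "苹果": [0.0, 1.0],
}
neighbours = most_similar("北京", toy_vectors, topn=2)
```

With these toy values the two city words point in nearly the same direction, so 上海 ranks ahead of 苹果 as a neighbour of 北京.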

Clustering keywords with word2vec

http://blog.csdn.net/zhaoxinfan/article/details/11069485

Training Word2Vec Model on English Wikipedia by Gensim

http://textminingonline.com/training-word2vec-model-on-english-wikipedia-by-gensim

Datasets

wiki

https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2
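
The dump is a single large bz2-compressed MediaWiki XML export, so it is usually streamed rather than decompressed and loaded whole. A minimal sketch using only the Python standard library follows. The iter_pages and local_name helpers are names invented for this example, and the tiny in-memory sample (including its namespace version) merely stands in for the real file.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            # Drop the page's children so memory use stays modest.
            elem.clear()


# Tiny stand-in for the real dump, compressed in memory for the demo.
sample = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    "<page><title>数学</title><revision><text>数学是研究数量的学科。</text></revision></page>"
    "<page><title>物理学</title><revision><text>物理学研究物质。</text></revision></page>"
    "</mediawiki>"
)
compressed = io.BytesIO(bz2.compress(sample.encode("utf-8")))

# For the real file, pass its path instead of the in-memory buffer:
#     bz2.open("zhwiki-latest-pages-articles.xml.bz2", "rb")
with bz2.open(compressed, "rb") as stream:
    pages = list(iter_pages(stream))
```

The yielded text is raw wikitext. It still needs markup stripping and Chinese word segmentation before it can be used to train word2vec.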

sogou

http://www.sogou.com/labs/resource/list_news.php

More machine learning tutorials: http://www.tensorflownews.com/