The Lucene Spell-Checking Module

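Lucene is an open-source search engine toolkit published by Apache. Besides its core search functionality, it ships a number of add-on modules, one of which is spell checking.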

The spell-checking classes live in lucene-suggest-x.xx.x.jar, in the package org.apache.lucene.search.spell. Three classes form the core of the spell-checking functionality: SpellChecker, DirectSpellChecker, and WordBreakSpellChecker.

Each of the three classes checks spelling in a different way:

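SpellChecker: the original spell checker. Before it can be used, a dedicated spelling index must be built, either from a txt dictionary file or from one field of an existing index; only then can it produce suggestions.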

For a source-level analysis of SpellChecker, see: http://www.tuicool.com/articles/naIBjm

DirectSpellChecker: an improved spell checker that works directly against an existing index, so no separate spelling index has to be built (Solr uses this approach by default).

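WordBreakSpellChecker: also works against an existing index without building a new one. It produces suggestions by splitting a term into several words or by combining adjacent terms into one.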
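To make the word-break idea concrete, here is a minimal sketch that uses only the Java standard library. It is not Lucene's implementation: the class WordBreakSketch, the method splitIntoKnownWords, and the small term set are hypothetical stand-ins for the terms of an existing index. It tries every split point and keeps the splits whose two halves are both known terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WordBreakSketch {
    // Returns every two-way split of `word` whose halves are both in `knownTerms`.
    static List<String> splitIntoKnownWords(String word, Set<String> knownTerms) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (knownTerms.contains(left) && knownTerms.contains(right)) {
                suggestions.add(left + " " + right);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // Hypothetical terms standing in for the contents of an index field.
        Set<String> terms = new HashSet<>(Arrays.asList("new", "york", "bei", "jing"));
        System.out.println(splitIntoKnownWords("newyork", terms));
    }
}
```

A real checker would also weigh term frequencies when ordering the candidate splits; this sketch only tests membership.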

Using SpellChecker:

The spelling index can be built from three kinds of dictionary:

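PlainTextDictionary: initializes the index from a txt file

LuceneDictionary: initializes the index from one field of an existing index

HighFrequencyDictionary: initializes the index from one field of an existing index, but only keeps terms that occur in at least a given fraction of the documents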
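The frequency cut-off behind HighFrequencyDictionary can be pictured with the following sketch, written against plain Java collections rather than a real index. The class HighFrequencySketch, the method frequentTerms, and the sample counts are hypothetical; the exact comparison Lucene applies at the boundary is not asserted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HighFrequencySketch {
    // Keeps the terms whose document frequency is at least `threshold` (a fraction
    // between 0 and 1) of the total number of documents `numDocs`.
    static List<String> frequentTerms(Map<String, Integer> docFreq, int numDocs, float threshold) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() >= threshold * numDocs) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical term -> number of documents containing the term.
        Map<String, Integer> docFreq = new LinkedHashMap<>();
        docFreq.put("beijing", 50);
        docFreq.put("beijink", 1);
        docFreq.put("shanghai", 10);
        System.out.println(frequentTerms(docFreq, 100, 0.05f));
    }
}
```

Filtering this way keeps rare terms, which are often misspellings themselves, out of the suggestion dictionary.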

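 // Imports needed by this snippet
 import java.nio.file.Paths;
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.IndexWriterConfig;
 import org.apache.lucene.search.spell.Dictionary;
 import org.apache.lucene.search.spell.LuceneDictionary;
 import org.apache.lucene.search.spell.PlainTextDictionary;
 import org.apache.lucene.search.spell.SpellChecker;
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.store.FSDirectory;

 // Directory for the new spelling index
 String spellIndexPath = "D:\\newPath";
 // Directory of the existing index
 String oriIndexPath = "D:\\oriPath";
 // Dictionary file
 String dicFilePath = "D:\\txt\\dic.txt";
 // Field of the existing index to take terms from (placeholder name)
 String fieldName = "fieldname";

 // Open the spelling index directory
 Directory directory = FSDirectory.open(Paths.get(spellIndexPath));
 SpellChecker spellChecker = new SpellChecker(directory);

 // The next steps initialize the spelling index
 IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(oriIndexPath)));
 // Use a field of the existing index
 Dictionary dictionary = new LuceneDictionary(reader, fieldName);
 // Or use the txt dictionary file instead
 //Dictionary dictionary = new PlainTextDictionary(Paths.get(dicFilePath));
 IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
 spellChecker.indexDictionary(dictionary, config, true);

 String queryWord = "beijink";
 int numSug = 10;
 // Run the spell check
 String[] suggestions = spellChecker.suggestSimilar(queryWord, numSug);

 // Release resources
 reader.close();
 spellChecker.close();
 directory.close();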
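One reason SpellChecker needs its own index is that it stores each dictionary word together with its character n-grams, so that words sharing grams with the query can be looked up as candidates. The sketch below is a simplified illustration of that idea using only the standard library, not Lucene's code; NGramSketch, ngrams, and sharedGrams are hypothetical names, and the fixed gram length of 3 is an assumption made for the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class NGramSketch {
    // Returns the distinct character n-grams of `word`, in order of first appearance.
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Counts the distinct n-grams that `a` and `b` have in common.
    static int sharedGrams(String a, String b, int n) {
        Set<String> shared = ngrams(a, n);
        shared.retainAll(ngrams(b, n));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("beijink", 3));
        System.out.println(sharedGrams("beijink", "beijing", 3));
        System.out.println(sharedGrams("beijink", "shanghai", 3));
    }
}
```

A misspelling such as "beijink" still shares most of its grams with "beijing", which is why a gram lookup finds the right candidate even though the exact word is absent from the dictionary.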

Using DirectSpellChecker:

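 import java.nio.file.Paths;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.spell.DirectSpellChecker;
 import org.apache.lucene.search.spell.SuggestWord;
 import org.apache.lucene.store.FSDirectory;

 DirectSpellChecker checker = new DirectSpellChecker();
 String readerPath = "D:\\path"; // directory of the existing index
 IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(readerPath)));
 Term term = new Term("fieldname", "querytext"); // field to check and the word to correct
 int numSug = 10;
 SuggestWord[] suggestions = checker.suggestSimilar(term, numSug, reader);
 reader.close();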
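DirectSpellChecker picks its candidates straight from the term dictionary of the index, keeping terms that lie within a small edit distance of the query term. Lucene does this with an automaton; the sketch below swaps in the plain dynamic-programming Levenshtein distance just to show what "edit distance" measures. EditDistanceSketch and distance are hypothetical names, not part of Lucene.

```java
final class EditDistanceSketch {
    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn `a` into `b`.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("beijink", "beijing"));
        System.out.println(distance("beijink", "shanghai"));
    }
}
```

With this measure "beijing" is a single substitution away from "beijink", so it would rank ahead of unrelated terms such as "shanghai".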
