Implementing a Custom Similarity Formula in Lucene

Date: 2021-08-23 03:10:20

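Suppose a particular application needs to ignore the influence of term frequency (tf) and document frequency (df, which enters the score through idf). This can be done as follows: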

1. Implement your own similarity:

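import org.apache.lucene.search.similarities.DefaultSimilarity;

/** Ignores tf and idf (Lucene 4.x API, matching Version.LUCENE_48 below). */
public class MySimilarity extends DefaultSimilarity {

    /** Every matching term counts once, however often it occurs in the document. */
    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    /**
     * Constant, instead of DefaultSimilarity's log(numDocs/(docFreq+1)) + 1,
     * so rare and common terms get the same weight.
     */
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }
}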
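To see what the override changes, here is a small, Lucene-free sketch of the per-term factors. defaultTf and defaultIdf reproduce the formulas DefaultSimilarity uses in Lucene 4.x (sqrt(freq) and log(numDocs/(docFreq+1)) + 1); flatTf and flatIdf reproduce MySimilarity. The class name TfIdfFactors and the sample numbers are hypothetical, and the sketch deliberately leaves out norms, coord, queryNorm and boosts.

```java
// Hypothetical, Lucene-free model of the per-term tf and idf factors.
// defaultTf/defaultIdf mirror DefaultSimilarity (Lucene 4.x);
// flatTf/flatIdf mirror MySimilarity. Norms, coord, queryNorm and
// boosts are intentionally not modeled.
public class TfIdfFactors {

    // DefaultSimilarity: tf = sqrt(freq)
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // DefaultSimilarity: idf = log(numDocs / (docFreq + 1)) + 1
    static float defaultIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // MySimilarity: tf is constant, so repetitions do not matter
    static float flatTf(float freq) {
        return 1.0f;
    }

    // MySimilarity: idf is constant, so docFreq does not matter
    static float flatIdf(long docFreq, long numDocs) {
        return 1.0f;
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        for (float f : new float[] {1f, 4f, 9f}) {
            System.out.printf("freq=%.0f  default tf=%.3f  flat tf=%.3f%n",
                    f, defaultTf(f), flatTf(f));
        }
        for (long df : new long[] {1L, 99L, 999L}) {
            System.out.printf("docFreq=%d  default idf=%.3f  flat idf=%.3f%n",
                    df, defaultIdf(df, numDocs), flatIdf(df, numDocs));
        }
    }
}
```

With numDocs = 1000, the default idf falls from about 7.2 for a term found in one document to exactly 1.0 for a term found in 999 documents, while the flat versions stay at 1.0 throughout.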
2. When building the index, set the similarity on the IndexWriterConfig:

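// MyAnalyzer is the author's own Analyzer; any Analyzer can be used here.
Analyzer analyzer = new MyAnalyzer(0);
MySimilarity sim = new MySimilarity();

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48, analyzer);
iwc.setOpenMode(OpenMode.CREATE);   // IndexWriterConfig.OpenMode: overwrite any existing index
iwc.setSimilarity(sim);             // similarity used while indexing (e.g. for norms)

// indexDir is a Directory, e.g. FSDirectory.open(new File("index"))
IndexWriter writer = new IndexWriter(indexDir, iwc);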
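MySimilarity does not override lengthNorm, so the norms it writes at index time are those of DefaultSimilarity; setting it on the IndexWriterConfig mainly keeps index-time and search-time similarity consistent. Field length therefore still affects the score after tf and idf are flattened. The following is a hedged, Lucene-free model of DefaultSimilarity.lengthNorm in Lucene 4.x (field boost divided by sqrt(numTerms), with overlapping tokens discounted by default). The class name LengthNormModel is hypothetical, and the lossy one-byte encoding Lucene applies when storing the norm is not modeled.

```java
// Hypothetical model of DefaultSimilarity.lengthNorm (Lucene 4.x),
// which MySimilarity inherits unchanged. Lucene computes this value
// at index time and stores a lossy encoding of it; that encoding is
// not reproduced here.
public class LengthNormModel {

    // length: number of tokens in the field.
    // numOverlap: tokens with position increment 0 (e.g. injected synonyms).
    // discountOverlaps: DefaultSimilarity's default is true.
    static float lengthNorm(float boost, int length, int numOverlap,
                            boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Shorter fields get a larger norm, so they tend to score higher.
        for (int len : new int[] {1, 4, 100}) {
            System.out.printf("length=%d  norm=%.3f%n",
                    len, lengthNorm(1.0f, len, 0, true));
        }
    }
}
```

If field length should be ignored too, lengthNorm would also have to be overridden, and the index rebuilt so that the stored norms change.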
3. At search time, set the same similarity on the IndexSearcher:

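MySimilarity sim = new MySimilarity();
// reader is an IndexReader over the same index, e.g. DirectoryReader.open(indexDir)
IndexSearcher searcher = new IndexSearcher(reader);
searcher.setSimilarity(sim);   // set before running any queries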
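For reference, the practical scoring function documented for Lucene 4.x's TFIDFSimilarity (the base class of DefaultSimilarity) is restated below, followed by what remains once tf and idf both return 1. This is a worked restatement under that assumption, not part of the original post; boost(t) stands for the query-time boost of term t.

$$
\text{score}(q,d) = \text{coord}(q,d)\cdot\text{queryNorm}(q)\cdot\sum_{t \in q}\Big(\text{tf}(t \in d)\cdot\text{idf}(t)^2\cdot\text{boost}(t)\cdot\text{norm}(t,d)\Big)
$$

$$
\text{tf} = \text{idf} = 1 \;\Rightarrow\; \text{score}(q,d) = \text{coord}(q,d)\cdot\text{queryNorm}(q)\cdot\sum_{t \in q}\Big(\text{boost}(t)\cdot\text{norm}(t,d)\Big)
$$

So after the change, a document's score still depends on how many of the query terms it matches (coord, for BooleanQuery), on query boosts (which are all that is left in queryNorm once idf is 1), and on the index-time length norm.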