What is the best and most active open-source .NET search technology?

Time: 2021-07-02 03:04:31

I'm trying to decide on an open source search/indexing technology for a .Net project. It seems like the standard out there for Java projects is Lucene, but as far as .Net is concerned, the Lucene.Net project seems to be pretty inactive. Is this still the best option out there? Or are there other viable alternatives?

11 answers

#1


23  

While there were no 'full-blown' releases of Lucene.Net (i.e. with full documentation and website updates) for quite some time, there are still fresh commits to its SVN repository. The latest release (2.3.2), for example, was tagged on 07/24/09. Since development is still active, I would use it for new full-text search projects.

#2


11  

I know this isn't open-source, but it is a free and very comprehensive offering from Microsoft:

Microsoft Search Server 2008 Express

  • Out-of-the-box relevancy.
  • Localized interface.
  • Extensible search experience.
  • No preset document limits.
  • Continuous propagation indexing.
  • Out-of-the-box indexing connectors.
  • Content summaries.
  • Hit highlighting.
  • Best bets and definitions.
  • Query correction.
  • Duplicate collapsing.
  • Filter by property.
  • Filter by language.
  • Sort by date.
  • E-mail/RSS alerts.

#3


6  

Lucene.Net will necessarily lag behind the Java version, since it is a port. I also don't like that the Lucene.Net port is a straight copy of the Java code, although I suppose that does make it easier to reuse the Java docs. If you don't need very tight (binary) integration, consider using Solr. I have used it before with good success. It is still powered by Lucene, but I think it is better because it adds some useful features on top. You can use it from .NET via an HTTP endpoint.

One question to ask yourself is what you really need from a search solution. There are a lot of ways to go about implementing search, and not every solution works for every situation.
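
To illustrate "use it from .NET via an HTTP endpoint": a Solr query is just an HTTP GET against its `/select` handler. The sketch below is in Python for brevity; any .NET HTTP client would send the same request. It assumes a local Solr on its default port 8983 with a core named `mycore`. The core name and the `title` field are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of a local Solr install (8983 is Solr's default port).
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(core, query, rows=10, start=0):
    """Build a GET URL for Solr's standard /select search handler."""
    params = {
        "q": query,      # Lucene query syntax, e.g. "title:lucene"
        "rows": rows,    # page size
        "start": start,  # offset of the first hit (for paging)
        "wt": "json",    # ask for a JSON response (XML is also available)
    }
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


def search(core, query, rows=10):
    """Query a live Solr server and return (numFound, docs)."""
    with urlopen(build_select_url(core, query, rows)) as resp:
        body = json.load(resp)["response"]
    return body["numFound"], body["docs"]


if __name__ == "__main__":
    # Only builds the URL; calling search() needs a running Solr instance.
    print(build_select_url("mycore", "title:lucene"))
```

Running the script prints `http://localhost:8983/solr/mycore/select?q=title%3Alucene&rows=10&start=0&wt=json`. Because the client only builds a URL and parses a response, it does not depend on the Lucene version running on the server.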

#4


6  

SQLite has FTS3 (Full-Text Search 3), which may do what you need. I don't have direct experience with it, but I believe it was developed to cover what Lucene does, at least in the simple cases. I don't believe you can customize the tokenizer or much else (at least not without modifying source code), but it's an option.
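
The following is a minimal sketch of FTS3 through Python's built-in `sqlite3` module. It assumes the bundled SQLite was compiled with FTS3 enabled, which is common but build-dependent. FTS3 does accept a `tokenize=` argument that selects a built-in tokenizer such as `simple` or `porter`, and fully custom tokenizers plug in through FTS3's C tokenizer interface. The table name `docs` and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample documents: (docid, text) pairs.
SAMPLE = [
    (1, "Lucene.Net is a port of Java Lucene"),
    (2, "Solr exposes Lucene search over HTTP"),
    (3, "SQLite ships with full-text searching"),
]


def build_index(docs):
    """Create an in-memory FTS3 table and load (docid, text) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # tokenize=porter picks SQLite's built-in Porter-stemming tokenizer,
    # so "searching" and "search" are indexed as the same term.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts3(body, tokenize=porter)")
    conn.executemany("INSERT INTO docs(docid, body) VALUES (?, ?)", docs)
    return conn


def search(conn, query):
    """Return docids whose body matches an FTS3 MATCH query (terms are ANDed)."""
    rows = conn.execute(
        "SELECT docid FROM docs WHERE body MATCH ? ORDER BY docid", (query,)
    )
    return [docid for (docid,) in rows]


if __name__ == "__main__":
    conn = build_index(SAMPLE)
    print(search(conn, "searching"))
```

The whole index lives in a single database file (or in memory, as here). That suits small embedded scenarios, but it gives far less control over scoring and analysis than Lucene.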

#5


5  

After having used Lucene.Net in a couple of projects, I'd also suggest compiling the Java version of Lucene into .NET code with IKVM.NET. It works wonderfully, and you never have to worry about being out of date with respect to the Java version. You also have the option of compiling all the extra libraries and using them as well (I'm using the GIS search stuff in one project).

#6


4  

Lucene.Net is integrated with NHibernate (through NHibernate.Search), so if you are also looking for an O/R mapper, the combination may be worth a closer look.

We are currently developing a prototype, and configuring Lucene took just a few minutes (we use Fluent NHibernate).

#7


3  

Although it's not .NET, I would recommend Solr. It's built on Lucene, and it is simple to integrate because it is queried over HTTP and returns XML or JSON.
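
Because Solr answers over HTTP in either XML or JSON, consuming results takes only a few lines of parsing. Here is a Python sketch; .NET has equivalent XML and JSON APIs. It extracts the hit count and the documents from both formats of a `/select` response. The sample payloads are hand-written to mirror Solr's response layout, not captured from a server, and the `id` and `title` fields are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written payloads shaped like a Solr /select response (not real output).
SAMPLE_XML = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="id">42</str><str name="title">Lucene.Net notes</str></doc>
  </result>
</response>"""

SAMPLE_JSON = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1, "start": 0,'
    ' "docs": [{"id": "42", "title": "Lucene.Net notes"}]}}'
)


def parse_solr_json(text):
    """Return (numFound, docs) from a JSON (wt=json) select response."""
    body = json.loads(text)["response"]
    return body["numFound"], body["docs"]


def parse_solr_xml(text):
    """Return (numFound, docs) from an XML select response.

    Only single-valued fields (<str>, <int>, ...) are handled, and values
    stay strings; multi-valued <arr> fields would need extra code.
    """
    root = ET.fromstring(text)
    result = root.find("result[@name='response']")
    docs = [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs


if __name__ == "__main__":
    print(parse_solr_xml(SAMPLE_XML))
    print(parse_solr_json(SAMPLE_JSON))
```

JSON is usually the simpler choice because types such as `numFound` arrive already typed. The XML path needs explicit conversions like the `int(...)` above.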

#8


3  

As I understand it, you "just" need a full-text index on your existing database. SQL Server full-text search worked for you in principle, but your current implementation/setup is too slow.

If I were you, I wouldn't go for a completely different approach (just think about the mess of keeping an external index in sync with your database, or joining query results from both, etc.). Try to fix the performance issue with SQL Server instead. Nobody would seriously accept 6 seconds to search 7k rows as the final word on an enterprise-class product that runs some of the largest databases around. Maybe ask a new question about common pitfalls with this feature (I'm not an expert on it); you might end up with a simple fix instead of a complete rebuild of your search architecture ;)

#9


2  

Have a look at www.searcharoo.net. It has a crawler, plus features like word stemming and indexing of Office documents/PDFs. The author is very active on the CodeProject articles and responds to questions pretty quickly.
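
Searcharoo's internals aren't described here, so the following is a generic sketch, not Searcharoo's code. Engines like it (and Lucene) are built around an inverted index that maps each stemmed term to the documents containing it, so "indexing" and "indexed" find the same pages. The `stem` function is a hypothetical toy suffix-stripper; real engines use something like the Porter algorithm.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately naive stemmer: strips a few English suffixes.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-alphanumerics, then stem each token."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # stemmed term -> set of doc ids

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "Crawling and indexing PDFs")
    idx.add(2, "The crawler indexed Office documents")
    print(idx.search("indexing"))
```

A crawler-based engine fills such an index from fetched pages and documents, then persists it somewhere (Lucene uses its own segment files).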

#10


1  

I used to use DotLucene but ran into a number of problems. A major one was that it required full trust to run.

I have since moved to using SearchAroo: http://www.searcharoo.net/

It uses an XML data store, and I have found its performance to be very similar to DotLucene's.

If you are looking for another option, I'd definitely take a look.

#11


0  

If you don't insist on .NET, you can give Sphinx a try. It's open source and available for all platforms (Windows/Linux).
