Neo4j indexing for wildcard searches on properties

Time: 2021-05-29 18:04:43

We are using Neo4j Community Edition 2.3.1 and have many queries that use wildcards. For example, we search for all 'User' nodes that have the word 'cars' in a couple of larger properties, Profile and Bio (i.e. sentences or small paragraphs).

MATCH (user:User) 
WHERE (user.Profile =~ '(?i).* cars .*') OR (user.Bio =~ '(?i).* cars .*')
RETURN user SKIP 0 LIMIT 20;

The number of 'User' nodes is over 1.6 million.

The queries are relatively slow. We know why: Neo4j uses an AllNodesScan because there are no indexes on these properties. We'd like to create an index for this query, but Neo4j's 'new' schema indexes do not work with wildcards.
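
One way to confirm which operator the planner actually picks is to prefix the query with PROFILE (or EXPLAIN, to see the plan without running the query). This is only a sketch that reuses the query above; the operator names and numbers in the output depend on the Neo4j version and planner.

```cypher
// Runs the query and reports the plan operators together with rows and db hits.
PROFILE
MATCH (user:User)
WHERE (user.Profile =~ '(?i).* cars .*') OR (user.Bio =~ '(?i).* cars .*')
RETURN user SKIP 0 LIMIT 20;
```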

We are thinking of using the 'old' Lucene-backed fulltext indexes in Neo4j. We are also considering turning Bio and Profile into labeled nodes instead of properties, and then using a schema index on them.

I'm wary of implementing 'legacy' indexes because they are, well, 'legacy', and I wonder whether they will be deprecated at some point.

Any suggestions for improving the performance of the wildcard search above?

2 Answers

#1


1  

Regarding 'CONTAINS': according to the Neo4j docs, it cannot make use of the new schema indexes. But thanks for the suggestion.

I'm going to answer this myself and mark it as 'answered'. Our team implemented legacy indexing in Neo4j, and it is working wonders. Execution times for simple queries are down from ~6 seconds to under 100 ms.
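
The post does not show how the legacy index was set up, so the following is only a sketch of what querying one can look like from Cypher in Neo4j 2.x. It assumes a legacy fulltext index with the hypothetical name user_fulltext has already been created (for example through the Java API or REST) and that the Profile and Bio values of the User nodes have been added to it.

```cypher
// Assumption: user_fulltext is a hypothetical, pre-existing legacy fulltext index
// holding the Profile and Bio values of User nodes.
// The string passed to the index is a Lucene query, not a Cypher predicate.
START user=node:user_fulltext('Profile:cars OR Bio:cars')
RETURN user SKIP 0 LIMIT 20;
```

Because the lookup goes through Lucene rather than scanning every node, the regular expression from the question is no longer needed for a plain word match.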

#2


0  

UPDATE: I originally suggested using CONTAINS, but currently only STARTS WITH uses schema indexes. This might change in future versions.

Have you tried using the new CONTAINS operator added in Neo4j 2.3?

MATCH (user:User) 
WHERE user.Profile CONTAINS "cars" OR user.Bio CONTAINS "cars"
RETURN user SKIP 0 LIMIT 20;

You should create a schema index on every String property you want to do string filtering on.
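
As a rough sketch of what that looks like in Neo4j 2.x syntax: the first two statements create the schema indexes, and the query shows a prefix match, which per the update at the top of this answer is the string operator that can currently use them. The CONTAINS query above will still run with these indexes in place, but it is not expected to be served from them.

```cypher
// Schema indexes on the two string properties of the User label.
// Run these as separate statements from the data query below.
CREATE INDEX ON :User(Profile);
CREATE INDEX ON :User(Bio);

// Prefix match: only finds values that begin with 'cars'.
MATCH (user:User)
WHERE user.Profile STARTS WITH 'cars'
RETURN user SKIP 0 LIMIT 20;
```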
