Solr PatternTokenizer regex for double quotes

Time: 2022-09-15 13:32:51

I would like to use the double quote (") as a token separator for the input by using PatternTokenizer. My setting in schema.xml is as follows:

<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\.,!(){\[\]:}\"]+"/>

But this fails because the second " (the one I tried to escape with \) is taken as the end of the pattern attribute, so Solr cannot start with this configuration. How can I achieve the desired output?
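A minimal sketch of why the line is rejected, assuming only that Solr reads schema.xml with a standard XML parser before the regex is ever compiled. XML gives the backslash no special meaning, so the \" does not escape anything and the raw quote closes the attribute early. The class and method names (RawQuoteDemo, isWellFormed) are hypothetical, and the JDK's built-in DOM parser stands in for whatever parser Solr actually uses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

public class RawQuoteDemo {

    // The tokenizer line from the question: the \" is meant as a regex escape,
    // but XML does not treat backslash specially, so the quote ends the attribute.
    static final String BROKEN =
        "<tokenizer class=\"solr.PatternTokenizerFactory\" pattern=\"[\\s\\.,!(){\\[\\]:}\\\"]+\"/>";

    // The same line with the quote written as the &quot; entity, for contrast.
    static final String FIXED =
        "<tokenizer class=\"solr.PatternTokenizerFactory\" pattern=\"[\\s.,!(){\\[\\]:}&quot;]+\"/>";

    // Returns true if the string parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows fatal ones,
            // so malformed input raises an exception without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(BROKEN)); // false
        System.out.println(isWellFormed(FIXED));  // true
    }
}
```

The parser fails right after the early-closing quote, because it then sees ]+" where it expects another attribute, > or />.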

1 solution

#1


You need to change the pattern attribute to:

pattern="[\s.,!(){\[\]:}&quot;]+"

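The literal double quote must be replaced with the XML entity &quot;. Inside a double-quoted attribute value a raw " always ends the value, and a backslash does not escape it in XML. The parser turns &quot; back into ", so the regex engine still receives a literal double quote.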
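To see what the regex engine ends up with, here is a sketch that parses the corrected line with the JDK's DOM parser and splits a sample string on the resulting pattern. It assumes that PatternTokenizerFactory hands the attribute value to java.util.regex.Pattern and, with its default group of -1, emits the text between matches. The tokenize helper is hypothetical and only approximates the tokenizer by splitting and dropping empty strings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class QuoteEntityDemo {

    // The corrected schema.xml line: the double quote is written as &quot;
    static final String XML =
        "<tokenizer class=\"solr.PatternTokenizerFactory\" pattern=\"[\\s.,!(){\\[\\]:}&quot;]+\"/>";

    // Parses the fragment and returns the pattern attribute after entity decoding.
    static String readPattern(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return root.getAttribute("pattern");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML fragment", e);
        }
    }

    // Hypothetical approximation of PatternTokenizer with group=-1:
    // split on every match of the regex and skip empty pieces.
    static List<String> tokenize(String regex, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : Pattern.compile(regex).split(text)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = readPattern(XML);
        System.out.println(regex);                                  // [\s.,!(){\[\]:}"]+
        System.out.println(tokenize(regex, "say \"hello\" world")); // [say, hello, world]
    }
}
```

The first line of output shows that the entity has become a plain " by the time the regex is compiled.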

As an alternative, you may write \u0022 in the pattern instead of the quote. XML passes it through unchanged, and the regex engine parses it as a literal double quote.
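A sketch of the \u0022 variant, assuming the attribute is written as pattern="[\s.,!(){\[\]:}\u0022]+" and that Solr compiles it with java.util.regex.Pattern, which supports \uhhhh escapes. Because the attribute then contains no raw quote, XML has nothing to misread. The class and helper names are hypothetical, and tokenize again only approximates PatternTokenizer's default splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UnicodeEscapeDemo {

    // The attribute value with the quote written as the six-character regex escape.
    // The backslash is doubled in Java source so that javac passes the escape
    // through to the regex engine instead of turning it into a real quote.
    static final String REGEX = "[\\s.,!(){\\[\\]:}\\u0022]+";

    // Hypothetical approximation of PatternTokenizer with group=-1.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : Pattern.compile(REGEX).split(text)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(REGEX.contains("\""));              // false: no raw quote for XML to trip over
        System.out.println(tokenize("say \"hello\" (world)")); // [say, hello, world]
    }
}
```

Compared with &quot;, this keeps the schema line readable as plain regex, at the cost of relying on the regex engine rather than the XML parser to produce the quote.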