Regex to match a word that appears 2 times in the text

Time: 2021-03-06 19:37:44

I need to match a word that appears twice in an English text. I tried

(^|\ )([^\ ][^\b]*\b).*\ \2\b

but this doesn't match all lines.

1 solution

#1


3  

There are a few problems with your regex. For example, \b does not act as a word boundary inside a character class (in most regex flavors it means a backspace character there), so [^\b]* will not work as intended.
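
To make the character-class point concrete, here is a minimal sketch. It assumes Python's re module (the question does not name a regex flavor, so details may differ elsewhere), and the sample strings are made up for illustration.

```python
import re

# Inside a character class, Python's re treats \b as the backspace
# character (\x08), not as a word boundary.
backspace_hits = re.findall(r"[\b]", "a\x08b")

# [^\b]* therefore means "any run of non-backspace characters" and
# consumes the whole sample instead of stopping at the end of a word.
negated_class_match = re.match(r"[^\b]*", "foo bar").group()

# Outside a character class, \b is a zero-width word boundary.
boundary_positions = [m.start() for m in re.finditer(r"\b", "ab cd")]

print(backspace_hits, repr(negated_class_match), boundary_positions)
```

In this flavor, [^\b]* never stops at the end of the first word, which is one reason the original pattern does not behave as hoped.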

You probably want something like

(?s)\b(\w+)\b.*\b\1\b

This will match the entire text from the first occurrence of the word to the last. This might not be what you actually intended.

Another idea:

(?s)\b(\w+)\b.*?\b\1\b

This will match only the text from the first occurrence of the word to the next.
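
A small sketch of the difference between the greedy and the lazy variant, assuming Python's re module and a made-up sample text in which one word occurs three times:

```python
import re

text = "one two one three one"  # hypothetical sample text

# Greedy .* runs to the end and backtracks to the LAST repetition.
greedy = re.search(r"(?s)\b(\w+)\b.*\b\1\b", text)

# Lazy .*? stops at the NEXT repetition of the captured word.
lazy = re.search(r"(?s)\b(\w+)\b.*?\b\1\b", text)

print(greedy.group(0))  # one two one three one
print(lazy.group(0))    # one two one
```

In both cases group(1) holds the repeated word itself, while group(0) is the whole span between the occurrences.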

The problem with both these approaches is that for example in a text like

foo bar bar foo

the regex will match from foo to foo, blindly ignoring that there is a duplicate bar in-between.
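
The effect can be reproduced with a short sketch, again assuming Python's re module. Because the first match consumes the text from foo to foo, the inner bar pair is never reported:

```python
import re

text = "foo bar bar foo"
lazy_pattern = re.compile(r"(?s)\b(\w+)\b.*?\b\1\b")

# Collect every non-overlapping match and the word that triggered it.
spans = [m.group(0) for m in lazy_pattern.finditer(text)]
words = [m.group(1) for m in lazy_pattern.finditer(text)]

print(spans)  # ['foo bar bar foo']
print(words)  # ['foo']
```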

So if you actually want to find all words that occur more than once, use

(?s)\b(\w+)\b(?=.*?\b\1\b)

Explanation:

(?s)       # Allow the dot to match newlines
\b(\w+)\b  # Match an entire word
(?=        # Assert that the following regex can be matched from here:
 .*?       #  Any number of characters
 \b\1\b    #  followed by the word that was previously captured
)          # End of lookahead
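
As a usage sketch (assuming Python's re module; the sample texts are invented), the lookahead version consumes only the word itself, so matching resumes right after it and later duplicates are still found:

```python
import re

repeat_pattern = re.compile(r"(?s)\b(\w+)\b(?=.*?\b\1\b)")

# Each match is a word that occurs again somewhere later in the text.
repeated = [m.group(1) for m in repeat_pattern.finditer("foo bar bar foo")]

# A word occurring three times is reported twice (once for each occurrence
# that has a later copy), so de-duplicate while keeping first-seen order.
raw_hits = [m.group(1) for m in repeat_pattern.finditer("a b a c a")]
unique_hits = list(dict.fromkeys(raw_hits))

print(repeated)     # ['foo', 'bar']
print(raw_hits)     # ['a', 'a']
print(unique_hits)  # ['a']
```

Note that the last occurrence of each word is never matched, since nothing follows it for the lookahead to find. The pattern therefore reports words that occur at least twice, not exactly twice.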
