Regex to match the beginning of multiple words in a string

Time: 2022-06-15 21:38:04

In JavaScript I want to be able to match strings that begin with a certain phrase. However, I want it to match the start of any word in the phrase, not just the beginning of the phrase.

For example:

Phrase: "This is the best"

Need to Match: "th"

Result: Matches Th and th

EDIT: \b works great; however, it raises another issue:

It also matches characters that come right after non-ASCII letters. For example, if my string is "Männ" and I search for "n", it matches the n after "Mä". Any ideas?

4 solutions

#1


23  

"This is the best moth".match(/\bth/gi);

Or, with the string in a variable:

var string = "This is the best moth";
alert(string.match(/\bth/gi));

\b in a regex is a word boundary, so \bth only matches a th at the beginning of a word.

g makes the match global (find all occurrences) and i makes it case-insensitive.

(I added moth as a reminder to check that a th which does not start a word is not matched.)
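The same idea can be wrapped in a reusable function. This is a minimal sketch, assuming the prefix may come from user input and starts with a word character; `matchWordStarts` and `escapeRegExp` are hypothetical helper names, not part of any library. The escaping step keeps characters such as `.` or `(` in the prefix from being treated as regex syntax by `new RegExp`.

```javascript
// Hypothetical helper: escape regex metacharacters so the prefix is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every occurrence of `prefix` that starts a word.
// \b is a word boundary; "g" finds all matches, "i" ignores case.
function matchWordStarts(text, prefix) {
  const re = new RegExp("\\b" + escapeRegExp(prefix), "gi");
  // String.prototype.match returns null (not []) when nothing matches.
  return text.match(re) || [];
}

console.log(matchWordStarts("This is the best moth", "th")); // [ 'Th', 'th' ]
console.log(matchWordStarts("This is the best moth", "xy")); // []
console.log(matchWordStarts("a.b a.c", "a."));               // [ 'a.', 'a.' ]
```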

Edit:

So, the above only returns the part that you match (th). If you want to return the entire words, you have to match the entire word.

This is where things get tricky fast. First, for text without accented (non-ASCII) letters:

string.match(/\bth[^\b]*?\b/gi);

To match the entire word, start at a word boundary \b, match th, then lazily match further characters until you reach the next word boundary \b. The * means zero or more of the previous item, and the ? makes it a lazy match: it does not expand as far as possible, but stops at the first opportunity. Note that inside a character class \b does not mean "word boundary": [^\b] means "any character except backspace". The pattern still works because the lazy *? stops at the first real boundary, but /\bth\w*/gi expresses the same thing more directly.
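A small comparison to check the note about [^\b] above, using the same sample sentence as this answer. Both patterns return whole words, and the last two lines show that inside a class `[\b]` is the backspace character (U+0008).

```javascript
const sentence = "This is the best moth";

// Original pattern: lazy "anything but backspace" up to the next word boundary.
const lazyWords = sentence.match(/\bth[^\b]*?\b/gi);

// Equivalent, clearer pattern: "th" at a word start, then any word characters.
const directWords = sentence.match(/\bth\w*/gi);

console.log(lazyWords);   // [ 'This', 'the' ]
console.log(directWords); // [ 'This', 'the' ]

// Inside a character class, \b is U+0008 (backspace), not a word boundary.
console.log(/[\b]/.test("\u0008")); // true
console.log(/[\b]/.test("a"));      // false
```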

If your text contains non-ASCII letters such as ä (&auml; in HTML), things get complicated really fast. JavaScript's \b and \w only treat ASCII letters, digits and the underscore as word characters, so a letter like ä counts as a word boundary. In that case you have to anchor on whitespace, or on whitespace plus a set of characters you define as word separators:
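The problem from the question's EDIT can be reproduced in a few lines; the sample string is made up for illustration.

```javascript
const sample = "Männer nähen";

// By default \w (and therefore \b) only knows [A-Za-z0-9_], so "ä" is a non-word character.
const aIsWordChar = /\w/.test("ä");

// That creates a "boundary" between "ä" and the following "n" in "Männer",
// and the \w* run in "nähen" stops at the "ä".
const asciiMatches = sample.match(/\bn\w*/gi);

console.log(aIsWordChar, asciiMatches); // false [ 'nner', 'n' ]
```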

string.match(/\sth[^\s]*|^th[^\s]*/gi);

Since we're not using word boundaries, we have to handle the beginning of the string separately (the |^th[^\s]* alternative).

The matches above include the whitespace character in front of each word (for example " the"). A match built with \b does not include it, because \b has no width.
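A sketch of two ways to get clean whole words without the \b problem, assuming an engine with ES2018 support (lookbehind and `\p{…}` Unicode property escapes, as in current Node.js and browsers). The first function follows this answer's whitespace pattern and trims the captured space. The second is an alternative that is not from the original answer: a negative lookbehind that treats any Unicode letter, number or underscore as part of a word. Both function names are hypothetical, and the prefix passed to `wordsStartingWith` is assumed to contain no regex metacharacters.

```javascript
// 1) Whitespace-anchored pattern from the answer; trim the leading space it captures.
function whitespaceStarts(text) {
  const found = text.match(/\sth[^\s]*|^th[^\s]*/gi) || [];
  return found.map((w) => w.trim());
}

// 2) Unicode-aware alternative (hypothetical helper): the prefix must not be
//    preceded by a letter, number or underscore from any script.
//    Assumes `prefix` contains no regex metacharacters.
function wordsStartingWith(text, prefix) {
  const re = new RegExp(
    "(?<![\\p{L}\\p{N}_])" + prefix + "[\\p{L}\\p{N}_]*",
    "giu"
  );
  return text.match(re) || [];
}

console.log(whitespaceStarts("This is the best moth"));        // [ 'This', 'the' ]
console.log(wordsStartingWith("This is the best moth", "th")); // [ 'This', 'the' ]
console.log(wordsStartingWith("Männer nähen", "n"));           // [ 'nähen' ]
console.log(wordsStartingWith("(think) re-thought", "th"));    // [ 'think', 'thought' ]
```

Unlike the whitespace version, the lookbehind version also finds words that follow punctuation such as "(" or "-".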

#2


1  

Use this:

string.match(/^th|\sth/gi);

Examples:

'is this is a string'.match(/^th|\sth/gi);

'the string: This is a string'.match(/^th|\sth/gi);

Results:

[" th"]

["th", " Th"]

Matches found through \s include the leading whitespace character.
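A quick comparison of this answer's pattern with \b, using only the standard `String.prototype.match`. `^th|\sth` recognizes a word start only at the beginning of the string or after whitespace, so it is not fooled by `ä`, but it misses words that follow punctuation; `\bth` behaves the other way round. The extra test strings are made up for illustration.

```javascript
// Pattern from this answer: "th" at the very start or right after whitespace.
const spaceBased = (s) => s.match(/^th|\sth/gi);
// Word-boundary pattern used in the other answers.
const boundaryBased = (s) => s.match(/\bth/gi);

console.log(spaceBased("is this is a string"));          // [ ' th' ]
console.log(spaceBased("the string: This is a string")); // [ 'th', ' Th' ]

// Punctuation before the word: only \b finds it.
console.log(spaceBased("(this)"), boundaryBased("(this)")); // null [ 'th' ]
// Non-ASCII letter before "th": only \b (wrongly) treats it as a word start.
console.log(spaceBased("Mäthe"), boundaryBased("Mäthe"));   // null [ 'th' ]
```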

#3


1  

Use the g flag in the regex. It stands for "global": it searches for all matches instead of only the first one.

You should also use the i flag for case-insensitive matching.

You add flags to the end of a regex literal (/<regex>/<flags>) or pass them as the second argument to new RegExp(pattern, flags).
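Both ways of supplying flags produce the same regex. One pitfall is worth showing: with the `RegExp` constructor the pattern is an ordinary string, so the backslash must be doubled (`"\\bth"`); a single `"\b"` in a string literal is a backspace character, not a word boundary.

```javascript
const literal = /\bth/gi;
const constructed = new RegExp("\\bth", "gi");

// Same source and flags, so they behave identically.
console.log(literal.source === constructed.source, literal.flags === constructed.flags); // true true

// Pitfall: "\b" inside a string literal is a backspace (U+0008), not a word boundary.
const wrong = new RegExp("\bth", "gi");
console.log(wrong.test("This")); // false

console.log("This is the best".match(constructed)); // [ 'Th', 'th' ]
```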

For instance:

var matches = "This is the best".match(/\bth/gi);

or, using RegExp objects:

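var re = new RegExp("\\bth", "gi");
var matches = "This is the best".match(re);

(re.exec("This is the best") would return only the first match, "Th": exec returns one match per call, even with the g flag.)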
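`re.exec` returns a single match per call, even with the `g` flag; with `g` set, each call resumes from `re.lastIndex`, so collecting every match (for example to get each match's position) means looping until it returns `null`. `String.prototype.matchAll` (ES2020) performs the same iteration. A minimal sketch:

```javascript
const reTh = new RegExp("\\bth", "gi");
const phrase = "This is the best";

// exec() returns one match at a time; with the g flag it advances reTh.lastIndex.
const viaExec = [];
let m;
while ((m = reTh.exec(phrase)) !== null) {
  viaExec.push({ match: m[0], index: m.index });
}

// matchAll() requires a global regex and yields the same match objects.
const viaMatchAll = [...phrase.matchAll(/\bth/gi)].map((r) => ({ match: r[0], index: r.index }));

console.log(viaExec);     // [ { match: 'Th', index: 0 }, { match: 'th', index: 8 } ]
console.log(viaMatchAll); // same as above
```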

EDIT: Use \b in the regex to match a word boundary. Note that \b does not match any character; it matches a position: between a word character and a non-word character, or at the start or end of the string next to a word character.
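One way to see that \b is zero-width is to replace it with a visible marker: no characters are removed, a `|` is simply inserted at every boundary position. The second example shows where JavaScript places boundaries around `ä`, which is the cause of the "Männ" issue in the question. `markBoundaries` is a hypothetical helper name.

```javascript
// Hypothetical helper: replacing a zero-width match inserts text without consuming characters.
const markBoundaries = (s) => s.replace(/\b/g, "|");

console.log(markBoundaries("This is")); // |This| |is|
console.log(markBoundaries("Männ"));    // |M|ä|nn|
```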

#4


1  

var matches = "This is the best".match(/\bth/ig);

returns:

["Th", "th"]

The regular expression means: match "th", ignoring case (i) and globally (g, meaning don't stop after the first match), wherever "th" starts a word, i.e. at the beginning of the string or right after a non-word character such as a space.
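The description above can be checked against a few edge cases (made-up strings): \b accepts any non-word character before "th", not only a space, and rejects an underscore because `_` counts as a word character.

```javascript
const startsTh = (s) => s.match(/\bth/gi);

console.log(startsTh("This is the best")); // [ 'Th', 'th' ]
console.log(startsTh("re-think"));         // [ 'th' ]  (a hyphen creates a boundary)
console.log(startsTh("_think"));           // null      (underscore is a word character)
console.log(startsTh("moth"));             // null      (not at a word start)
```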
