How can I match overlapping strings with a regex?

Time: 2021-06-11 09:10:00

Let's say I have the string

"12345"

If I call .match(/\d{3}/g) on it, I only get one match, "123". Why don't I get [ "123", "234", "345" ]?

5 solutions

#1


You can't do this with a regex alone, but you can get pretty close:

var pat = /(?=(\d{3}))\d/g;
var results = [];
var match;

while ( (match = pat.exec( '1234567' ) ) != null ) { 
  results.push( match[1] );
}

console.log(results);

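In other words, you capture all three digits inside the lookahead (which consumes nothing), then match one character in the normal way just to advance the match position. It doesn't matter how you consume that character: /(?=(\d{3}))./g works just as well as /(?=(\d{3}))\d/g. And if you're really feeling adventurous, you can use just the lookahead and let JavaScript handle the bump-along. That works with methods such as String.prototype.replace (and matchAll, in ES2020 engines), which step past an empty match automatically; a plain exec() loop like the one above does not, so there you would have to advance lastIndex yourself to avoid an infinite loop.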

This code is adapted from this answer. I would have flagged this question as a duplicate of that one, but the OP accepted another, lesser answer.
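
A minimal sketch of the lookahead-only variant, assuming an engine that supports String.prototype.matchAll (ES2020 or later); the variable names are illustrative:

```javascript
// Zero-width pattern: nothing is consumed, capture group 1 holds the three digits.
const pat = /(?=(\d{3}))/g;

// matchAll() requires the g flag and advances past empty matches automatically,
// so no manual lastIndex bookkeeping is needed.
const overlapping = Array.from('1234567'.matchAll(pat), m => m[1]);

console.log(JSON.stringify(overlapping)); // ["123","234","345","456","567"]
```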

#2


When an expression matches, it usually consumes the characters it matched. So, after the expression matched 123, only 45 is left, which doesn't match the pattern.

#3


String#match with a global-flag regex returns an array of the matched substrings. The /\d{3}/g regex matches and consumes (that is, reads into the buffer and advances its index to the position right after the last matched character) a 3-digit sequence. Thus, after "eating up" 123, the index is located after 3, and the only substring left for parsing is 45, which does not match.
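
To watch the consumption happen, here is a small trace (variable names are illustrative) that records where each exec() call leaves lastIndex on the question's input:

```javascript
const re = /\d{3}/g;
const str = '12345';
const trace = [];
let m;

// Each successful exec() moves lastIndex to just past the consumed digits.
while ((m = re.exec(str)) !== null) {
  trace.push({ match: m[0], index: m.index, lastIndex: re.lastIndex });
}

// The second search starts at index 3, sees only "45", and fails.
console.log(JSON.stringify(trace)); // [{"match":"123","index":0,"lastIndex":3}]
console.log(JSON.stringify(str.match(/\d{3}/g))); // ["123"]
```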

I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead with a capturing group) to test every position in the input string. After each test, RegExp.lastIndex (a read/write integer property of regular expressions that specifies the index at which to start the next match) is advanced "manually". A zero-width match does not move lastIndex, so without this step exec() would keep matching at the same position in an infinite loop.

Note that this automatic advance past an empty match is built into .NET (Regex.Matches), Python (re.findall) and PHP (preg_match_all), so in those languages you don't have to do it by hand.

Here is a demo:

var re = /(?=(\d{3}))/g; 
var str = '12345';
var res = [];
var m;
 
while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    res.push(m[1]);
}

console.log(JSON.stringify(res)); // ["123","234","345"]
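
For comparison, a sketch of what the loop would do without the lastIndex++ guard. The iteration count is capped at 3 on purpose, since the unguarded loop would never end:

```javascript
const re = /(?=(\d{3}))/g;
const str = '12345';
const seen = [];

// Capped loop instead of while(...): a zero-width match leaves lastIndex
// unchanged, so every call finds the same empty match at index 0.
for (let i = 0; i < 3; i++) {
  const m = re.exec(str);
  seen.push([m.index, re.lastIndex, m[1]]);
}

console.log(JSON.stringify(seen)); // [[0,0,"123"],[0,0,"123"],[0,0,"123"]]
```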

Here is a regex101.com demo

#4


To answer the "how": you can manually rewind the regex's lastIndex after each match (this requires a loop):

var input = '12345', 
    re = /\d{3}/g, 
    r = [], 
    m;
while (m = re.exec(input)) {
    re.lastIndex -= m[0].length - 1;
    r.push(m[0]);
}
r; // ["123", "234", "345"]
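
Why the rewind works: after a successful exec(), lastIndex equals m.index + m[0].length, so subtracting m[0].length - 1 leaves m.index + 1, i.e. the next search starts one character after the current match began. A sketch of the same loop written that way (names are illustrative):

```javascript
const input = '12345';
const re = /\d{3}/g;
const r = [];
let m;

while ((m = re.exec(input)) !== null) {
  // Equivalent to re.lastIndex -= m[0].length - 1
  re.lastIndex = m.index + 1;
  r.push(m[0]);
}

console.log(JSON.stringify(r)); // ["123","234","345"]
```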

Here is a convenience function:

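function matchOverlap(input, re) {
    var r = [], m;
    // Always search with a fresh global copy: without the g flag exec() ignores
    // lastIndex and would loop forever, and copying also leaves the caller's
    // regex (and its lastIndex) untouched.
    var flags = (re + '').split('/').pop();
    if (flags.indexOf('g') === -1) flags += 'g';
    re = new RegExp(re.source, flags);
    while (m = re.exec(input)) {
        // Restart the search one character after where this match began.
        re.lastIndex -= m[0].length - 1;
        r.push(m[0]);
    }
    return r;
}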

Usage examples:

matchOverlap('12345', /\D{3}/)      // []
matchOverlap('12345', /\d{3}/)      // ["123", "234", "345"]
matchOverlap('12345', /\d{3}/g)     // ["123", "234", "345"]
matchOverlap('1234 5678', /\d{3}/)  // ["123", "234", "567", "678"]
matchOverlap('LOLOL', /lol/)        // []
matchOverlap('LOLOL', /lol/i)       // ["LOL", "LOL"]

#5


Use (?=(\w{3}))

(3 being the length of each sequence; note that \w matches any word character, i.e. a letter, digit or underscore, not just letters)
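
A short sketch of putting that lookahead to use without an explicit lastIndex loop, via a String.prototype.replace callback, which steps past empty matches on its own; the helper name overlappingRuns is hypothetical:

```javascript
// Hypothetical helper: collect every overlapping run of 3 word characters.
// replace() advances past each empty match by itself, and the callback's
// second argument is capture group 1.
function overlappingRuns(str) {
  const out = [];
  str.replace(/(?=(\w{3}))/g, (whole, group) => {
    out.push(group);
    return whole; // put the empty match back; the resulting string is discarded
  });
  return out;
}

console.log(JSON.stringify(overlappingRuns('abcde'))); // ["abc","bcd","cde"]
```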
