Match/group repeated characters in a string

Date: 2022-08-22 13:08:02

I need a regular expression that will match each group of consecutive identical characters in a string. Here's an example string:

qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT

It should match

(match group) "result"
(1) "q"
(2) "wwwwwwwww"
(3) "eeeee"
(4) "rr"
(5) "t"
(6) "yyyyy"
(7) "qqqq"
(8) "w"
(9) "EE"
(10) "r"
(11) "TTT"

After doing some research, this is the best I could come up with:

/(.)(\1*)/g

The problem I'm having is that the only way to use the \1 back-reference is to capture the character first. If I could reference the result of a non-capturing group, I could solve this problem, but after researching I don't think that's possible.
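
For reference, here is a minimal Java sketch of the pattern above (Java because the question is tagged java). The class name `BackrefRuns` and the helper `runs` are illustrative only. It shows that the capture is not really an obstacle. Group 1 holds one character and group 2 holds its repeats, but `m.group()` (group 0, the whole match) already contains the full run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefRuns {
    // The pattern from the question, without the JavaScript-style /.../g delimiters.
    // Group 1 captures one character; group 2 captures any repeats of it via \1.
    private static final Pattern RUN = Pattern.compile("(.)(\\1*)");

    // Collects every run. m.group() is group 0, i.e. group 1 followed by group 2.
    static List<String> runs(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) {          // find() in a loop plays the role of the /g flag
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = RUN.matcher("eeeeerr");
        while (m.find()) {
            // prints "eeeee = e + eeee", then "rr = r + r"
            System.out.println(m.group() + " = " + m.group(1) + " + " + m.group(2));
        }
        System.out.println(runs("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT"));
    }
}
```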

4 answers

#1


3  

Looks like you need to use a Matcher in a loop:

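import java.util.regex.Matcher;
import java.util.regex.Pattern;

// group 1 is the whole run; \\2 refers back to the single character captured by group 2
Pattern p = Pattern.compile("((.)\\2*)");
Matcher m = p.matcher("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT");
while (m.find()) {
    System.out.println(m.group(1));
}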

Outputs:

q
wwwwwwwww
eeeee
rr
t
yyyyy
qqqq
w
EE
r
TTT
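
On Java 9 or later, `Matcher.results()` can replace the explicit `find()` loop. The sketch below assumes Java 9+ and uses an illustrative class name. It collects the runs into a list instead of printing them. It also drops the outer group, because group 0 (the whole match) already holds each run.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RunsWithStreams {
    // One capturing group is enough: \1 refers back to the character,
    // and group 0 (the whole match) is the full run.
    private static final Pattern RUN = Pattern.compile("(.)\\1*");

    static List<String> runs(String s) {
        return RUN.matcher(s)
                  .results()                   // Stream<MatchResult>, available since Java 9
                  .map(MatchResult::group)     // group() is group 0, the full run
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        runs("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT").forEach(System.out::println);
    }
}
```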

#2


2  

How about /((.)(\2*))/g (untested)? That way, you match the group as a whole (I'm assuming that that's what you want, and that it's what's lacking from the solution you found).
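
The answer marks this pattern as untested, so here is a short Java sketch of it. The class name `NestedGroupRuns` and the helper `groups` are illustrative. In Java the `/.../g` delimiters go away, and the backslash is doubled inside the string literal. The "global" behaviour comes from calling `find()` in a loop. Group 1 is the whole run, group 2 the character, and group 3 its repeats, which is empty for a single character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedGroupRuns {
    // The pattern from the answer; \\2 in the literal is the regex back-reference \2.
    private static final Pattern RUN = Pattern.compile("((.)(\\2*))");

    // For each run, returns {group 1, group 2, group 3}.
    static List<String[]> groups(String s) {
        List<String[]> result = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) {
            result.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] g : groups("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT")) {
            System.out.println(g[0] + "  (char '" + g[1] + "', repeats \"" + g[2] + "\")");
        }
    }
}
```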

#3


1  

Assuming what @cruncher said as a premise is true: "we want to catch repeating letter groups without knowing beforehand which letter should be repeating" then:

/((a*?+)|(b*?+)|(c*?+)|(d*?+)|(e*?+)|(f*?+)|(g*?+)|(h*?+))/

The above RegEx should allow the capture of repeating letter groups without hardcoding a particular order in which they would occur.

The ?+ is a reluctant possessive quantifier, which helps us not waste RAM by not saving previously valid backtracking cases if the current case is valid.

#4


-1  

Since you did tag java, I'll give an alternative non-regex solution (I believe the requirements are the end product, not the method by which you get there).

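// s is the input string; doSomething(...) is your own handler for each run
String repeat = "";
char c = '\0'; // '' is not a valid char literal in Java
for (int i = 0; i < s.length(); i++)
{
    if (s.charAt(i) == c)
    {
        repeat += c;              // same character: extend the current run
    }
    else
    {
        if (!repeat.isEmpty())
            doSomething(repeat);  // previous run ended; add to an array if you want
        c = s.charAt(i);
        repeat = "" + c;          // start a new run
    }
}
if (!repeat.isEmpty())
    doSomething(repeat);          // flush the last run (skipped for an empty string)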
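
Here is a self-contained sketch of the same non-regex idea. The answer's hypothetical `doSomething(...)` is replaced by collecting the runs into a `List<String>`, and a `StringBuilder` is used for the current run. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class RunSplitter {
    // Splits s into runs of identical consecutive characters without using regex.
    static List<String> runs(String s) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (current.length() > 0 && current.charAt(0) != ch) {
                result.add(current.toString()); // the current run has ended
                current.setLength(0);           // start a new run
            }
            current.append(ch);
        }
        if (current.length() > 0) {
            result.add(current.toString());     // the last run
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runs("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT"));
    }
}
```

Like the answer's loop, this compares individual `char` values (UTF-16 code units). Characters outside the Basic Multilingual Plane, such as emoji, would therefore be split into their surrogate halves.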
