How to search a string for overlapping matches of a regex pattern

Date: 2022-09-13 12:12:33

I have this string

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

and I want to find every substring starting with M and ending with * and add it to an array. This means that the above string would give me 6 elements in my array.

I have this code

foreach ( $line =~ m/M.*?\*/g ) {
    push @ORF, $_;
}

but it only gives me two elements in my array since it ignores overlapping strings.

Is there any way to get all matches? I tried googling but could not find an answer.

2 Answers

#1


4  

You can use embedded code inside the regex together with backtracking control verbs for a little magic:

#!/usr/bin/env perl

use strict;
use warnings;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

local our @match;

$line =~ m/(M.*\*)(?{ push @match, $1 })(*FAIL)/;

use Data::Dump;

dd @match;

Outputs:

(
  "MZEFSRGGRMEAZFE*MQZEFFMAEZF*",
  "MZEFSRGGRMEAZFE*",
  "MEAZFE*MQZEFFMAEZF*",
  "MEAZFE*",
  "MQZEFFMAEZF*",
  "MAEZF*",
)
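The pattern above works because the `(?{ ... })` block runs every time the capture group matches, and `(*FAIL)` then rejects that match, so the engine backtracks to the next candidate end and eventually moves on to the next starting position. Below is a minimal sketch of the same idea wrapped in a sub, using only core Perl. The sub name `overlapping_matches` and the variable `@found` are made up for illustration, and the quantifier is switched to the lazy `.*?`, which should yield the shortest substring first for each starting `M`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the answer above).
# Collects every substring that starts with M and ends with *.
sub overlapping_matches {
    my ($text) = @_;

    # Package variable localized to this call, mirroring the
    # "local our" idiom used in the answer.
    local our @found = ();

    # (M.*?\*)  lazy: tries the nearest * first, then later ones
    # (?{...})  records the current capture each time the group matches
    # (*FAIL)   rejects the match and forces the engine to backtrack
    $text =~ m/(M.*?\*)(?{ push @found, $1 })(*FAIL)/;

    return @found;
}

my @all = overlapping_matches("MZEFSRGGRMEAZFE*MQZEFFMAEZF*");
print "$_\n" for @all;
```

The overall match always fails by design, so the return value of the `m//` itself is not useful; only the side effect on the array matters.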

#2


1  

I don't believe it's possible to create a single regex pattern that will match all such substrings, because you're asking for both a greedy and a non-greedy match at the same time, plus everything in between.

I suggest you store all possible start and end positions of these substrings and use a double loop to combine every start position with every end position that comes after it.

This program demonstrates the idea:

use strict;
use warnings 'all';
use feature 'say';

my $line = 'MZEFSRGGRMEAZFE*MQZEFFMAEZF*';

my @orf;

{
    my (@s, @e);
    push @s, $-[0] while $line =~/M/g;
    push @e, $+[0] while $line =~/\*/g;

    for my $s ( @s ) {
        for my $e ( @e ) {
            push @orf, substr $line, $s, $e-$s if $e > $s;
        }
    }
}

say for @orf;

Output:

MZEFSRGGRMEAZFE*
MZEFSRGGRMEAZFE*MQZEFFMAEZF*
MEAZFE*
MEAZFE*MQZEFFMAEZF*
MQZEFFMAEZF*
MAEZF*
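The same double loop can be packaged as a reusable sub. This is only a sketch under some assumptions: the name `spans_between` and its parameters are hypothetical, and it assumes the start and end markers are single characters, so it scans the string character by character instead of using `@-` and `@+`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every substring of $text that begins
# with $open and ends with $close (both single characters).
sub spans_between {
    my ($text, $open, $close) = @_;

    my (@starts, @ends);
    for my $i (0 .. length($text) - 1) {
        my $ch = substr $text, $i, 1;
        push @starts, $i     if $ch eq $open;
        push @ends,   $i + 1 if $ch eq $close;   # offset just past the terminator
    }

    # Pair each start with every end that lies after it.
    my @spans;
    for my $s (@starts) {
        for my $e (@ends) {
            push @spans, substr($text, $s, $e - $s) if $e > $s;
        }
    }
    return @spans;
}

my @orf = spans_between('MZEFSRGGRMEAZFE*MQZEFFMAEZF*', 'M', '*');
print "$_\n" for @orf;
```

With m start positions and n end positions this does m × n comparisons, which is fine for short strings; for long sequences the number of resulting substrings grows the same way, so the output itself is the limiting factor.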
