How do I handle special characters in a Perl regular expression?

Time: 2023-01-13 22:11:03

I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g.:

$pat = $arr[1] . '(.*?)' . $arr[2];

if ( $src =~ /$pat/ ) {
   print $1;
}

However, two of the strings in the array are $450 and (Buy now). In a Perl regular expression, $ is the end-of-string anchor and parentheses form a capture group, so these strings are treated as regex syntax rather than literal text and the pattern doesn't match as I intend.

Is there a way around this?


3 Answers

#1


11  

Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to turn off the special meaning of metacharacters in the interpolated values. See perlretut for more on \Q and \E - they may not be what you're looking for.
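As a minimal sketch of what quotemeta does, the snippet below runs it on the two delimiters from the question. The array name @delims is only an illustrative assumption, not something from the original program.

```perl
use strict;
use warnings;

# The two delimiters from the question; single quotes keep the $ literal.
# @delims is a hypothetical name used only for this illustration.
my @delims = ('$450', '(Buy now)');

for my $d (@delims) {
    # quotemeta backslash-escapes every ASCII character that is not
    # a letter, digit or underscore, including the space.
    print "$d => ", quotemeta($d), "\n";
}
# $450 => \$450
# (Buy now) => \(Buy\ now\)
```

The escaped strings can then be concatenated into a pattern without any of their characters being read as regex syntax.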

#2


9  

quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:


$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }

or


$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]";  # \E not necessary at the end
if($src=~$pat) { print $1 }

or just


if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }

Note that this isn't limited to interpolated variables; literal characters are affected too:


perl -wle'print "\Q.+?"'
\.\+\?

The quoting happens after variable interpolation, though, so "\Q$foo" quotes the value of $foo and doesn't become '\$foo'.
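The ordering can be checked with a short sketch. The value given to $foo and the sample text in $src are assumptions chosen to mirror the question.

```perl
use strict;
use warnings;

my $foo = '$450';    # hypothetical value, borrowed from the question

# $foo is interpolated first, then \Q escapes the resulting text.
my $quoted = "\Q$foo";
print "$quoted\n";   # \$450, the same string quotemeta($foo) returns

# Inside a pattern, the quoted value matches the literal text.
my $src = 'Price: $450 today';    # hypothetical sample text
print "matched\n" if $src =~ /\Q$foo\E/;
```

If \Q were applied before interpolation, the pattern would look for the literal characters $foo instead of the price.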

#3


4  

Use quotemeta:


$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat) {
    print $1;
}
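For comparison, here is a self-contained sketch that runs the unquoted and the quoted pattern against the same text. The contents of @arr and $src are made up for illustration; only the two delimiter strings come from the question.

```perl
use strict;
use warnings;

# Hypothetical data: index 0 is a placeholder so the delimiters sit at
# $arr[1] and $arr[2] as in the question.
my @arr = ('unused', '$450', '(Buy now)');
my $src = 'Special offer $450 for the whole set (Buy now) while stocks last';

# Unquoted: $ acts as an anchor and ( ) as a group, so nothing matches.
my $raw = $arr[1] . '(.*?)' . $arr[2];
print "raw: ", ($src =~ /$raw/ ? "match" : "no match"), "\n";

# Quoted: both delimiters are matched as literal text.
my $pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
my ($between) = $src =~ /$pat/;    # list context returns the capture
print "quoted: [$between]\n" if defined $between;
```

With this input the first line reports no match, and the second prints the text between the two delimiters, including the surrounding spaces.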
