How do I handle every ASCII character (including regex special characters) in a Perl regular expression?

Time: 2023-01-13 22:48:40

I have the following code in Perl:

if (index ($retval, $_[2]) != -1) {
    @fs = split ($_[2], $_[1]);

$_[2] is the delimiter variable and $_[1] is the string that the delimiter may exist in. ($_[0] is used elsewhere) You may have guessed that this code is in a subroutine by those variable names.

Anyway, on to my question: when my delimiter is something innocuous like 'a' or ':', the code works as it should. However, when it is something that has a special meaning in a Perl regex, like a '\' character, it does not work as intended. This makes sense because in the split function Perl would see something like:

split (/\/, $_[1]); 

which makes no sense to it at all because it would want this:

split (/\\/, $_[1]);

So, with all of that in mind, my question, which I cannot answer myself, is this: "How do I make any delimiter that I put into $_[2], that is, any ASCII character, get treated as the literal character it is and not interpreted as something else?"

Thanks in advance,

Robert

3 Solutions

#1


13  

You can use quotemeta to escape $_[2] properly so it will work in the regex without getting mangled. This should do it:

my $quoted = quotemeta $_[2];
@fs = split( $quoted, $_[1] );

Alternatively, you can use \Q in your regex to escape it. See "Escape Sequences" in perlre.
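
To make the two options above concrete, here is a minimal sketch that splits on a backslash both ways. The sample string and the variable names ($string, $delim, $dot) are made up for illustration and are not from the question:

```perl
use strict;
use warnings;

# Hypothetical sample data: fields separated by a single backslash.
my $string = 'C:\temp\file.txt';
my $delim  = '\\';    # a one-character string holding a backslash

# quotemeta puts a backslash in front of every ASCII non-word character.
my $quoted = quotemeta $delim;    # now a two-character string: \\

# Both calls split on a literal backslash.
my @by_quotemeta = split $quoted, $string;
my @by_q_escape  = split /\Q$delim\E/, $string;

print join('|', @by_quotemeta), "\n";
print join('|', @by_q_escape), "\n";

# A metacharacter such as '.' is handled the same way once quoted.
my $dot    = '.';
my @dotted = split /\Q$dot\E/, 'a.b.c';
print scalar(@dotted), "\n";
```

Both forms should produce the same fields, since \Q applies the same quoting as quotemeta to the interpolated variable.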

#2


6  

split /\Q$_[2]/, $_[1]

#3


1  

As a side note, I suspect that the $_[1] and $_[2] variables refer to elements of @_, the argument array that is automatically passed to a sub.

It is common practice to unpack the arguments into named variables at the beginning of the sub, as in the following. That would have saved you quite some explaining here and made your code more understandable by itself:

sub mysub {
  my ($param1, $string, $delim) = @_;
  # ...
}
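
Putting the named parameters together with the \Q escape from the other answers, a complete sub might look like the sketch below. The name split_fields and the sample call are hypothetical. The question tests index against $retval, which is not shown in the excerpt, so this sketch assumes the check is meant to run on the string being split:

```perl
use strict;
use warnings;

# split_fields is a hypothetical name. The first argument is unused here,
# mirroring the $_[0] that the question says is used elsewhere.
sub split_fields {
    my ($unused, $string, $delim) = @_;

    # Assumption: return the string unsplit when the delimiter is absent.
    return ($string) if index($string, $delim) == -1;

    # \Q...\E makes every character of $delim match literally.
    return split /\Q$delim\E/, $string;
}

# Sample call with a backslash delimiter.
my @fs = split_fields(undef, 'a\b\c', '\\');
print join(',', @fs), "\n";
```

With the arguments named, the index check and the split read without needing the explanation of what $_[1] and $_[2] hold.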
