Correct way to escape a backslash [\] in a PHP regex?

Date: 2022-10-22 11:29:11

Just out of curiosity, I'm trying to figure out exactly which is the right way to escape a backslash for use in a PHP regular expression pattern, like so:

TEST 01: (3 backslashes)

$pattern = "/^[\\\]{1,}$/";
$string = '\\';

// ----- RETURNS A MATCH -----

TEST 02: (4 backslashes)

$pattern = "/^[\\\\]{1,}$/";
$string = '\\';

// ----- ALSO RETURNS A MATCH -----

According to the articles below, 4 is supposedly the right way but what confuses me is that both tests returned a match. If both are right, then is 4 the preferred way?

RESOURCES:

5 answers

#1


4  

The thing is, you're using a character class, [], so it doesn't matter how many literal backslashes are embedded in it, it'll be treated as a single backslash.

e.g. the following two regexes:

/[a]/
/[aa]/

are, for all intents and purposes, identical as far as the regex engine is concerned. Character classes take a list of characters and "collapse" them down to match a single character, along the lines of "for the current character being considered, is it any of the characters listed inside the []?". If you list two backslashes in the class, then it'll be "is the char a backslash or is it a backslash?".
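A small sketch illustrating the point above (PHP 7.1+ assumed; `var_dump` output shown in comments; `$test01` and `$test02` are illustrative names). It also shows that the question's two double-quoted patterns are identical strings once PHP has parsed them, so PCRE receives the same `[\\]` class in both tests.

```php
<?php
// A character class matches exactly one character; repeating a member
// inside [] adds nothing.
var_dump(preg_match('/^[a]$/', 'a'));   // int(1)
var_dump(preg_match('/^[aa]$/', 'a'));  // int(1)
var_dump(preg_match('/^[aa]$/', 'aa')); // int(0): still one character only

// The question's TEST 01 and TEST 02 patterns, as PHP parses them:
// "\\" becomes one backslash and "\]" is kept as-is, so both are /^[\\]{1,}$/.
$test01 = "/^[\\\]{1,}$/";
$test02 = "/^[\\\\]{1,}$/";
var_dump($test01 === $test02);          // bool(true)
echo $test01, PHP_EOL;                  // /^[\\]{1,}$/
```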

#2


36  

// PHP 5.4.1

// Either three or four \ can be used to match a '\'.
echo preg_match( '/\\\/', '\\' );        // 1
echo preg_match( '/\\\\/', '\\' );       // 1

// Match two backslashes `\\`.
echo preg_match( '/\\\\\\/', '\\\\' );   // Warning: No ending delimiter '/' found
echo preg_match( '/\\\\\\\/', '\\\\' );  // 1
echo preg_match( '/\\\\\\\\/', '\\\\' ); // 1

// Match one backslash using a character class.
echo preg_match( '/[\\]/', '\\' );       // false + Warning: Compilation failed: missing terminating ] for character class
echo preg_match( '/[\\\]/', '\\' );      // 1  
echo preg_match( '/[\\\\]/', '\\' );     // 1
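To see what PCRE actually receives for some of the single-quoted literals above, the strings can be printed before they reach `preg_match()`. This is only a sketch; `$literals` is an illustrative name. Note that `'/[\\]/'` reaches PCRE as `[\]`, where `\]` escapes the bracket, so the class is never closed and compilation fails.

```php
<?php
// Single-quoted literals from the block above. PHP turns each "\\" into one
// backslash and keeps any other backslash as-is; PCRE then applies its own
// escaping to the result.
$literals = [
    '/\\\/',      // 3 backslashes -> /\\/
    '/\\\\/',     // 4 backslashes -> /\\/
    '/\\\\\\\/',  // 7 backslashes -> /\\\\/ (two literal backslashes)
    '/[\\]/',     // -> /[\]/ : \] escapes the bracket, so the class never closes
    '/[\\\]/',    // -> /[\\]/
];
foreach ($literals as $literal) {
    echo $literal, PHP_EOL;
}
```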

When three backslashes are used to match a '\' and the next token is \s, the two run together as four backslashes followed by 's', so the pattern below is interpreted as match a '\' followed by a literal 's'.

echo preg_match( '/\\\\s/', '\\ ' );    // 0  
echo preg_match( '/\\\\s/', '\\s' );    // 1  

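When four backslashes are used to match a '\' and the next token is \s, the result is five backslashes followed by 's', so the pattern below is interpreted as match a '\' followed by a whitespace character (here, a space).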

echo preg_match( '/\\\\\s/', '\\ ' );   // 1
echo preg_match( '/\\\\\s/', '\\s' );   // 0

The same applies if inside a character class.

echo preg_match( '/[\\\\s]/', ' ' );   // 0 
echo preg_match( '/[\\\\\s]/', ' ' );  // 1 
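The same effect can be checked by printing the patterns PHP builds from these literals. In the sketch below (variable names are illustrative), three backslashes followed by `\s` become four in a row, which PHP collapses pairwise into `\\s`, whereas four followed by `\s` become five and leave `\s` intact for PCRE.

```php
<?php
// Outside a class.
$three = '/\\\\s/';         // PCRE receives /\\s/  : backslash, then 's'
$four  = '/\\\\\s/';        // PCRE receives /\\\s/ : backslash, then whitespace
// Inside a class.
$classThree = '/[\\\\s]/';  // PCRE receives /[\\s]/  : backslash or 's'
$classFour  = '/[\\\\\s]/'; // PCRE receives /[\\\s]/ : backslash or whitespace

foreach ([$three, $four, $classThree, $classFour] as $pattern) {
    echo $pattern, PHP_EOL;
}
```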

None of the above results are affected by enclosing the strings in double instead of single quotes.
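A quick check of that claim, assuming the patterns contain no `$`, no quotes and no escape that PHP recognises only in double quotes (such as `\n`, `\t` or `\x5c`). The sketch needs PHP 7.1+ for the `[$single, $double]` destructuring; the last line shows a case where the two quote styles do differ.

```php
<?php
// The same pattern written with single and with double quotes.
$pairs = [
    ['/\\\/',    "/\\\/"],
    ['/\\\\/',   "/\\\\/"],
    ['/[\\\]/',  "/[\\\]/"],
    ['/\\\\\s/', "/\\\\\s/"],
];
foreach ($pairs as [$single, $double]) {
    var_dump($single === $double);     // bool(true) each time
}

// Counterexample: in double quotes PHP itself turns \x5c into a backslash.
var_dump('/\x5c/' === "/\x5c/");       // bool(false)
```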

Conclusions:
Whether inside or outside a bracketed character class, a literal backslash can be matched using just three backslashes '\\\' unless the next character in the pattern is also escaped with a backslash (as in \s), in which case the literal backslash must be matched using four backslashes.

Recommendation:
Always use four backslashes '\\\\' in a regex pattern when seeking to match a backslash.
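One alternative not mentioned in the answers is to let `preg_quote()` add the regex-level escaping, so the PHP literal only needs the usual `'\\'` for a single backslash. A minimal sketch; `$needle` and `$pattern` are illustrative names.

```php
<?php
// preg_quote() escapes regex metacharacters, including the backslash,
// so the literal only has to survive PHP's own string parsing.
$needle  = '\\';                          // one backslash
$pattern = '/' . preg_quote($needle, '/') . '/';

echo $pattern, PHP_EOL;                   // /\\/
var_dump(preg_match($pattern, 'C:\\'));   // int(1)
```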

Escape sequences.

#3


9  

To avoid this kind of unclear code, you can use \x5c, like this :)

echo preg_replace( '/\x5c\w+\.php$/i', '<b>${0}</b>', __FILE__ );
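A variant of the same idea that does not depend on the operating system: on Linux `__FILE__` contains no backslashes, so the line above would return the path unchanged. The Windows-style `$path` is a made-up value. The pattern must stay single-quoted; in double quotes PHP itself turns `\x5c` into a backslash, which then pairs with the following `\w`.

```php
<?php
// Made-up Windows-style path standing in for __FILE__.
$path = 'C:\\www\\site\\index.php';   // C:\www\site\index.php

// Single quotes: PCRE receives \x5c and reads it as a backslash (hex 5C).
$result = preg_replace('/\x5c\w+\.php$/i', '<b>${0}</b>', $path);
echo $result, PHP_EOL;                // C:\www\site<b>\index.php</b>

// Double quotes: PHP expands "\x5c" first, so PCRE receives \\w+, i.e. a
// literal backslash followed by one or more letters 'w'.
var_dump(preg_match("/\x5c\w+/", '\\abc'));  // int(0)
var_dump(preg_match('/\x5c\w+/', '\\abc'));  // int(1)
```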

#4


0  

I studied this years ago. It's because the 1st backslash escapes the 2nd one, and together they form a 'true backslash' character in the pattern; this true backslash then escapes the 3rd one. So it magically makes 3 backslashes work.

However, the usual suggestion is to use 4 backslashes instead of the ambiguous 3.

If I'm wrong about anything, please feel free to correct me.
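A short sketch of why three backslashes are considered ambiguous: in a single-quoted PHP literal only `\\` and `\'` are escapes, so a third backslash survives only when the character after it is neither a backslash nor a quote.

```php
<?php
// Third backslash followed by '/': kept, so 3 and 4 backslashes agree.
var_dump('\\\/' === '\\\\/');   // bool(true): both are the 3 chars \\/

// Third backslash followed by a quote: it escapes the quote instead.
var_dump(strlen('\\\''));       // int(2): one backslash plus a quote

// Four backslashes always give exactly two.
var_dump(strlen('\\\\'));       // int(2): two backslashes
```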

#5


0  

You can also use a heredoc, like the following:

$regexp = <<<EOR
schemaLocation\s*=\s*["'](.*?)["']
EOR;
preg_match_all("/".$regexp."/", $xml, $matches);
print_r($matches);
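Note that a heredoc still processes `\\`, just like a double-quoted string; only a nowdoc (label in single quotes, PHP 5.3+) passes every backslash through unchanged. A sketch with a made-up pattern and subject:

```php
<?php
// Heredoc: "\\" is still an escape, so PCRE would receive \[a-z]+
// (an escaped '[' rather than a literal backslash).
$heredoc = <<<EOR
\\[a-z]+
EOR;

// Nowdoc: no escape processing at all; PCRE receives \\[a-z]+,
// i.e. a literal backslash followed by one or more lowercase letters.
$nowdoc = <<<'EOR'
\\[a-z]+
EOR;

echo $heredoc, PHP_EOL;                                    // \[a-z]+
echo $nowdoc, PHP_EOL;                                     // \\[a-z]+
var_dump(preg_match('/' . $nowdoc . '/', 'C:\\windows'));  // int(1)
```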

Keywords: heredoc, nowdoc
