When should raw strings be used in regular expressions?

Time: 2023-02-07 20:15:46

From the documentation on regular expressions I understand that raw strings are recommended for patterns, so that backslashes are not handled in any special way:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'.
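The doubling described in the quoted passage can be checked directly. This is only a minimal sketch: the variable names and the sample text `C:\temp` are illustrative assumptions, not taken from the documentation.

```python
import re

# In a regular string literal, four backslashes produce the two-character
# regex \\ , which matches a single literal backslash.
plain = '\\\\'
# In a raw string literal, the same two-character regex is written as-is.
raw = r'\\'

# Both literals denote the same two-character string.
same = plain == raw
print(same, len(raw))

text = 'C:\\temp'  # the actual text is C:\temp (7 characters)
match = re.search(raw, text)
print(match.span())  # (2, 3): the backslash sits at index 2
```

The raw form is easier to read because what appears in the source is exactly what the regex engine receives.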

I wonder what other cases (apart from the literal backslash) may require using raw strings?

1 Answer

#1


Another example is sequences like \1 and \2, which are octal escapes in Python string literals but refer to captured groups in regular expressions.

>>> re.search(r"(\w+) \1", "the the")
<_sre.SRE_Match object; span=(0, 7), match='the the'>
>>> re.search("(\w+) \1", "the the")
>>> 
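A similar collision, offered here as a hedged sketch rather than an exhaustive list, is `\b`: in a regular Python string literal it is the backspace character, while in a regular expression it is a word-boundary assertion. The sample text and variable names below are assumptions chosen for illustration.

```python
import re

text = "cat concat cat"

# In a regular string literal, "\b" is the backspace character (0x08),
# so this pattern looks for backspace + "cat" + backspace and finds nothing.
plain_hits = re.findall("\bcat\b", text)

# In a raw string the backslash survives, so the regex engine sees \b
# as a word-boundary assertion and skips the "cat" inside "concat".
raw_hits = re.findall(r"\bcat\b", text)

print(plain_hits)  # []
print(raw_hits)    # ['cat', 'cat']

# The same mechanism explains the backreference case: "\1" in a regular
# literal is the single character with code point 1, not a backslash and a digit.
octal_is_one_char = "\1" == "\x01"
print(octal_is_one_char)  # True
```

Escapes that Python does not recognise, such as `\w` or `\d`, currently pass through a regular string unchanged, but relying on that is fragile, so a raw string for every pattern is the simpler habit.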
