Is \0 ("\\0" in a C-style regex string) a valid escape sequence in C++ regular expressions?

Time: 2022-09-30 22:28:04

NOTE: When I say the regex [\0] I mean the regex [\0] (not contained in a C-style string, which would then be "[\\0]"). If I haven't put quotes around it, it's not a C-style string, and the backslashes shouldn't be interpreted as escaping a C-style string.

Inspired by this question and my investigation, I tried the following code in clang 3.4:

#include <regex>
#include <string>

int main()
{
    std::string input = "foobar";
    std::regex regex("[^\\0]*"); // Note, this is "\\0", not "\0"!

    return std::regex_match(input, regex);
}

Apparently, clang doesn't like this, as it throws:

std::__1::regex_error: The expression contained an invalid escaped character, or a trailing escape.

It seems to be the [^\0] part (changing it to [^\n] or something similar works fine). It seems to be an invalid escape character. I want to clarify that I'm not talking about the '\0' character (null-character) or '\n' character (newline character). In C-style strings, what I'm talking about is "\\0" (a string containing backslash zero) and "\\n" (a string containing backslash n). "\\n" seems to get transformed into "\n" by the regex engine, but it chokes on "\\0".
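
To compare the two escapes without the program terminating on an uncaught exception, the construction and the match can be wrapped in a try block. This is only a minimal sketch: the `Outcome` enum and the `try_match` helper are hypothetical names introduced here for illustration, not part of the original code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper type (not from the original post): what happened when
// a pattern was compiled and matched against the whole input.
enum class Outcome { Matched, NotMatched, Rejected };

Outcome try_match(const std::string& pattern, const std::string& input)
{
    try {
        // The default flags select the ECMAScript grammar.
        std::regex re(pattern);
        return std::regex_match(input, re) ? Outcome::Matched
                                           : Outcome::NotMatched;
    } catch (const std::regex_error&) {
        // The library refused the pattern, which is what the libc++
        // shipped with clang 3.4 did for "[^\\0]*".
        return Outcome::Rejected;
    }
}
```

Calling `try_match("[^\\n]*", "foobar")` should yield `Outcome::Matched`, while `try_match("[^\\0]*", "foobar")` yields `Outcome::Rejected` on a library that shows the behavior described above.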

The C++11 standard says in section 28.13 [re.grammar] that:

The regular expression grammar recognized by basic_regex objects constructed with the ECMAScript flag is that specified by ECMA-262, except as specified below.

I'm no expert on ECMA-262, but I tried the regular expression on JSFiddle and it's working fine there in JavaScript land.

So now I'm wondering whether the regex [^\0] is valid in ECMA-262 and the C++11 standard removed support for it (somewhere in the exceptions covered by "... except as specified below").

Question: Is the \0 (not the null-character; in a string literal this would be "\\0") escape sequence legal in a C++11 regular expression? Is it legal in ECMA-262 (or are browser JS VMs just being "too" lenient)? What's the cause/justification for the different behaviors?

1 Answer

#1

This was a bug in libc++'s implementation of <regex>. It should be fixed now in the trunk, and this should propagate to OS X's release code eventually.

Also, here is the excerpt from the ECMA 262 Standard that is the basis for this bug report:

15.10.2.11 DecimalEscape

The production DecimalEscape :: DecimalIntegerLiteral [lookahead ∉ DecimalDigit] evaluates as follows:

  1. Let i be the MV of DecimalIntegerLiteral.
  2. If i is zero, return the EscapeValue consisting of a <NUL> character (Unicode value 0000).
  3. Return the EscapeValue consisting of the integer i.

Note: ... \0 represents the <NUL> character and cannot be followed by a decimal digit.
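
The three steps and the note quoted above can be sketched as a small parser for the digits that follow a backslash. This is an illustration only, not libc++'s actual code: the `EscapeValue` struct and the `decimal_escape` function are hypothetical names, and overflow of very long digit runs is deliberately ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type for illustration: either the <NUL> character
// or a backreference number.
struct EscapeValue {
    bool is_character;   // true: the <NUL> character; false: a backreference
    unsigned long value; // 0 for <NUL>, otherwise the integer i
};

// Evaluates the decimal digits starting at text[pos] (the position right
// after the backslash) and advances pos past them.
EscapeValue decimal_escape(const std::string& text, std::size_t& pos)
{
    auto is_digit = [&](std::size_t p) {
        return p < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[p])) != 0;
    };

    if (!is_digit(pos))
        throw std::invalid_argument("not a decimal escape");

    if (text[pos] == '0') {
        ++pos;
        // Per the note: \0 cannot be followed by a decimal digit.
        if (is_digit(pos))
            throw std::invalid_argument("\\0 followed by a decimal digit");
        return {true, 0}; // step 2: i is zero, so the result is <NUL>
    }

    // Step 1: compute i from the digits (no overflow check in this sketch).
    unsigned long i = 0;
    while (is_digit(pos)) {
        i = i * 10 + static_cast<unsigned long>(text[pos] - '0');
        ++pos;
    }
    return {false, i}; // step 3: the integer i, i.e. a backreference
}
```

Under this reading, the `\0` in `[^\0]*` evaluates to the NUL character, so the class matches every character except NUL, and rejecting the pattern is the library's fault rather than the pattern's.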
