Why does C++11 support six different regular expression grammars?

Date: 2022-11-08 17:26:00

It appears that C++11 supports a whopping six different regular expression grammars:

  • ECMA-262 (ECMAScript) regular expressions (slightly modified?)
  • Basic POSIX regular expressions
  • Extended POSIX regular expressions
  • awk regular expressions
  • grep regular expressions
  • egrep regular expressions
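
For reference, each grammar in the list above is selected with a flag from `std::regex_constants` (also exposed as `std::regex::ECMAScript`, `std::regex::basic`, `std::regex::extended`, `std::regex::awk`, `std::regex::grep` and `std::regex::egrep`). A minimal sketch, where `full_match` is a hypothetical helper name and not part of the library:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` under the given grammar flag and
// reports whether it matches the whole of `text`.
bool full_match(const std::string& pattern, const std::string& text,
                std::regex::flag_type grammar) {
    std::regex re(pattern, grammar);  // throws std::regex_error on a bad pattern
    return std::regex_match(text, re);
}
```

With this helper, `full_match("a+", "aaa", std::regex::ECMAScript)` is true, while the same pattern under `std::regex::basic` only matches the literal text `a+`, since `+` is not an operator in basic POSIX syntax.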

Why was it decided to include so many options instead of settling on a single grammar? Why these particular 6?

3 answers

#1


13  

The standardization process is all about pragmatism. There are benefits to including a RE grammar in the standard, as long as it's correctly specified, but no benefit to dropping one.

Exclusion would make it easier for a library implementer to apply a "100% C++11 compliant" badge, but who really cares? Nobody should be making that claim anyway, and only ignorant PHBs would be looking for it. Libraries always have bugs which prevent reaching 100%, and a good library has an excess of features.

Note that all the included grammars are specified by already existing international standards, so little effort was needed on the part of the C++ committee: just §28.13, which is a couple of pages long.

If they leave out a standardized grammar, then different Standard Library implementers will add it under different names, resulting in incompatibility. This is unlikely to happen for a grammar which is merely defined by a popular library, where the library implementer will be responsible for the C++ interface, not Standard Library vendors.

#2


4  

This is covered by the TR1 proposal. I will attempt to summarize.

It seemed prudent to build on an existing standard rather than to strike out on their own.

Two existing standards that they could build upon were identified: POSIX REs and ECMAScript REs. Perl REs were left out because they aren't standardized. (Which reasonable people could disagree with.) Also, ECMAScript REs were seen as a simpler subset of Perl REs which covers the most useful (or perhaps most used) features.

Of the two, POSIX REs’ “leftmost longest” implementation did not play well with important features, like non-greedy repeats, and was at odds with how most RE engines work these days.
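
To illustrate the difference, here is a small sketch (the helper `first_match` is a made-up name, not a library function). Under the ECMAScript grammar, alternation takes the first alternative that works and `+?` is a non-greedy repeat; the POSIX grammars have no non-greedy repeats and are supposed to prefer the longest match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// found in `text` under the given grammar, or an empty string if none.
std::string first_match(const std::string& pattern, const std::string& text,
                        std::regex::flag_type grammar) {
    std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

With `std::regex::ECMAScript`, `first_match("a|ab", "ab", ...)` gives `a`, and `a+?` against `aaa` gives `a` where the greedy `a+` gives `aaa`. Under `std::regex::extended`, the leftmost-longest rule calls for `ab` in the first case, though it is worth checking how your standard library actually behaves.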

On the other hand, ECMAScript REs lacked the localization support of POSIX REs. So, they extended ECMAScript REs to include POSIX-RE-style localization support.
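
As a rough illustration of what that extension looks like in practice: POSIX-style bracket classes such as `[[:alpha:]]` are accepted by the default (modified ECMAScript) grammar and are resolved through the regex's locale. The helper name `is_alpha_word` below is only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is non-empty and consists only of
// characters the regex's locale classifies as alphabetic.
bool is_alpha_word(const std::string& s) {
    // [[:alpha:]] is POSIX bracket-class syntax used inside the
    // ECMAScript grammar that std::regex uses by default.
    static const std::regex re("[[:alpha:]]+", std::regex::ECMAScript);
    return std::regex_match(s, re);
}
```

Plain ECMA-262 has no `[[:alpha:]]`; that syntax comes from the POSIX side.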

POSIX RE support was included as optional since its behavior is different enough from ECMAScript REs to justify it being a standard option. The POSIX standard comes with two grammars: basic and extended. The awk, grep, and egrep REs are all just trivial variations on the basic or extended POSIX grammars rather than truly separate grammars.

So: Two standards, three grammars, six variations.

#3


0  

I think it is because C++ is a multiplatform language. It is used to produce programs on a variety of platforms, and most users expect a program to follow the conventions of, for instance, the OS.

To solve these problems there are two solutions:

  • Make an API for any of those
  • Include all popular standards in the language

The second is more elegant because if you change the interface of one API, compatibility problems occur.

For instance, POSIX is a Unix standard. Several customers, for instance the military, ask software companies to make their programs POSIX compatible. There is a story that Microsoft worked for several months to turn Windows into a POSIX-compatible operating system, just to be able to sell it to the navy.
