Conditionally replace regex matches with a string

Time: 2021-08-19 04:57:11

I am trying to replace certain patterns in a string with different replacement patterns.

Example:

string test = "test replacing \"these characters\"";

What I want to do is replace every ' ' with '_' and remove every other character that is not a letter or a digit. I have created the following regex and it seems to tokenize correctly, but I am not sure how (or whether it is possible) to perform a conditional replace using regex_replace.

string test = "test replacing \"these characters\"";
regex reg("(\\s+)|(\\W+)");

The expected result after the replace would be:

string result = "test_replacing_these_characters";

EDIT: I cannot use Boost, which is why I left it out of the tags, so please do not post answers that rely on Boost. I have to do this with the standard library. It may be that a different regex would accomplish the goal, or that I am simply stuck doing two passes.

EDIT2: When I wrote my original regex I did not remember which characters \w includes; after looking it up I have simplified the expression further. Again, the goal is that anything matching \s+ is replaced with '_' and anything matching \W+ is replaced with an empty string.

1 solution

#1


The C++ (C++0x, C++11, TR1) regular expressions do not really work (*) in every case (look up the phrase regex on this page for gcc), so for the time being it is better to use Boost.

You can check whether your compiler supports the regular expressions needed:

#include <string>
#include <iostream>
#include <regex>

using namespace std;

int main(int argc, char * argv[]) {
    string test = "test replacing \"these characters\"";
    // Every run of non-word characters is replaced with "_".
    regex reg("[^\\w]+");
    test = regex_replace(test, reg, "_");
    cout << test << endl;
}

The above works in Visual Studio 2012 RC.

Edit 1: Replacing with two different strings in one pass (depending on the match) will not work here, I think. In Perl, this could easily be done with an evaluated replacement expression (the /e switch).

Therefore, you'll need two passes, as you already suspected:

 ...
 string test = "test replacing \"these characters\"";
 test = regex_replace(test, regex("\\s+"), "_");  // whitespace runs become "_"
 test = regex_replace(test, regex("\\W+"), "");   // remaining non-word runs are removed
 ...

Edit 2:

If regex_replace accepted a callback function tr(), you could choose the substitution there, like:

 string output = regex_replace(test, regex("\\s+|\\W+"), tr);

with tr() doing the replacement work:


 string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }

then the problem would be solved. Unfortunately, std::regex_replace in C++11 has no such overload, but Boost has one. The following works with Boost and uses one pass:

...
#include <boost/regex.hpp>
using namespace boost;
...
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+|\\W+"), tr);   // <= works in Boost
...

Maybe some day this will work with C++11 or whatever number comes next.


Regards

rbo
