Adding curly braces after a match (sed)

Date: 2021-09-23 11:43:15

I'm a beginner with regexes and I'm trying to achieve something relatively simple:

I have a dataset arranged like this:

1,AAA,aaaa,BBB,bbbbbb ...
2,AAA,aaaaaaa,BBB,bbb ...
3,AAA,aaaaa,BBB,bb ...

I'm trying to add curly brackets around the strings of varying length (alphanumeric characters) that follow AAA or BBB (these are constant):

1,AAA,{aaaa},BBB,{bbbbbb} ...
2,AAA,{aaaaaaa},BBB,{bbb} ...
3,AAA,{aaaaa},BBB,{bb} ...

So I tried sed like this:

sed 's/(AAA|BBB)[[:punct:]].[[:alnum:]]/\1{&}/g' dataset.txt

However, I got this result:

1,AAA,{AAA,aa}aa,BBB,{BBB,bb}bbbb, ... 
2,AAA,{AAA,aa}aaaaa,BBB,{BBB,bb}b, ...
3,AAA,{AAA,aa}aaa,BBB,{BBB,bb} ...

Obviously, & in the replacement part of sed stands for the whole matched text, but I would like it to be only what comes after the matched pattern. What am I doing wrong?
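
Here is a minimal illustration of the difference. It assumes a sed that accepts -E for extended regexes (GNU and BSD/macOS sed both do). & is the entire match, while \1 and \2 are only the parenthesized parts:

```shell
# & = whole matched text; \1 and \2 = the first and second parenthesized groups.
line='1,AAA,aaaa'
out=$(printf '%s\n' "$line" | sed -E 's/(AAA),([[:alnum:]]+)/[&] [\1] [\2]/')
echo "$out"
```

To leave the prefix as it is and wrap only what follows it, write the prefix back with a group reference instead of &.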

I have also tried adding word boundaries and [^ ], to no avail. Am I asking too much of sed? Should I use a language that supports lookbehind instead?

Thanks for any help!

3 answers

#1

Try this:

sed 's/\(AAA\|BBB\),\([^,]*\)/\1,{\2}/g' dataset.txt
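
\| inside a basic regex is a GNU sed extension, so the command above may not work with BSD/macOS sed. Below is a sketch of the same substitution written as an extended regex (-E), where ( ) and | need no backslashes. It runs on the sample data from the question:

```shell
# Group 1 keeps the marker (AAA or BBB); group 2 is everything up to the next comma.
data='1,AAA,aaaa,BBB,bbbbbb
2,AAA,aaaaaaa,BBB,bbb
3,AAA,aaaaa,BBB,bb'
out=$(printf '%s\n' "$data" | sed -E 's/(AAA|BBB),([^,]*)/\1,{\2}/g')
echo "$out"
```

[^,]* takes everything up to the next comma. A last field such as "bbbbbb ..." is therefore wrapped whole, including the trailing text, and an empty field becomes {}. [[:alnum:]]+ stops at the first non-alphanumeric character instead.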

#2

You can have more than one capture group in your regex to capture different parts. You can even move the [:punct:] part inside the first capture group:

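sed -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt

(sed has no non-capturing (?:...) groups, so the inner (AAA|BBB) is simply group 2 and the alphanumeric run is group 3. The -E flag selects extended regex syntax, in which ( ) and | need no backslashes.)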

I don't see what the . between [:punct:] and [:alnum:] was for, so I removed it. Because of it, as you may have noticed, your regex was matching only:

{AAA,aa}
{BBB,bb}

That is, it matched just two characters after the punctuation following AAA and BBB: one for . and one for [[:alnum:]].

To match all the alphanumeric characters after the comma, up to the next comma, you need a quantifier: [[:alnum:]]+
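
Here is a side-by-side sketch of the two patterns on the first sample line, assuming sed -E. Each match is bracketed with & so its extent is visible:

```shell
line='1,AAA,aaaa,BBB,bbbbbb'
# Original pattern: exactly one . and one [[:alnum:]] after the comma, so two characters.
before=$(printf '%s\n' "$line" | sed -E 's/(AAA|BBB)[[:punct:]].[[:alnum:]]/[&]/g')
# With the + quantifier the match extends over the whole alphanumeric run.
after=$(printf '%s\n' "$line" | sed -E 's/(AAA|BBB)[[:punct:]][[:alnum:]]+/[&]/g')
echo "$before"
echo "$after"
```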

#3

The following sed command should work.

On Linux:

sed -i.bak -r 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt

Or on OS X:

sed -i.bak -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt

-i edits the file in place, saving the changes to the input file itself. With -i.bak, the original is kept as a backup with a .bak suffix (dataset.txt.bak).
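
To try the in-place edit without touching your real data, the sketch below runs the command on a throwaway copy in a temporary directory. It assumes a sed that accepts both -i.bak and -E, as GNU sed and macOS sed do:

```shell
# Work in a scratch directory so no real files are modified.
tmp=$(mktemp -d)
printf '%s\n' '1,AAA,aaaa,BBB,bbbbbb' > "$tmp/dataset.txt"
# -i.bak rewrites dataset.txt in place and keeps the original as dataset.txt.bak.
sed -i.bak -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' "$tmp/dataset.txt"
edited=$(cat "$tmp/dataset.txt")
backup=$(cat "$tmp/dataset.txt.bak")
echo "$edited"
echo "$backup"
rm -r "$tmp"
```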
