Excluding a single pattern from a Perl search and replace that uses word-boundary matching

Time: 2022-09-13 09:19:36

After asking this Perl newbie question, I have a Perl newbie follow-up. I have found one case in which the word boundary fails for my application, which runs this regex search and replace over a set of files:

s/\bcat\b/cat_tastic/g

The case is -cat: I would like it not to match for replacement, but it currently does, because the position between the hyphen and the "c" counts as a word boundary. I have read up on word boundaries, and what I have learned is that changing the boundary conditions when using \b is non-trivial. How do I exclude "-cat" from being searched and replaced? The end result should be:

:cat { --> :cat_tastic {
:catalog { --> no change
-cat { --> no change
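
For reference, a small sketch of what the original substitution does to these three lines today. The helper name current_behavior is made up for this illustration and is not part of the application:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution from the question to a copy
# of the line and returns the result.
sub current_behavior {
    my ($line) = @_;
    (my $out = $line) =~ s/\bcat\b/cat_tastic/g;
    return $out;
}

# The three sample lines from the question.
for my $line (':cat {', ':catalog {', '-cat {') {
    print "$line --> ", current_behavior($line), "\n";
}
# ':cat {'     becomes ':cat_tastic {'  (wanted)
# ':catalog {' is left alone            (wanted)
# '-cat {'     becomes '-cat_tastic {'  (the unwanted case)
```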

This doesn't have to be part of the one-line search and replace; it can also be a condition checked beforehand that controls whether the search and replace is executed, although having it in the search and replace itself would be most useful.

2 solutions

#1


3  

This is not a newbie regexp, but it seems like the best fit for your pattern: use a "negative lookbehind" expression to say "I want what I match NOT to follow a hyphen":

s/(?<!-)\bcat\b/cat_tastic/g
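
A minimal sketch of that substitution run against the three sample lines from the question. The helper name rename_cat is hypothetical and only there to make the example self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the standalone word "cat" unless the
# character immediately before it is a hyphen.
sub rename_cat {
    my ($line) = @_;
    # (?<!-) is a zero-width negative lookbehind: it checks the previous
    # character without making it part of the match.
    (my $out = $line) =~ s/(?<!-)\bcat\b/cat_tastic/g;
    return $out;
}

for my $line (':cat {', ':catalog {', '-cat {') {
    print "$line --> ", rename_cat($line), "\n";
}
# ':cat {'     becomes ':cat_tastic {'
# ':catalog {' is left alone
# '-cat {'     is left alone
```

Because the lookbehind consumes nothing, there is no need to capture the preceding character and put it back in the replacement.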

Addendum: This does the job, but a more general approach (also portable to languages with less fancy regexps) is to split this kind of problem into two cases: cat after a character that is NOT a hyphen, or cat at the start of a string:

s/([^-])\bcat\b|^\bcat\b/${1}cat_tastic/g

Or better yet:

s/([^-]|^)\bcat\b/${1}cat_tastic/g
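
A sketch of this single-group version, assuming a hypothetical helper named rename_cat_portable. It writes the backreference as ${1} in the replacement, since Perl's documentation prefers $1 over \1 on the right-hand side of s///:

```perl
use strict;
use warnings;

# Hypothetical helper: same goal as the lookbehind version, but without
# lookaround support in the regex.
sub rename_cat_portable {
    my ($line) = @_;
    # Group 1 holds either one non-hyphen character or the empty string at
    # the start of the line, so it is always defined when the pattern
    # matches and can be written back unchanged.
    (my $out = $line) =~ s/([^-]|^)\bcat\b/${1}cat_tastic/g;
    return $out;
}

for my $line (':cat {', 'cat {', ':catalog {', '-cat {') {
    print "$line --> ", rename_cat_portable($line), "\n";
}
# ':cat {'     becomes ':cat_tastic {'
# 'cat {'      becomes 'cat_tastic {'
# ':catalog {' is left alone
# '-cat {'     is left alone
```

With the two-alternative form, the capture group does not take part when "cat" is matched at the start of the string, so $1 is undefined there and Perl warns about it under "use warnings". Folding both cases into one group avoids that.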

#2


0  

If, as per your comment, the characters that make up a word in your case are "a-z, A-Z, 0123456789, and the underscore and hyphen", you can use a character class inside lookarounds instead of \b:

s/(?<![\w-])cat(?![\w-])/cat_tastic/g

A word boundary \b occurs wherever a character matching \w is not adjacent to another \w character. To make the hyphen count as a word character as well, the simplest way is to use a character class as above.
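
A small sketch of how this behaves, assuming a hypothetical helper named rename_cat_strict. The 'cat-flap {' line is an extra sample added here, not one from the question; it shows that this pattern also refuses a match when the hyphen comes after "cat":

```perl
use strict;
use warnings;

# Hypothetical helper: treat the hyphen like a word character on both sides.
sub rename_cat_strict {
    my ($line) = @_;
    # Negative lookbehind and lookahead: "cat" must not be preceded or
    # followed by a \w character or a hyphen.
    (my $out = $line) =~ s/(?<![\w-])cat(?![\w-])/cat_tastic/g;
    return $out;
}

for my $line (':cat {', ':catalog {', '-cat {', 'cat-flap {') {
    print "$line --> ", rename_cat_strict($line), "\n";
}
# ':cat {'     becomes ':cat_tastic {'
# ':catalog {' is left alone
# '-cat {'     is left alone
# 'cat-flap {' is left alone
```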
