Regex: conditionally find and replace

Time: 2022-03-21 13:15:05

I need to replace string A with string B only when string A is a whole word (e.g. "MECH"); I don't want to make the replacement when A is part of a longer string (e.g. "MECHANICAL"). So far, I have a grepl() call that checks whether string A occurs as a whole word, but I cannot figure out how to make the replacement. I added an ifelse() with the idea of making the gsub() replacement when grepl() returns TRUE and leaving the string unchanged otherwise. Any suggestions? Please see the code below. Thanks.

aa <- data.frame(type = c("CONSTR", "MECH CONSTRUCTION", "MECHANICAL CONSTRUCTION MECH", "MECH CONSTR", "MECHCONSTRUCTION"))

from <- c("MECH", "MECHANICAL", "CONSTR",  "CONSTRUCTION")
to <- c("MECHANICAL", "MECHANICAL", "CONSTRUCTION", "CONSTRUCTION")

gsub2 <- function(pattern, replacement, x, ...) {
  for(i in 1:length(pattern)){
    reg <- paste0("(^", pattern[i], "$)|(^", pattern[i], " )|( ", pattern[i], "$)|( ", pattern[i], " )")
    ifelse(grepl(reg, aa$type),
           x <- gsub(pattern[i], replacement[i], x, ...),
           aa$type)
  }
  x
}

aa$title3 <- gsub2(from, to, aa$type)

3 solutions

#1

You can enclose the strings in the from vector in \\< and \\> to match only whole words:

x <- c("CONSTR", "MECH CONSTRUCTION", "MECHANICAL CONSTRUCTION MECH", 
       "MECH CONSTR", "MECHCONSTRUCTION")

from <- c("\\<MECH\\>", "\\<CONSTR\\>")
to <- c("MECHANICAL", "CONSTRUCTION")

for (i in seq_along(from)) {
  # replace every whole-word occurrence of from[i] with to[i]
  x <- gsub(from[i], to[i], x)
}

print(x)
# [1] "CONSTRUCTION"                       "MECHANICAL CONSTRUCTION"           
# [3] "MECHANICAL CONSTRUCTION MECHANICAL" "MECHANICAL CONSTRUCTION"           
# [5] "MECHCONSTRUCTION"
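The same idea can be folded back into the asker's from/to setup, building each pattern from the plain word instead of typing the \\< \\> anchors by hand. A minimal sketch, assuming the entries of `from` are plain words with no regex metacharacters; `replace_whole_words` is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: replace each whole word from[i] with to[i], in order.
# Assumes the entries of `from` contain no regex metacharacters.
replace_whole_words <- function(from, to, x) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    # \\< and \\> anchor the match at the start and end of a word
    x <- gsub(paste0("\\<", from[i], "\\>"), to[i], x)
  }
  x
}

aa <- data.frame(type = c("CONSTR", "MECH CONSTRUCTION",
                          "MECHANICAL CONSTRUCTION MECH",
                          "MECH CONSTR", "MECHCONSTRUCTION"))
from <- c("MECH", "MECHANICAL", "CONSTR", "CONSTRUCTION")
to   <- c("MECHANICAL", "MECHANICAL", "CONSTRUCTION", "CONSTRUCTION")

aa$title3 <- replace_whole_words(from, to, aa$type)
# aa$title3: "CONSTRUCTION", "MECHANICAL CONSTRUCTION",
#            "MECHANICAL CONSTRUCTION MECHANICAL",
#            "MECHANICAL CONSTRUCTION", "MECHCONSTRUCTION"
```

Because the replacements run in order, each pattern sees the output of the earlier ones. Here the identity mappings (MECHANICAL to MECHANICAL, CONSTRUCTION to CONSTRUCTION) are harmless, but with other word lists the order of `from` can matter.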

#2

I use the regex (?<=\W|^)MECH(?=\W|$) to check whether a string contains the whole word MECH.

Is that what you need?
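In R, lookbehind and lookahead are not supported by the default regex engine behind grepl() and gsub(), so this pattern needs perl = TRUE. A sketch applying it to the question's strings for both detection and replacement; because lookarounds only inspect the neighbouring characters without consuming them, the replacement needs no backreference:

```r
x <- c("CONSTR", "MECH CONSTRUCTION", "MECHANICAL CONSTRUCTION MECH",
       "MECH CONSTR", "MECHCONSTRUCTION")

# MECH preceded by a non-word character or the start of the string, and
# followed by a non-word character or the end of the string.
# Lookarounds require the PCRE engine, hence perl = TRUE.
pattern <- "(?<=\\W|^)MECH(?=\\W|$)"

has_whole_mech <- grepl(pattern, x, perl = TRUE)
# FALSE TRUE TRUE TRUE FALSE

# The lookarounds are zero-width, so the neighbouring space is not part of
# the match and does not have to be restored.
replaced <- gsub(pattern, "MECHANICAL", x, perl = TRUE)
# "CONSTR", "MECHANICAL CONSTRUCTION", "MECHANICAL CONSTRUCTION MECHANICAL",
# "MECHANICAL CONSTR", "MECHCONSTRUCTION"
```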

#3

Just for posterity: besides the \< \> enclosure, you can treat MECH as a whole word when it is followed by whitespace or the end of the line, (\s|$).

gsub("MECH(\\s|$)", "MECHANICAL\\1", aa$type)

The drawback of this approach is that the space (or end of line) becomes part of the match, so you have to carry it over into the replacement: the parentheses capture it and the backreference (\1) writes it back.
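A sketch of that capture-and-restore step on the question's strings. One caveat: the pattern only checks what follows MECH, not what precedes it, so a longer word that merely ends in MECH is rewritten too ("BIOMECH" below is a made-up string added only to show this). The \< \> enclosure and the lookaround pattern also guard the left edge.

```r
x <- c("CONSTR", "MECH CONSTRUCTION", "MECHANICAL CONSTRUCTION MECH",
       "MECH CONSTR", "MECHCONSTRUCTION")

# (\\s|$) captures the following whitespace, or an empty match at the end
# of the string, as group 1; "\\1" writes it back after the new word.
replaced <- gsub("MECH(\\s|$)", "MECHANICAL\\1", x)
# "CONSTR", "MECHANICAL CONSTRUCTION", "MECHANICAL CONSTRUCTION MECHANICAL",
# "MECHANICAL CONSTR", "MECHCONSTRUCTION"

# Only the right edge is checked: a word that ends in MECH also matches.
caveat <- gsub("MECH(\\s|$)", "MECHANICAL\\1", "BIOMECH CONSTR")
# "BIOMECHANICAL CONSTR"
```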

The \< \> enclosure is superior for this particular question, since you have no special exceptions. However, if you have exceptions, it is better to use a more explicit method. The more tools in your toolbox, the better.
