Conditional replacement of special characters in R

Time: 2022-02-12 15:26:45

I would like to be able to take a dataframe df containing a column df$col that has entries like:

I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired

and replace question marks that occur between letters with apostrophes and question marks that occur at the beginnings of strings with nothing:

I'm tired
You're tired
You're tired?
Are you tired?
I am tired

2 solutions

#1


2  

I would use sub for the question mark at the beginning and gsub for the others, because a string can contain several question marks between letters but only one at the beginning.

gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?"
[5] "I am tired"   

See https://regex101.com/r/jClVPg/1 for some explanation.

Some explanation:

  • 1st Capturing Group (\\w):

    \\w matches any word character (for ASCII text, equivalent to [a-zA-Z0-9_])

  • \\? matches a literal ? character

  • 2nd Capturing Group (\\w):

    \\w matches any word character (for ASCII text, equivalent to [a-zA-Z0-9_])

  • In the replacement, \\1 and \\2 put the two captured characters back on either side of the apostrophe
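
One edge case is worth noting: both \\w groups consume the letters they match, so gsub cannot reuse a letter as the left neighbour of the next question mark. In a string such as "a?b?c" only the first question mark is replaced. A minimal sketch of a lookaround variant follows, assuming perl = TRUE is acceptable; the helper name fix_marks and the extra "a?b?c" input are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper, not from the original answer.
fix_marks <- function(x) {
  # Drop a single question mark at the very start of the string.
  x <- sub("^\\?", "", x)
  # Replace "?" only where a word character sits on both sides.
  # Lookbehind and lookahead check the neighbours without consuming them,
  # so back-to-back cases like "a?b?c" are all handled.
  gsub("(?<=\\w)\\?(?=\\w)", "'", x, perl = TRUE)
}

# The question's data plus one invented edge case ("a?b?c").
x <- c("I?m tired", "You?re tired", "You?re tired?",
       "Are you tired?", "?I am tired", "a?b?c")

fixed <- fix_marks(x)
print(fixed)

# For comparison: the capture-group version leaves the second "?" in place.
grouped <- gsub("(\\w)\\?(\\w)", "\\1'\\2", "a?b?c")
print(grouped)
```

For the five strings in the question both versions give the same result; they only differ when two question marks are separated by a single word character.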

#2


0  

We can use sub

df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?" "I am tired"    

Here we assume that each string contains a single ? to replace, as shown in the example. Otherwise, just replace the inner sub with gsub.
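
To see what the gsub variant does when a string holds more than one question mark, here is a small sketch. The input vector is invented for illustration. Note that the pattern [?](?!$) matches every question mark that is not the last character of the string, whatever surrounds it, so it is broader than "between letters".

```r
# Made-up inputs, not part of the original data.
x <- c("Don?t say you?re tired?", "Why? I?m fine")

# Inner gsub: every "?" that is not the final character becomes "'".
# Outer sub: strip an apostrophe left at the start of the string, if any.
res <- sub("^'", "", gsub("[?](?!$)", "'", x, perl = TRUE))
print(res)
```

The first string comes out as intended. In the second, the question mark after "Why" is also turned into an apostrophe, because it is not at the end of the string; if mid-string question marks are possible, a pattern that checks for letters on both sides is the safer choice.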

data

df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?", 
"Are you tired?", "?I am tired")), .Names = "col", 
 class = "data.frame", row.names = c(NA, -5L))
