Replacing multiple letters with gsub

Time: 2022-07-22 16:49:01

Of course, I could replace specific characters one at a time, like this:

    mydata=c("á","é","ó")
    mydata=gsub("á","a",mydata)
    mydata=gsub("é","e",mydata)
    mydata=gsub("ó","o",mydata)
    mydata

But surely there is an easier way to do this all in one line, right? I don't find the gsub help page to be very comprehensive on this.

10 Answers

#1


74  

Use the character translation function, chartr:

chartr("áéó", "aeo", mydata)
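A small sketch of how chartr behaves, assuming a UTF-8 locale. The variable names plain and swapped are illustrative only. The characters of the first argument are mapped position by position onto those of the second, and the whole translation happens in a single pass, so a replaced character is never replaced again.

```r
mydata <- c("á", "é", "ó")

# old and new are matched position by position: á -> a, é -> e, ó -> o
plain <- chartr("áéó", "aeo", mydata)
print(plain)

# Because translation happens in one pass, two characters can even be swapped,
# which chained gsub() calls cannot do directly
swapped <- chartr("ab", "ba", "abba")
print(swapped)
```

Note that chartr only maps single characters to single characters, so it does not help when a replacement is longer than one character.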

#2


30  

An interesting question! I think the simplest option is to devise a special function, something like a "multi" gsub():

mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern)!=length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

Which gives me:

> mydata <- c("á","é","ó")
> mgsub(c("á","é","ó"), c("a","e","o"), mydata)
[1] "a" "e" "o"
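One thing worth keeping in mind with this approach is that gsub treats each pattern as a regular expression. The sketch below repeats the function so that it runs on its own and shows that extra arguments such as fixed = TRUE are forwarded through `...`. The inputs and the names as_regex and as_literal are made up for illustration.

```r
# Sequential multi-gsub, as in the answer above
mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern) != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

# "." is a regex wildcard, so without fixed = TRUE every character is removed
as_regex <- mgsub(c(".", "-"), c("", "_"), "a.b-c")

# fixed = TRUE is forwarded to gsub() and makes the patterns literal strings
as_literal <- mgsub(c(".", "-"), c("", "_"), "a.b-c", fixed = TRUE)

print(as_regex)
print(as_literal)
```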

#3


21  

Maybe this can be useful:

iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT")
[1] "aeoAEOca"

#4


8  

You can use the stringi package to replace these characters:

> library(stringi)
> stri_trans_general(c("á","é","ó"), "latin-ascii")
[1] "a" "e" "o"

#5


7  

Another mgsub implementation, using Reduce:

mystring = 'This is good'
myrepl = list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring){
  gsub2 <- function(l, x){
   do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}
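A usage sketch for the function above, repeated here so that it runs on its own. With right = TRUE, Reduce calls gsub2(pair, accumulated_string), starting from the last pair in the list, which is why gsub2 takes the pair first and the string second. The variable result is illustrative.

```r
mystring <- 'This is good'
myrepl <- list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring) {
  # l is one (pattern, replacement) pair, x is the string built up so far
  gsub2 <- function(l, x) {
    do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}

result <- mgsub2(myrepl, mystring)
print(result)
# [1] "Thns ns gaad"
```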

#6


6  

A problem with some of the implementations above (e.g., Theodore Lytras's) is that when patterns are more than one character long, they can conflict if one pattern is a substring of another, because a later pattern may match text produced by an earlier replacement. One way to solve this is to create a copy of the object, match every pattern against the original, and perform the replacement in the copy. Note that this version replaces each matching element as a whole rather than substituting within it. This is implemented in my package bayesbio, available on CRAN.

mgsub <- function(pattern, replacement, x, ...) {
  n = length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result = x
  for (i in seq_len(n)) {
    result[grep(pattern[i], x, ...)] = replacement[i]
  }
  return(result)
}

Here is a test case:

  asdf = c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)

  res = mgsub(c("0", "1", "2"), c("10", "11", "12"), asdf)
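To make the difference concrete, here is a side-by-side sketch on the same test case. The names mgsub_seq and mgsub_whole are hypothetical and used only to keep the two variants apart. They correspond to the sequential gsub loop from answer #2 and the copy-based loop from this answer.

```r
# Sequential version (answer #2): each gsub() sees the previous result
mgsub_seq <- function(pattern, replacement, x, ...) {
  result <- x
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

# Copy-based version (this answer): patterns are matched against the original x
mgsub_whole <- function(pattern, replacement, x, ...) {
  result <- x
  for (i in seq_along(pattern)) {
    result[grep(pattern[i], x, ...)] <- replacement[i]
  }
  result
}

asdf <- c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)
pats <- c("0", "1", "2")
reps <- c("10", "11", "12")

seq_res   <- mgsub_seq(pats, reps, asdf)
whole_res <- mgsub_whole(pats, reps, asdf)

print(seq_res)    # "0" becomes "10", whose "1" is then replaced again: "110"
print(whole_res)  # "0" becomes "10" and stays that way
```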

#7


5  

This is very similar to @kith's answer, but in function form, and covering the most common diacritics cases:

removeDiscritics <- function(string) {
  chartr(
     "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
    ,"SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
    , string
  )
}


removeDiscritics("test áéíóú")

"test aeiou"

#8


3  

Not so elegant, but it works and does what you want:

> diag(sapply(1:length(mydata), function(i, x, y) {
+   gsub(x[i],y[i], x=x)
+ }, x=mydata, y=c('a', 'b', 'c')))
[1] "a" "b" "c"

#9


1  

You can use the match function. Here match(x, y) returns, for each element of x, the index of its first match in y. You can then use the returned indices to subset another vector (say z) that holds the replacements, in the same order as the values in y. In your case:

mydata <- c("á","é","ó")
desired <- c('a', 'e', 'o')

desired[match(mydata, mydata)]

In a simpler example, consider the situation below, where I was trying to substitute 'alpha' for 'a', 'beta' for 'b', and so forth.

x <- c('a', 'a', 'b', 'c', 'b', 'c', 'e', 'e', 'd')

y <- c('a', 'b', 'c', 'd', 'e')
z <- c('alpha', 'beta', 'gamma', 'delta', 'epsilon')

z[match(x, y)]
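One caveat with this approach is that match returns NA for any value of x that does not appear in y, so such values become NA after subsetting. The following is a minimal sketch of one way to keep the original value in that case. The input 'z' in x and the names idx and replaced are made up for illustration.

```r
x <- c('a', 'a', 'b', 'z')
y <- c('a', 'b', 'c', 'd', 'e')
z <- c('alpha', 'beta', 'gamma', 'delta', 'epsilon')

idx <- match(x, y)          # NA where an element of x is not in y
replaced <- z[idx]

# Keep the original value wherever no replacement is defined
replaced[is.na(idx)] <- x[is.na(idx)]
print(replaced)
# [1] "alpha" "alpha" "beta"  "z"
```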

#10


1  

Related to Justin's answer:

> m <- c("á"="a", "é"="e", "ó"="o")
> m[mydata]
  á   é   ó 
"a" "e" "o" 

And you can get rid of the names with names(*) <- NULL if you want.
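As a short sketch of the name-stripping step, assuming a UTF-8 locale: unname() returns a copy without the names, which does the same job as assigning NULL to names(). The variable res is illustrative.

```r
mydata <- c("á", "é", "ó")

# Named character vector used as a lookup table: name -> replacement
m <- c("á" = "a", "é" = "e", "ó" = "o")

# Index by name, then drop the names from the result
res <- unname(m[mydata])
print(res)
# [1] "a" "e" "o"
```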
