Remove all special characters from a string in R?

Date: 2022-08-22 12:59:37

How can I remove all special characters from a given string in R and replace each special character with a space?

The special characters to remove are: ~!@#$%^&*(){}_+:"<>?,./;'[]-=

The regex [:punct:] does half of the job.

Question_2: But how can I remove characters from foreign languages, for example these: â í ü Â á ą ę ś ć?

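Answer_2: Replace [^[:alnum:]] with [^a-zA-Z0-9] in the pattern passed to gsub or regexpr. In a UTF-8 locale [:alnum:] typically also matches accented letters, so they survive; [^a-zA-Z0-9] lists only the ASCII letters and digits, so the accented letters are replaced as well:
[^a-zA-Z0-9]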
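A minimal base-R sketch of Answer_2 follows. The function name `replace_non_ascii_alnum` is illustrative, and `perl = TRUE` is an assumption added here so that the ranges are matched by code point rather than by locale rules; the accented letter is written as the escape `\u00e9` (é) so the example does not depend on the source file's encoding.

```r
# Replace every character that is not an ASCII letter or digit with a space.
# perl = TRUE (an assumption of this sketch) makes a-z, A-Z and 0-9 plain
# code-point ranges, so accented letters such as é fall outside them.
replace_non_ascii_alnum <- function(x) {
  gsub("[^a-zA-Z0-9]", " ", x, perl = TRUE)
}

# "\u00e9" is é; each accented letter and the "!" become one space.
replace_non_ascii_alnum("\u00e9t\u00e9 ok!")
# [1] " t  ok "
```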

3 answers

#1

You need regular expressions to identify the unwanted characters. For the most readable code, use str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.
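For the first option, listing only the characters from the question, a base-R sketch could escape each character and put them all in one character class. The names `specials`, `chars`, `listed_class` and `replace_listed` are illustrative. It relies on PCRE (`perl = TRUE`), where a backslash before a non-alphanumeric character makes it literal, so `]`, `^` and `-` need no careful placement inside the brackets.

```r
# The characters listed in the question, as a single string.
specials <- "~!@#$%^&*(){}_+:\"<>?,./;'[]-="

# Split into single characters and prefix each with a backslash, giving
# a class such as [\~\!\@...\-\=]. In PCRE every escaped punctuation
# character is matched literally, even inside [ ].
chars <- strsplit(specials, "")[[1]]
listed_class <- paste0("[", paste0("\\", chars, collapse = ""), "]")

# Replace each listed character with a space; anything else is kept.
replace_listed <- function(x) {
  gsub(listed_class, " ", x, perl = TRUE)
}

replace_listed("a-b|c")
# [1] "a b|c"   ("|" is not in the list, so it survives)
```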

library(stringr)

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" # or any other string
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")
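The two patterns differ even on plain ASCII input. This base-R sketch (using gsub, the equivalent mentioned above, so stringr is not required) uses a made-up string with a tab: the tab is neither punctuation nor alphanumeric, so only the second pattern replaces it.

```r
x <- "a\tb-c"   # contains a tab and a hyphen

# [[:punct:]] targets punctuation only, so the tab is left alone.
gsub("[[:punct:]]", " ", x)
# [1] "a\tb c"

# [^[:alnum:]] targets everything that is not a letter or digit,
# so the tab becomes a space too.
gsub("[^[:alnum:]]", " ", x)
# [1] "a b c"
```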

Note that what counts as a letter, a number, or a punctuation mark varies slightly depending on your locale, so you may need to experiment a little to get exactly what you want.

#2

Instead of using a regex to remove those "crazy" characters, just convert them to ASCII, which removes the accents but keeps the letters.

astr <- "Ábcdêãçoàúü"
iconv(astr, to = "ASCII//TRANSLIT")

which results in the following (the exact transliteration can vary with the platform's iconv implementation):

[1] "Abcdeacoauu"
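The transliteration can be combined with the regex approach from the first answer: convert to ASCII first, then replace whatever is still not an ASCII letter or digit with a space. This is a sketch; the name `to_ascii_words` is illustrative, and passing `sub = " "` is a choice made here so that iconv() substitutes spaces for anything it cannot convert instead of returning NA. Because transliteration is platform- and locale-dependent, no output is shown for the accented input.

```r
# Transliterate to ASCII, then turn every remaining non-alphanumeric
# character (including any marks the transliteration produced) into a space.
to_ascii_words <- function(x) {
  # sub = " " replaces non-convertible bytes with spaces rather than
  # making the whole result NA.
  ascii <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT", sub = " ")
  gsub("[^A-Za-z0-9]", " ", ascii)
}

to_ascii_words("abc-1")
# [1] "abc 1"

to_ascii_words("\u00c1bcd-\u00eb!")   # "Ábcd-ë!"; result depends on platform
```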

#3

Convert the special characters to apostrophes:

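# Data is a character vector; keep digits, ASCII letters, "/", "'" and spaces,
# and turn every other character into an apostrophe
Data <- gsub("[^0-9A-Za-z/' ]", "'", Data)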

The code below removes the extra apostrophes (doubled '' pairs):

Data <- gsub("''", "", Data)

Use the gsub() function to replace the special characters with apostrophes.
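A self-contained run of the two gsub() steps on made-up sample data (the answer itself shows none) makes the behavior concrete. The class is written with a single "/", which is equivalent to "///" inside brackets, and ignore.case is omitted because the class already lists both cases. Note that the second step deletes apostrophes in non-overlapping pairs, so a single special character, or an odd-length run of them, still leaves one apostrophe behind.

```r
# Hypothetical sample data for illustration only.
Data <- c("it's a/b", "x@y", "a@@b", "a@@@b")

# Step 1: anything other than a digit, an ASCII letter, "/", "'" or a
# space becomes an apostrophe.
Data <- gsub("[^0-9A-Za-z/' ]", "'", Data)
# now c("it's a/b", "x'y", "a''b", "a'''b")

# Step 2: delete doubled apostrophes, matched left to right in pairs.
Data <- gsub("''", "", Data)
Data
# value: c("it's a/b", "x'y", "ab", "a'b")
```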
