How to remove words of a specific length from a string in R?

Time: 2021-09-01 22:19:01

I want to remove words shorter than 3 characters from a string. For example, my input is

str<- c("hello RP have a nice day")

I want my output to be

str<- c("hello have nice day")

Please help

4 answers

#1

Try this:

gsub('\\b\\w{1,2}\\b','',str)
# [1] "hello  have  nice day"

EDIT: `\b` is a word boundary. If you need to drop the extra spaces, change it to:

gsub('\\b\\w{1,2}\\s','',str)
# [1] "hello have nice day"

Or

gsub('(?<=\\s)(\\w{1,2}\\s)','',str,perl=TRUE)
# [1] "hello have nice day"
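A minimal sketch that keeps the plain `\\b\\w{1,2}\\b` pattern and then cleans up the leftover spaces with `gsub()` and `trimws()`. The helper name `drop_short_words` and its `max_len` argument are hypothetical, chosen only for illustration. Unlike the `\\s`-suffix patterns, this also removes a short word at the very end of the string.

```r
# Hypothetical helper: remove words of 1..max_len word characters,
# then tidy the whitespace they leave behind.
drop_short_words <- function(x, max_len = 2) {
  # Build the word-boundary pattern, e.g. \b\w{1,2}\b for max_len = 2
  pattern <- paste0("\\b\\w{1,", max_len, "}\\b")
  out <- gsub(pattern, "", x, perl = TRUE)
  # Collapse runs of whitespace to a single space and trim both ends
  trimws(gsub("\\s+", " ", out, perl = TRUE))
}

drop_short_words("hello RP have a nice day")
# [1] "hello have nice day"
drop_short_words("hello RP have a nice day ok")
# [1] "hello have nice day"
```

Because `gsub()` is vectorised, the helper works on a whole character vector at once.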

#2

Or use `str_extract_all` to extract all words of length >= 3 and paste them back together:

library(stringr)
paste(str_extract_all(str, '\\w{3,}')[[1]], collapse=' ')
#[1] "hello have nice day"
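The `[[1]]` above keeps only the result for the first input string. A sketch that handles a vector of strings, shown here with base R's `regmatches()` and `gregexpr()` in place of stringr (a swapped-in technique that avoids the package dependency; the `texts` vector is made-up sample data):

```r
texts <- c("hello RP have a nice day", "it is a big cat")

# gregexpr() finds every run of 3+ word characters in each string;
# regmatches() returns them as a list, one character vector per input
words <- regmatches(texts, gregexpr("\\w{3,}", texts, perl = TRUE))

# Paste each string's words back together instead of keeping only [[1]]
result <- vapply(words, paste, character(1), collapse = " ")
result
# [1] "hello have nice day" "big cat"
```

The same `vapply()` step works on the list returned by `str_extract_all()`.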

#3

x <- "hello RP have a nice day"
z <- unlist(strsplit(x, split=" "))
paste(z[nchar(z)>=3], collapse=" ")
# [1] "hello have nice day"
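The split-and-filter idea can be wrapped so it works on several strings and on repeated spaces. The helper name `keep_long_words` and its `min_len` argument are hypothetical. Note that `nchar()` counts every character, so punctuation attached to a word (e.g. `"hi,"`) counts toward its length.

```r
# Hypothetical helper: keep only words with at least min_len characters
keep_long_words <- function(x, min_len = 3) {
  # Split each string on runs of whitespace (one list element per string)
  pieces <- strsplit(x, "\\s+")
  # Filter each piece by length and paste the survivors back together
  vapply(pieces,
         function(z) paste(z[nchar(z) >= min_len], collapse = " "),
         character(1))
}

keep_long_words(c("hello RP have a nice day", "hello   RP  ok"))
# [1] "hello have nice day" "hello"
keep_long_words("hello RP have a nice day", min_len = 4)
# [1] "hello have nice"
```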

#4

Here's an approach using the `rm_nchar_words` function from the qdapRegex package, which I coauthored with @hwnd (SO regex guru extraordinaire). Below I remove 1-2 letter words and then 1-3 letter words:

str<- c("hello RP have a nice day")

library(qdapRegex)

rm_nchar_words(str, "1,2")
## [1] "hello have nice day"

rm_nchar_words(str, "1,3")
## [1] "hello have nice"

Since qdapRegex aims to teach, here is the regex behind the scenes; the `S` function substitutes `1,2` into the quantifier's curly braces:

S("@rm_nchar_words", "1,2")
##  "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"
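The printed pattern can also be used directly with base R's PCRE engine, without loading the package. This is a sketch that only roughly reproduces `rm_nchar_words()`: the whitespace cleanup below is an assumption standing in for whatever trimming and cleaning the package applies internally.

```r
x <- "hello RP have a nice day"

# Pattern printed by S("@rm_nchar_words", "1,2") above, used with perl = TRUE
# because the lookbehind/lookahead syntax requires PCRE
pat <- "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"

# Remove the matches, then collapse leftover spaces and trim the ends
out <- trimws(gsub("\\s+", " ", gsub(pat, "", x, perl = TRUE)))
out
# [1] "hello have nice day"
```

The optional apostrophes in `(?:'?\\w'?)` let contractions such as "I'm" be treated as a single word rather than split at the apostrophe.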