I have words that contain numbers, or that begin or end with numbers. How do I extract only those words?
s <- c("An ex4mple", "anothe 3xample", "A thir7", "And sentences w1th w0rds as w3ll")
Expected output:
c("ex4mple", "3xample", "thir7", "w1th w0rds w3ll")
Words could include more than one number.
1 Answer
We can split the strings on whitespace into a list with strsplit, loop through the elements with sapply, and use grep to match all words that contain only letters from start (^) to end ($). Specifying invert=TRUE with value=TRUE returns the elements that do not fit that pattern, which we then paste together.
sapply(strsplit(s, "\\s+"), function(x)
paste(grep("^[A-Za-z]+$", x, invert = TRUE, value = TRUE), collapse=' '))
#[1] "ex4mple" "3xample" "thir7" "w1th w0rds w3ll"
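Note that the inverted match keeps every token that is not purely letters, so a token such as "Wait," would also be returned. As a variation (not the code above), here is a minimal sketch that keeps only the tokens containing at least one digit. The function name keep_digit_words and the extra string "Wait, a w0rd!" are made up for illustration, and punctuation attached to a kept word is left in place.

```r
# Sample data from the question, plus one hypothetical string with punctuation
s <- c("An ex4mple", "anothe 3xample", "A thir7",
       "And sentences w1th w0rds as w3ll", "Wait, a w0rd!")

# Split each string on whitespace, keep the tokens that contain a digit,
# then paste the kept tokens back together (hypothetical helper name)
keep_digit_words <- function(x) {
  words <- strsplit(x, "\\s+")
  sapply(words, function(w) paste(w[grepl("[0-9]", w)], collapse = " "))
}

res <- keep_digit_words(s)
print(res)
```

This also handles words with several digits, because grepl only checks whether a digit occurs anywhere in the token.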
Or we can use str_extract_all from the stringr package
library(stringr)
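# digits are allowed on both sides of the required digit, so a word with several separated digits stays in one piece
sapply(str_extract_all(s, '[A-Za-z0-9]*[0-9][A-Za-z0-9]*'), paste, collapse=' ')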
#[1] "ex4mple" "3xample" "thir7" "w1th w0rds w3ll"
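Since words could include more than one number, it may help to check a word whose digits are separated by letters. The sketch below is a base R take on the same extraction idea that does not need stringr. It assumes words consist only of ASCII letters and digits, and the input s2 (including the word "w0r1ds") is made up for illustration.

```r
# Hypothetical input: "w0r1ds" has two digits separated by a letter
s2 <- c("And sentences w1th w0r1ds as w3ll", "no digits here")

# Optional letters/digits, one required digit, optional letters/digits
pattern <- "[A-Za-z0-9]*[0-9][A-Za-z0-9]*"

# gregexpr() locates every match; regmatches() extracts them as a list
matches <- regmatches(s2, gregexpr(pattern, s2))

# Collapse the matches of each string; a string without matches gives ""
res2 <- sapply(matches, paste, collapse = " ")
print(res2)
```

Allowing digits on both sides of the required digit is what keeps "w0r1ds" from being split into two matches.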
data
s <- c("An ex4mple", "anothe 3xample", "A thir7", "And sentences w1th w0rds as w3ll")