Extract the first number from a string

Time: 2022-09-13 16:02:38

I have a string stored in a variable called v1. It lists picture numbers in the form "Pic 27 + 28". I want to extract the first number and store it in a new variable called item.

Some code that I've tried is:

item <- unique(na.omit(as.numeric(unlist(strsplit(unlist(v1),"[^0-9]+")))))

This worked fine until I came upon a list like this:

[1,] "Pic 26 + 25"
[2,] "Pic 27 + 28"
[3,] "Pic 28 + 27"
[4,] "Pic 29 + 30"
[5,] "Pic 30 + 29"
[6,] "Pic 31 + 32"

At this point I get more numbers than I want, as it is also grabbing other unique numbers (the 25).
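
To make the failure concrete, here is a minimal sketch in base R. It assumes v1 is a plain character vector (the listing above looks like a one-column matrix), and the name first is made up for illustration. Flattening with unlist() before calling unique() discards which string each number came from, so every distinct number survives, not just the first one per string.

```r
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
        "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")

# Splitting on runs of non-digits yields one character vector per string,
# e.g. c("", "26", "25") for the first element.
parts <- strsplit(v1, "[^0-9]+")

# The original approach: flatten everything, then deduplicate.
# Numbers such as 25 and 32 survive because they are unique too.
item <- as.vector(unique(na.omit(as.numeric(unlist(parts)))))
item
# [1] 26 25 27 28 29 30 31 32

# Keeping the per-string structure and taking the first non-empty piece
# of each element gives one number per string instead.
first <- vapply(parts, function(p) as.numeric(p[nzchar(p)][1]), numeric(1))
first
# [1] 26 27 28 29 30 31
```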

I also tried doing it with gsub, but couldn't get anything to work. Any help would be greatly appreciated!

5 answers

#1


9  

I assume that you'd like to extract the first of two numbers in each string.

You may use the stri_extract_first_regex function from the stringi package:

library(stringi)
stri_extract_first_regex(c("Pic 26+25", "Pic 1,2,3", "no pics"), "[0-9]+")
## [1] "26" "1"  NA  
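
If installing stringi is not an option, roughly the same result can be sketched in base R with regexpr() and regmatches(). This is a swapped-in technique rather than what the answer above uses, and the variable names x, m and first are chosen only for illustration.

```r
x <- c("Pic 26+25", "Pic 1,2,3", "no pics")

# regexpr() finds the first match in each string (-1 when there is none).
m <- regexpr("[0-9]+", x)

# regmatches() returns only the strings that matched, so fill a
# pre-allocated NA vector at the matching positions.
first <- rep(NA_character_, length(x))
first[m != -1] <- regmatches(x, m)
first
# [1] "26" "1"  NA
```

As with the stringi call, the result is a character vector; wrap it in as.numeric() if numbers are needed.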

#2


3  

In the responses below we use this test data:

# test data
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27", "Pic 29 + 30", 
"Pic 30 + 29", "Pic 31 + 32")

1) gsubfn

library(gsubfn)

strapply(v1, "(\\d+).*", as.numeric, simplify = c)
## [1] 26 27 28 29 30 31

2) sub

This requires no packages but does involve a slightly longer regular expression:

as.numeric( sub("\\D*(\\d+).*", "\\1", v1) )
## [1] 26 27 28 29 30 31
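
One thing to keep in mind with the sub approach: when a string contains no digits, the pattern does not match and sub() returns the string unchanged, so as.numeric() then yields NA with a coercion warning. The sketch below shows that behaviour and one possible guard; the input x and the helper name first_number_sub are hypothetical and not part of the answer.

```r
x <- c("Pic 26 + 25", "no pics")

# With no digits to match, sub() hands back the input untouched.
sub("\\D*(\\d+).*", "\\1", x)
# [1] "26"      "no pics"

# Hypothetical helper: only convert the elements that contain a digit.
first_number_sub <- function(x) {
  out <- rep(NA_real_, length(x))
  ok <- grepl("[0-9]", x)
  out[ok] <- as.numeric(sub("\\D*(\\d+).*", "\\1", x[ok]))
  out
}
first_number_sub(x)
# [1] 26 NA
```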

3) read.table

This involves no regular expressions or packages:

read.table(text = v1, fill = TRUE)[[2]]
## [1] 26 27 28 29 30 31

In this particular example fill = TRUE could be omitted, but it would be needed if the elements of v1 had differing numbers of fields.
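
A small sketch of the case where the fill argument matters, using a made-up vector v2 whose second element has fewer fields than the first:

```r
# Hypothetical input: the second string has only two fields.
v2 <- c("Pic 26 + 25", "Pic 27")

# Without fill = TRUE, read.table() stops because the rows have
# different numbers of fields; tryCatch() captures that error.
no_fill <- tryCatch(read.table(text = v2)[[2]], error = function(e) NULL)
is.null(no_fill)
# [1] TRUE

# With fill = TRUE the short row is padded, so column 2 is still usable.
with_fill <- read.table(text = v2, fill = TRUE)[[2]]
with_fill
# [1] 26 27
```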

#3


1  

To follow up on your strsplit attempt:

# split the strings
l <- strsplit(x = c("Pic 26 + 25", "Pic 27 + 28"), split = " ")
l
# [[1]]
# [1] "Pic" "26"  "+"   "25" 
# 
# [[2]]
# [1] "Pic" "27"  "+"   "28" 

# extract relevant part from each list element and convert to numeric
as.numeric(lapply(l , `[`, 2))
# [1] 26 27
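
Calling as.numeric() on a list works here because each extracted element has length one. A slightly stricter sketch uses vapply(), which checks that every piece is a single string. Note that position 2 is the first number only as long as every string follows the "Pic N + M" layout; the name second_field is just for illustration.

```r
l <- strsplit(c("Pic 26 + 25", "Pic 27 + 28"), split = " ", fixed = TRUE)

# vapply() asserts that each result is exactly one character string
# and returns a character vector rather than a list.
second_field <- vapply(l, `[`, character(1), 2)
as.numeric(second_field)
# [1] 26 27
```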

#4


1  

You can do this very nicely with the first_number() function from the filesstrings package, or for more general needs, you can use the nth_number() function. Install it via install.packages("filesstrings").

library(filesstrings)
#> Loading required package: stringr
strings <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
             "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")
first_number(strings)
#> [1] 26 27 28 29 30 31
nth_number(strings, n = 1)
#> [1] 26 27 28 29 30 31

#5


1  

With str_extract from stringr:

library(stringr)

v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27", 
        "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")

str_extract(v1, "[0-9]+")
# [1] "26" "27" "28" "29" "30" "31"
