I have a character vector t as follows:
t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
"GID895 GID895 K350")
I would like to extract all the substrings that start with GID followed by a sequence of digits.
This works, but it returns only one match per element (the last one, because the leading .* is greedy), so multiple instances are lost.
gsub(".*(GID\\d+).*", "\\1", t)
[1] "GID456" "GID667" "GID2345" "GID895"
How can I extract all of these strings? The desired output is:
out <- c("GID456", "GID456", "GID667", "GID45345", "GID2345",
"GID895", "GID895")
4 solutions
#1 (score: 10)
Here's an approach using qdapRegex, a package I maintain (I prefer it, or stringi/stringr, over base R for consistency and ease of use). I also show a base R approach. In any event, I'd treat this as an "extraction" problem rather than a substitution problem.
y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
"GID895 GID895 K350")
library(qdapRegex)
unlist(ex_default(y, pattern = "GID\\d+"))
## [1] "GID456" "GID456" "GID667" "GID45345" "GID2345" "GID895" "GID895"
In base R:
unlist(regmatches(y, gregexpr("GID\\d+", y)))
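If you also need to know which element each ID came from, you can skip the unlist() step. A minimal base R sketch (y is redefined so the snippet runs on its own): regmatches() combined with gregexpr() returns a list with one character vector per input element, and lengths() counts the matches in each.

```r
y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
       "GID895 GID895 K350")

# gregexpr() records every match position in each element;
# regmatches() converts those positions into a list of character vectors
matches <- regmatches(y, gregexpr("GID\\d+", y))

matches[[2]]       # "GID456" "GID667"
lengths(matches)   # 1 2 2 2  (number of IDs found per element)

# Long format: one row per extracted ID, tagged with its source element
data.frame(element = rep(seq_along(y), lengths(matches)),
           id = unlist(matches))
```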
#2 (score: 3)
Using gsub:
> t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
+ "GID895 GID895 K350")
> unlist(strsplit(gsub("(GID\\d+)|.", "\\1 ", t), "\\s+"))
[1] "GID456" "GID456" "GID667" "GID45345" "GID2345"
[6] "GID895" "GID895"
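One caveat with this trick: when an element does not start with an ID, the gsub() step leaves leading blanks, and strsplit() then returns an empty string "" for that element. A small sketch with made-up inputs (the vector x is hypothetical) showing the effect and a nzchar() filter that removes the empties:

```r
# Hypothetical inputs whose first token is not a GID
x <- c("SPK711 GID456", "VINK GID667 GID12")

# Each GID\d+ becomes "ID ", every other character becomes " "
parts <- unlist(strsplit(gsub("(GID\\d+)|.", "\\1 ", x), "\\s+"))
parts               # "" "GID456" "" "GID667" "GID12"

# Drop the empty strings produced by the leading separators
parts[nzchar(parts)]
```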
#3 (score: 1)
I used the str_split function from the stringr package:
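library(stringr)
# Split each element into whitespace-separated words
word.list <- str_split(t, "\\s+")
new_list <- unlist(word.list)
# Keep only words that are exactly "GID" followed by digits
new_list[grep("^GID\\d+$", new_list)]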
I hope this helps.
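A plain grep("GID", ...) keeps any token that merely contains GID, not only tokens of the form GID plus digits. A quick comparison on a few made-up tokens (the words vector is hypothetical), anchoring the pattern with ^ and $:

```r
# Hypothetical tokens, only the first is a well-formed ID
words <- c("GID456", "SPK711", "GIDX", "XGID12", "GID12a")

grepl("GID", words)          # TRUE FALSE TRUE TRUE TRUE
grepl("^GID\\d+$", words)    # TRUE FALSE FALSE FALSE FALSE

words[grepl("^GID\\d+$", words)]
```

This split-then-filter approach assumes each ID is its own whitespace-separated token: an ID glued to punctuation (for example "GID12,") would be dropped by the anchored pattern, whereas extracting with gregexpr() or str_extract_all() would still find it.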
#4 (score: 1)
I'm late to the party, but this tidyverse one-liner might be useful for someone.
With stringr + dplyr:
library(stringr)
library(dplyr)  # provides %>% (re-exported from magrittr)
t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", "GID895 GID895 K350")
str_extract_all(t, regex("GID\\d+")) %>% unlist()
gives:
[1] "GID456" "GID456" "GID667" "GID45345" "GID2345" "GID895" "GID895"
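dplyr is not strictly required here: %>% comes from magrittr, and since R 4.1 base R has its own native |> pipe. Also, str_extract_all() treats a plain string pattern as a regular expression, so the regex() wrapper is optional. A sketch assuming R 4.1 or later with stringr installed:

```r
library(stringr)

t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
       "GID895 GID895 K350")

# str_extract_all() returns a list (one vector of matches per element);
# unlist() flattens it into a single character vector
ids <- str_extract_all(t, "GID\\d+") |> unlist()
ids
```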