How to dynamically extract a string from a character vector in R

Time: 2022-11-29 10:55:50

Here are three character vectors:

[1] "Session_1/Focal_1_P1/240915_P1_S1_F1.csv"
[2] "Session_2/Focal_1_PA10/250915_PA10_S2_F1.csv"
[3] "Session_3/Focal_1_DA100/260915_DA100_S3_F1.csv"

I'm trying to extract the strings P1, PA10 and DA100, respectively, in a standardised manner (as I have several hundred other strings from which I want to extract the same part).

I know I need to use regex but I'm fairly new to it and not exactly sure which one.

I can see that what the strings have in common is 6 digits (\d\d\d\d\d\d) followed by an _, then the part I want, followed by another _.

How do I extract what I want? I believe grep is the tool for this, but I am not 100% sure which regular expression I need.

1 solution

#1

We can use gsub. We match zero or more characters (.*) followed by a forward slash (\\/), followed by one or more digits and an underscore (\\d+_), or (|) two instances of an underscore followed by one or more characters that are not an underscore ((_[^_]+){2}), and replace every match with an empty string ("").

gsub(".*\\/\\d+_|(_[^_]+){2}", "", v1)
#[1] "P1"    "PA10"  "DA100"
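
To see what each side of the | contributes, here is a minimal sketch that applies the two alternatives one after the other instead of in a single gsub call. The names step1 and step2 are only illustrative and are not part of the answer; v1 is repeated so the snippet runs on its own.

```r
# Same data as in the question, repeated so this runs standalone
v1 <- c("Session_1/Focal_1_P1/240915_P1_S1_F1.csv",
        "Session_2/Focal_1_PA10/250915_PA10_S2_F1.csv",
        "Session_3/Focal_1_DA100/260915_DA100_S3_F1.csv")

# First alternative on its own: drop everything up to and including
# the last "/" plus the digits and underscore that follow it
step1 <- sub(".*/\\d+_", "", v1)
step1
# "P1_S1_F1.csv" "PA10_S2_F1.csv" "DA100_S3_F1.csv"

# Second alternative on its own: drop two consecutive "_xxx" pieces,
# here "_S1" and "_F1.csv"
step2 <- sub("(_[^_]+){2}", "", step1)
step2
# "P1" "PA10" "DA100"
```

The single gsub call gives the same result because gsub replaces every match in a string, so both alternatives get applied in one pass.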

Or we extract the basename of each element of the vector, match one or more digits followed by an underscore (\\d+_), followed by one or more characters that are not an underscore as a capture group (([^_]+)), followed by the remaining characters up to the end of the string (.*), and replace the whole match with the backreference (\\1) to the captured group.

sub("\\d+_([^_]+).*", "\\1", basename(v1))
#[1] "P1"    "PA10"  "DA100"
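
One thing to keep in mind with several hundred strings: sub returns its input unchanged when the pattern does not match, so a malformed file name would slip through silently. Below is a hedged sketch of one way to guard against that. The anchored pattern assumes the file name always starts with exactly six digits, as described in the question, and the fourth path is a made-up example added only to show the non-matching case.

```r
# Data from the question plus one hypothetical path that does not fit the scheme
v1 <- c("Session_1/Focal_1_P1/240915_P1_S1_F1.csv",
        "Session_2/Focal_1_PA10/250915_PA10_S2_F1.csv",
        "Session_3/Focal_1_DA100/260915_DA100_S3_F1.csv",
        "notes/readme.txt")  # hypothetical, for illustration only

files <- basename(v1)  # keep only the file name part of each path

# Assumed layout: six digits, "_", the wanted ID, "_", anything else
pattern <- "^\\d{6}_([^_]+)_.*$"

ok  <- grepl(pattern, files)  # TRUE where the file name fits the layout
ids <- ifelse(ok, sub(pattern, "\\1", files), NA_character_)
ids
# "P1" "PA10" "DA100" NA
```

Anything that comes back as NA can then be inspected by hand with v1[!ok].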

data

v1 <- c( "Session_1/Focal_1_P1/240915_P1_S1_F1.csv",
       "Session_2/Focal_1_PA10/250915_PA10_S2_F1.csv",
       "Session_3/Focal_1_DA100/260915_DA100_S3_F1.csv")
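
As an alternative that follows the pattern described in the question more literally (six digits, an underscore, then the wanted part), the match can be extracted directly instead of deleting everything around it. This is only a sketch using base R's regexpr and regmatches with perl = TRUE, which is needed for the lookbehind; it is not part of the answer above.

```r
v1 <- c("Session_1/Focal_1_P1/240915_P1_S1_F1.csv",
        "Session_2/Focal_1_PA10/250915_PA10_S2_F1.csv",
        "Session_3/Focal_1_DA100/260915_DA100_S3_F1.csv")

# (?<=\\d{6}_) : position right after six digits and an underscore
# [^_]+        : the run of non-underscore characters that follows
m   <- regexpr("(?<=\\d{6}_)[^_]+", v1, perl = TRUE)
ids <- regmatches(v1, m)
ids
# "P1" "PA10" "DA100"
```

Note that regmatches drops elements with no match, so the result can be shorter than the input if some strings do not follow the naming scheme.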
