In R, how do I use regular expressions to manipulate a variable in a data frame?

Time: 2021-11-15 05:10:07

This is the dataset

df1 <- data.frame("id" = c("ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-503.100044", 
                       "ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-783.100435",
                       "ebi.ac.uk:MIAMExpress:Reporter:C-DEA-783.100435"),
              "Name" = c("ABC", "DEF", ""))

The output of the dataset

                                                  id   Name
1   ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-503.100044    ABC
2   ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-783.100435    DEF
3    ebi.ac.uk:MIAMExpress:Reporter:C-DEA-783.100435     

I want to make the dataframe look like this

       id     Name
1  100044      ABC
2  100435      DEF
3  100435       NA 

Can anyone show me how to approach this problem?

1 answer

#1


Regex way to find the last dot:

df1$id <- as.character(df1$id)
# start position of the last dot in each id; a dot is literal inside [ ], so it needs no escaping there
regexpr("\\.[^.]*$", df1$id)

or sapply(gregexpr("\\.", df1$id), tail, 1)
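
Both calls above only return the position of the last dot, so one more step is needed to get the number that follows it. A minimal sketch of that step, assuming the ids from the question's data.frame() call; the names id, last_dot, alt and suffix are only for illustration:

```r
# Same ids as in the question's data.frame() call
id <- c("ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-503.100044",
        "ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-783.100435",
        "ebi.ac.uk:MIAMExpress:Reporter:C-DEA-783.100435")

# regexpr() gives the start of the match, which is the last dot in each string
last_dot <- as.integer(regexpr("\\.[^.]*$", id))

# gregexpr() finds every dot; tail(., 1) keeps the position of the last one
alt <- sapply(gregexpr("\\.", id), tail, 1)

# keep everything after the last dot
suffix <- substring(id, last_dot + 1L)
suffix
```

substring() reads to the end of each string by default, so only the start position has to be supplied.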

Easier to remember, non-regex way:

df1$id <- as.character(df1$id)

df1$id <- sapply(strsplit(df1$id,split="\\."),tail,1)
df1$Name[df1$Name == ""] <- NA

df1
      id Name
1 100044  ABC
2 100435  DEF
3 100435 <NA>

sapply(strsplit(df1$id,split="\\."),tail,1) is from here.
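
Since the question asks for a regex, the same result can probably also be had in one step with sub(). This is a sketch rather than part of the answer above; it assumes the data frame is built as in the question, with stringsAsFactors = FALSE added here so that both columns are plain character vectors:

```r
df1 <- data.frame(id = c("ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-503.100044",
                         "ebi.ac.uk:MIAMExpress:Reporter:A-MEXP-783.100435",
                         "ebi.ac.uk:MIAMExpress:Reporter:C-DEA-783.100435"),
                  Name = c("ABC", "DEF", ""),
                  stringsAsFactors = FALSE)

# ".*" is greedy, so ".*\\." matches everything up to and including the last dot;
# replacing that with "" leaves only the trailing number
df1$id <- sub(".*\\.", "", df1$id)

# treat empty names as missing
df1$Name[df1$Name == ""] <- NA

df1
```

The id column stays character here; wrap it in as.integer() if numeric ids are wanted.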
