How to add a column to a data.table whose values come from a list, based on a regex match

Time: 2022-10-19 23:21:03

I have the following data.table:

    id      fShort
1   432-12  1245
2   3242-12 453543
3   324-32  45543
4   322-34  45343
5   2324-34 13543


DT <- data.table(
        id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"), 
        fShort=c("1245", "453543", "45543", "45343", "13543"))

and the following list:

filenames <- list("3242-124342345.png", "432-124343.png", "135-13434.jpeg")

I would like to create a new column "fComplete" that contains the complete filename from the list. For this, the values of column "id" need to be matched against the list of filenames. If a filename starts with the "id" string, the complete filename should be returned. I use the following grep call:

t <- grep("432-12","432-124343.png",value=T)

which returns the correct filename.
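
One detail worth checking is that the pattern above is unanchored, so it matches the id anywhere in the filename, not only at the start. The sketch below illustrates the difference; the second filename is hypothetical and was invented only to show a match in the middle of a string.

```r
id <- "432-12"
# The second filename is a made-up example, not part of the original data
files <- c("432-124343.png", "9432-1299.png")

# Unanchored pattern: matches the id anywhere in the string
unanchored <- grep(id, files, value = TRUE)

# Anchored with "^": matches only filenames that start with the id
anchored <- grep(paste0("^", id), files, value = TRUE)

# Literal prefix test without any regex interpretation
prefixed <- files[startsWith(files, id)]
```

If ids can contain regex metacharacters, the `startsWith()` form is probably the safer choice, since it compares the strings literally.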

This is what the final table should look like:

    id      fShort      fComplete
1   432-12  1245    432-124343.png
2   3242-12 453543  3242-124342345.png
3   324-32  45543   NA
4   322-34  45343   NA
5   2324-34 13543   NA


DT2 <- data.table(
         id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"), 
         fShort=c("1245", "453543", "45543", "45343", "13543"), 
         fComplete = c("432-124343.png", "3242-124342345.png", NA, NA, NA))

I tried using apply and data.table approaches but I always get warnings like

argument 'pattern' has length > 1 and only the first element will be used
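
As the message itself says, grep() works with a single pattern per call, so passing the whole "id" column as the pattern triggers this warning. A minimal sketch of one way around it, assuming plain character vectors named `ids` and `files` (both names are placeholders), is to call grep() once per id:

```r
ids <- c("432-12", "3242-12", "324-32")
files <- c("3242-124342345.png", "432-124343.png", "135-13434.jpeg")

# One grep() call per id, so 'pattern' always has length 1.
# fixed = TRUE treats the id as a literal string rather than a regex.
matches <- lapply(ids, function(id) grep(id, files, fixed = TRUE, value = TRUE))
names(matches) <- ids
```

Each element of `matches` holds the filenames found for one id, and ids without a match give an empty character vector.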

What is a simple approach to accomplish this?

2 answers

#1


3  

Here's a data.table solution:

DT[ , fComplete := lapply(id, function(x) {
  m <- grep(x, filenames, value = TRUE)
  if (!length(m)) NA else m})]


        id fShort          fComplete
1:  432-12   1245     432-124343.png
2: 3242-12 453543 3242-124342345.png
3:  324-32  45543                 NA
4:  322-34  45343                 NA
5: 2324-34  13543                 NA
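
Note that lapply() produces a list column. If a plain character column is preferred, a variant along the following lines might work. This is only a sketch: `first_prefix_match` is a hypothetical helper name, and it uses startsWith() to enforce the "starts with" requirement instead of the unanchored grep() above.

```r
# Hypothetical helper: for each id, return the first filename that starts
# with that id, or NA if no filename does
first_prefix_match <- function(ids, files) {
  vapply(ids, function(id) {
    m <- files[startsWith(files, id)]
    if (length(m)) m[1] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

ids <- c("432-12", "3242-12", "324-32", "322-34", "2324-34")
files <- unlist(list("3242-124342345.png", "432-124343.png", "135-13434.jpeg"))

result <- first_prefix_match(ids, files)
```

With the DT from the question, this could presumably be used as `DT[, fComplete := first_prefix_match(id, unlist(filenames))]`.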

#2


1  

In my experience with similar functions, the regex functions sometimes return a list, so you have to account for that in the apply call; I usually work through one example manually first. Also, in my experience apply on its own does not always return something that fits into a data.frame; sometimes I had to use lapply and/or unlist and data.frame to reshape it.

Here is an answer. I am not familiar with data.table, and I had trouble with the filenames being in a list, but with some transformations this works. I worked it out by inspecting what apply returned and adding [1] to pick out the piece I needed.

DT <- data.frame(
  id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"), 
  fShort=c("1245", "453543", "45543", "45343", "13543"))

filenames <- list("3242-124342345.png", "432-124343.png", "135-13434.jpeg")
filenames1 <- unlist(filenames)

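# For each id, take the index of the first matching filename (NA if none)
x <- apply(DT[1], 1, function(x) grep(x, filenames1)[1])
# Indexing with NA yields NA, so ids without a match get NA
DT$filename <- filenames1[x]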
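
To see why the `[1]` helps, here is a small sketch of the two cases, using the filenames from above and two of the ids. The variable names `idx_hit` and `idx_miss` are just placeholders.

```r
files <- c("3242-124342345.png", "432-124343.png", "135-13434.jpeg")

# A match: grep() returns the positions, and [1] keeps the first one
idx_hit <- grep("432-12", files)[1]

# No match: grep() returns integer(0), and integer(0)[1] is NA
idx_miss <- grep("324-32", files)[1]

# Indexing with NA gives NA, so unmatched ids end up as NA
picked <- files[c(idx_hit, idx_miss)]
```

This way every id yields exactly one value, which is what lets the result fit into a single column.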
