Grep a variable and store the result in a vector in R

Time: 2021-07-25 05:55:40

I have a list of txt files, with their paths stored in A.path, that I would like to search with grep to find the year associated with each file, and save that year to a vector. However, as some of these files contain multiple years in their text, I only want to store the first one. How can I do this?

I've done similar things using lapply, and this is how I began approaching this problem:

lapply(A.path, function(i){
  j <- paste0(scan(i, what = character(), comment.char = '', quote = NULL), collapse = " ")
  year <- vector()
  year[i] <- grep('[0-9][0-9][0-9][0-9]', j)
})

grep probably isn't the right function to use, as it never returns the year itself: it only tells me that j matches (the index 1, or the whole of j with value = TRUE). What is the right function to use here?
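To illustrate the difference, here is a small sketch on a made-up string standing in for j (the text is invented for the example). grep() reports which elements match, while regexpr() combined with regmatches() extracts the first matched text.

```r
# A made-up string standing in for the collapsed text `j` of one file
j <- "Report issued in 1998, revised in 2003 and 2010."

# grep() only says WHICH elements of j match: returns 1
grep("[0-9]{4}", j)

# value = TRUE returns the matching element itself, i.e. all of j
grep("[0-9]{4}", j, value = TRUE)

# regexpr() locates the FIRST match in each element and
# regmatches() pulls out that matched text: returns "1998"
regmatches(j, regexpr("[0-9]{4}", j))
```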

1 answer

#1


5  

Converting my comment to an answer: you can use gsub with the backreference \\1 to extract the first match (i.e. the text captured between the parentheses () in the regex):

gsub(".*?([0-9]{4}).*", "\\1", j)
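A minimal sketch of how this could be applied to every file in A.path to get one year per file. Assumptions: the helper name first_year and the demo files are hypothetical, perl = TRUE is added to the answer's pattern so the lazy .*? is handled by PCRE, readLines() replaces the question's scan() call, and inside the helper regexpr()/regmatches() is swapped in for gsub() so that a file with no year yields NA rather than its entire text.

```r
# The answer's pattern on two sample strings (perl = TRUE is an addition)
gsub(".*?([0-9]{4}).*", "\\1", "Filed 1999, amended 2004.", perl = TRUE)
# With no four-digit run nothing is replaced: the whole text comes back
gsub(".*?([0-9]{4}).*", "\\1", "No date here.", perl = TRUE)

# Hypothetical helper: first four-digit number in a file, or NA if none
first_year <- function(path) {
  # Read the file and collapse its lines into one string, as in the question
  txt <- paste(readLines(path, warn = FALSE), collapse = " ")
  # regexpr() gives the start of the first match, or -1 if there is none
  m <- regexpr("[0-9]{4}", txt)
  if (m == -1) NA_character_ else regmatches(txt, m)
}

# Demo files standing in for the question's A.path (made-up contents)
A.path <- replicate(3, tempfile(fileext = ".txt"))
writeLines(c("Published 1987.", "Reprinted 1995."), A.path[1])
writeLines("Updated in 2015 and again in 2019.", A.path[2])
writeLines("No date here.", A.path[3])

# vapply() returns a plain character vector with one entry per file
years <- vapply(A.path, first_year, character(1), USE.NAMES = FALSE)
years
```

Here years should be "1987", "2015" and NA. Note that [0-9]{4} matches any run of four digits, so it would also pick up the first four digits of a longer number.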
