
Time: 2021-07-25 05:55:40

I have a list of txt files, with their paths stored in A.path, that I would like to search with grep to find the year associated with each file, and save these years to a vector. However, as some of these txt files contain multiple years, I would only like to store the first year in each file. How can I do this?


I've done similar things using lapply, and this is how I began approaching this problem:


lapply(A.path, function(i){
  j <- paste0(scan(i, what = character(), comment.char = '', quote = NULL), collapse = " ")
  year <- vector()
  year[i] <- grep('[0-9][0-9][0-9][0-9]', j)
  year
})

grep probably isn't the right function to use, as it only reports that j matches (or, with value = TRUE, returns the entirety of j for each i) rather than the year itself. What is the right function to use here?
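A quick way to see the difference is to run the candidate functions on a single string. The sample text below is made up for illustration and stands in for one collapsed file (j in the question). grep reports which elements of a character vector match, while regexpr locates only the first match in each element and regmatches pulls out the matched text itself.

```r
# Made-up sample text standing in for one collapsed file (j in the question)
j <- "Report drafted in 2019 and revised in 2021"

# grep() returns the indices of matching elements: 1 here, since j has one element
idx <- grep("[0-9]{4}", j)

# With value = TRUE it returns the matching elements themselves, i.e. all of j
whole <- grep("[0-9]{4}", j, value = TRUE)

# regexpr() finds only the first match in each element;
# regmatches() then extracts that matched substring
first_year <- regmatches(j, regexpr("[0-9]{4}", j))

print(idx)
print(whole)
print(first_year)
```

Because regexpr stops at the first match, the later "2021" is ignored, which is exactly the "first year only" behaviour asked for.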


1 Solution



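Converting my comment to an answer: you can use gsub with the backreference \\1 to extract the first match (i.e. the text captured by the () group in the regex). The lazy .*? stops at the first four-digit run, and the trailing .* consumes everything after it, so the whole string is replaced by that year alone. Note that if j contains no four-digit number at all, gsub returns j unchanged.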


gsub(".*?([0-9]{4}).*", "\\1", j)
