Reading multiple CSV files into separate data frames

Time: 2022-09-01 22:08:34

Suppose we have files file1.csv, file2.csv, ... , and file100.csv in directory C:\R\Data and we want to read them all into separate data frames (e.g. file1, file2, ... , and file100).

The reason for this is that, despite having similar names, they have different file structures, so it is not that useful to have them in a list.

I could use lapply, but that returns a single list containing 100 data frames. Instead, I want these data frames in the Global Environment.

How do I read multiple files directly into the global environment? Or, alternatively, how do I unpack the contents of a list of data frames into it?

9 answers

#1


21  

Quick draft, untested:

  1. Use list.files() aka dir() to dynamically generate your list of files.

  2. This returns a character vector; step through it with a for loop.

  3. Read the i-th file, then use assign() to place its contents into a new variable file_i.

That should do the trick for you.
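The three steps above can be sketched in R as follows. This is a minimal, hedged sketch: the helper name `read_each_into_globalenv` is hypothetical, it assumes every file ends in `.csv` and that `read.csv()` defaults suit each file, and it names each object after its file (`file1.csv` becomes `file1`, as in the question) rather than after its loop index. Naming by loop index could mismatch, because `list.files()` sorts alphabetically, so `file10.csv` comes before `file2.csv`.

```r
# Hypothetical helper: reads every .csv in data_dir into its own variable
# in the global environment, named after the file without ".csv".
read_each_into_globalenv <- function(data_dir) {
  # Step 1: list the CSV files; full.names = TRUE lets read.csv() find
  # them regardless of the current working directory
  files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  # Object names: "file1.csv" -> "file1"
  obj_names <- sub("\\.csv$", "", basename(files))
  # Steps 2 and 3: loop along the vector, read each file, assign it by name
  for (i in seq_along(files)) {
    assign(obj_names[i], read.csv(files[i]), envir = .GlobalEnv)
  }
  invisible(obj_names)
}
```

Usage would be something like `read_each_into_globalenv("C:/R/Data")`; forward slashes avoid having to escape the backslashes in the Windows path.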

#2


19  

Thank you all for replying.

For completeness, here is my final answer for loading any number of (tab) delimited files, in this case with 6 columns of data each, where column 1 is character, column 2 is factor, and the remaining four are numeric:

##Read files named xyz1111.csv, xyz2222.csv, etc.
filenames <- list.files(path="../Data/original_data",
    pattern="^xyz.*\\.csv$")

##Create list of data frame names without the ".csv" part
##(df_names rather than names, to avoid masking base::names)
df_names <- sub("\\.csv$", "", filenames)

###Load all files (tab-delimited despite the .csv extension)
for(i in df_names){
    filepath <- file.path("../Data/original_data", paste0(i, ".csv"))
    assign(i, read.delim(filepath,
    colClasses=c("character","factor",rep("numeric",4)),
    sep = "\t"))
}

#3


14  

Use assign with a character variable containing the desired name of your data frame.

for(i in 1:100)
{
   oname <- paste0("file", i)
   # read fileN.csv and bind it to a variable named fileN
   assign(oname, read.csv(paste0(oname, ".csv")))
}
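If the separately assigned data frames later need to be processed together, base R's `mget()` collects them back into a named list by name. A minimal sketch, assuming three small toy data frames stand in for the files (their names `file1` to `file3` and contents are made up for illustration):

```r
# Toy stand-ins created with assign(), as in the loop above
for (i in 1:3) assign(paste0("file", i), data.frame(id = seq_len(i)))

# Gather the variables back into one named list: list(file1 = ..., ...)
dfs <- mget(paste0("file", 1:3))

# Apply one function to every data frame at once
sapply(dfs, nrow)
```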

#4


11  

Don't. Keep them as a list. It's the way to go.
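The question's worry was that the files have different structures. A list does not require its elements to share columns, as this minimal sketch shows; the two toy data frames and their names `file1`/`file2` are illustrative only:

```r
# Toy stand-ins for two files with different structures (illustrative only)
dfs <- list(
  file1 = data.frame(id = 1:3, value = c(2.5, 3.1, 4.0)),
  file2 = data.frame(name = c("a", "b"), flag = c(TRUE, FALSE))
)

dfs[["file1"]]        # access one data frame by name
sapply(dfs, nrow)     # named vector: file1 = 3, file2 = 2
lapply(dfs, names)    # column names differ per element, which is fine
```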

#5


6  

Here is a way to unpack a list of data.frames using just lapply:

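#full.names = TRUE so read.csv() can find files outside the working directory
filenames <- list.files(path="../Data/original_data",
                        pattern="^xyz.*\\.csv$", full.names=TRUE)

filelist <- lapply(filenames, read.csv)

#if necessary, assign names to data.frames (one name per file; here three files)
names(filelist) <- c("one","two","three")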

#note the invisible function keeps lapply from spitting out the data.frames to the console

invisible(lapply(names(filelist), function(x) assign(x,filelist[[x]],envir=.GlobalEnv)))
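Base R's `list2env()` performs the same unpacking in a single call. This is an alternative technique, not the one used in the answer above. The sketch assumes every list element has a unique, non-empty name, and a small toy list stands in for `filelist`:

```r
# Toy named list standing in for filelist (contents are illustrative only)
filelist <- list(one = data.frame(x = 1:2), two = data.frame(y = letters[1:3]))

# Copy each named element into the global environment as its own variable;
# invisible() suppresses printing of the returned environment
invisible(list2env(filelist, envir = .GlobalEnv))
```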

#6


5  

This answer is intended as a more useful complement to Hadley's answer.

While the OP specifically wanted each file read into their R workspace as a separate object, many other people naively landing on this question may think that that's what they want to do, when in fact they'd be better off reading the files into a single list of data frames.

So for the record, here's how you might do that.

#full.names = TRUE returns full paths, so read.csv() finds
# the files even when they are not in your working directory.
my_files <- list.files("path/to/files", pattern = "\\.csv$",
                       full.names = TRUE)

#Further arguments to read.csv can be passed after it,
# e.g. lapply(my_files, read.csv, stringsAsFactors = FALSE)
all_csv <- lapply(my_files, read.csv)

#Set the name of each list element to its respective
# file name, without the directory or the ".csv" extension.
names(all_csv) <- sub("\\.csv$", "", basename(my_files))

Now any of the files can be referred to by all_csv[["filename"]], which really isn't much worse than having separate filename variables in your workspace, and is often much more convenient.

#7


2  

A simple way to access the elements of a list from the global environment is to attach the list. Note that this actually creates a new environment on the search path and copies the elements of your list into it, so you may want to remove the original list after attaching to prevent having two potentially different copies floating around.
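A minimal sketch of the attach() approach, assuming a named list of data frames such as `all_csv` (toy contents here, and the environment name `"csv_files"` is arbitrary). Note that objects with the same name in the global environment mask the attached copies, because the global environment is searched first.

```r
# Toy named list standing in for all_csv (contents are illustrative only)
all_csv <- list(file1 = data.frame(a = 1:3), file2 = data.frame(b = 4:5))

# attach() copies each element into a new environment on the search path
attach(all_csv, name = "csv_files")
rm(all_csv)                   # optional: keep only the attached copy

n_rows_file1 <- nrow(file1)   # file1 is found on the search path
print(n_rows_file1)

# Remove the attached environment when finished
detach("csv_files")
```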

#8


0  

Reading all the CSV files from a folder and creating variables with the same names as the files:

setwd("your path to folder where CSVs are")

filenames <- gsub("\\.csv$","", list.files(pattern="\\.csv$"))

for(i in filenames){
  assign(i, read.csv(paste(i, ".csv", sep="")))
}

#9


-2  

#copy all the files you want to read in R into your working directory
a <- list.files(pattern = "\\.csv$")
#remove the ".csv" from each filename
list1 <- sub("\\.csv$", "", a)
#Final step: read each file into a variable of the same name
for(i in list1){
filepath <- paste0(i, ".csv")
assign(i, read.csv(filepath))
}