
Time: 2022-09-05 15:33:49

I have several CSV files for each year. Each file contains the same variables and observations.


df14 <- data.frame(name = c("one", "two", "three"), A = c(1,2,3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3,1,1), C = c(0, 0, 1), B = c(8, 5, 5))

Suppose df14 & df15 represent years 2014 & 2015 respectively.


Note: the variables are not recorded in the same order.


What I'd like to do is see how each variable (A, B, C) is changing by year for each name.


Is there a way to combine these in one data frame? Should I simply rbind them?



One thing I could do is assign the year as a new variable and then rbind, but is that good practice?


df14$year <- 2014; df15$year <- 2015
df <- rbind(df14, df15)

which gives:

    name A B C year
     one 1 4 0 2014
     two 2 2 1 2014
   three 3 1 1 2014
     one 3 8 0 2015
     two 1 5 0 2015
   three 1 5 1 2015
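
On the column-order worry: base R's rbind for data frames matches columns by name rather than by position, so the swapped B and C in df15 should not scramble values. A minimal sketch that checks this on the sample data (the name `combined` is only illustrative):

```r
# Sample data from the question; df15 has B and C in swapped order.
df14 <- data.frame(name = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3, 1, 1), C = c(0, 0, 1), B = c(8, 5, 5))

df14$year <- 2014
df15$year <- 2015

# rbind on data frames aligns columns by name, using the first data frame's order.
combined <- rbind(df14, df15)

# The 2015 rows keep their own B and C values despite the swapped order.
stopifnot(identical(names(combined), c("name", "A", "B", "C", "year")))
stopifnot(identical(combined$B[combined$year == 2015], c(8, 5, 5)))
stopifnot(identical(combined$C[combined$year == 2015], c(0, 0, 1)))
```

If the checks pass silently, the rows were aligned by column name.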

3 Solutions



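library(data.table)
library(magrittr)   # provides %>%
library(ggplot2)

years_2_digt <- 14:15

# Fetch df14 and df15 by name, tag each with its (two-digit) year, and stack them.
# use.names = TRUE matches columns by name, since their order differs between years.
DT <- rbindlist(lapply(years_2_digt, function(y) {
  get(paste0("df", y)) %>%
    setDT %>%
    .[, year := y]
}), use.names = TRUE)

# Long format: one row per name, year, and variable.
DT.molt <- reshape2::melt(DT, id.vars=c("name", "year"))

ggplot(data=DT.molt, aes(x=year, color=variable, y=value)) + 
    geom_line() + geom_point() + 
    facet_grid(name ~ .) + 
    ggtitle("Change by year and name")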
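
As a numeric complement to the plot, the year-over-year change can also be computed from the same kind of long table. This is a rough sketch in base R only (so no packages are assumed); the names `combined`, `long`, and `change` are made up for illustration.

```r
# Sample data from the question, with a year column added to each.
df14 <- data.frame(name = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 2, 1), C = c(0, 1, 1), year = 2014)
df15 <- data.frame(name = c("one", "two", "three"), A = c(3, 1, 1), C = c(0, 0, 1), B = c(8, 5, 5), year = 2015)
combined <- rbind(df14, df15)

# Long format: one row per name, year, and variable.
long <- do.call(rbind, lapply(c("A", "B", "C"), function(v) {
  data.frame(name = combined$name, year = combined$year,
             variable = v, value = combined[[v]])
}))

# Sort so that years are in order within each name/variable pair.
long <- long[order(long$name, long$variable, long$year), ]

# Change relative to the previous year; the first year of each pair has no predecessor.
long$change <- ave(long$value, long$name, long$variable,
                   FUN = function(x) c(NA, diff(x)))
```

With more than two years, each row's `change` is the difference from the year directly before it within the same name and variable.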



You can programmatically add the year column to each data frame and then rbind them. Here's an example that relies on being able to get the year corresponding to each data frame from the file name. Here, I've stored your sample data frames in a list. In your real use case, you'd read the CSV files into a named list using something like df.list = sapply(vector_of_file_names, read.csv, simplify = FALSE); without simplify = FALSE, sapply may collapse the result into a matrix instead of a list of data frames.


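df.list = list(df14=df14, df15=df15)

# Add a year column to each data frame, parsed from the last two digits of its name.
# The ".csv" suffix is optional in the pattern, so both "df14" and "df14.csv" work.
df.list = lapply(seq_along(df.list), function(i) {
  data.frame(df.list[[i]],
             year = 2000 + as.numeric(sub(".*(\\d{2})(\\.csv)?$", "\\1", names(df.list)[[i]])))
})

df = do.call(rbind, df.list)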
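
For the real use case with files on disk, the same idea might look like the sketch below. It assumes the files are named like df14.csv (two-digit year right before the extension); the temporary folder and the variable names `csv_dir` and `files` are hypothetical and only there to make the example self-contained.

```r
# Hypothetical setup: write the sample data to a temporary folder as df14.csv and df15.csv.
df14 <- data.frame(name = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3, 1, 1), C = c(0, 0, 1), B = c(8, 5, 5))
csv_dir <- tempfile("yearly_csv_")
dir.create(csv_dir)
write.csv(df14, file.path(csv_dir, "df14.csv"), row.names = FALSE)
write.csv(df15, file.path(csv_dir, "df15.csv"), row.names = FALSE)

# list.files returns the paths in alphabetical order.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and tag it with the year parsed from its file name.
df.list <- lapply(files, function(f) {
  d <- read.csv(f)
  # Assumes the file name ends in a two-digit year followed by ".csv".
  d$year <- 2000 + as.numeric(sub(".*(\\d{2})\\.csv$", "\\1", basename(f)))
  d
})

# rbind matches columns by name, so the differing column order is not a problem.
df <- do.call(rbind, df.list)
```

Parsing the year from basename(f) rather than the full path avoids picking up digits from folder names.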


Here is a working example within one lapply:


Make some dummy CSV files:


df14 <- data.frame(name = c("one", "two", "three"), A = c(1,2,3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3,1,1), C = c(0, 0, 1), B = c(8, 5, 5))
df16 <- data.frame(name = c("one", "two", "three"), C = c(1,2,3), B = c(4, 2, 1), A = c(0, 1, 1))
df17 <- data.frame(name = c("one", "two", "three"), C = c(3,1,1), A = c(0, 0, 1), B = c(8, 5, 5))
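# get the names of the data frames (df14 to df17); the anchored pattern
# avoids picking up other objects that merely contain "df"
myNames <- ls(pattern = "^df\\d{2}$")
# write each one to <name>.csv in the working directory
invisible(lapply(myNames, function(i){write.csv(get(i),paste0(i,".csv"),row.names = FALSE)}))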

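Solution: read the CSV files, put the columns in a consistent order using sort, then rbind them into one data frame:


#Solution - read CSV, fix columns, rbind
# (the file pattern below is an assumption matching the dummy files created above)
do.call(rbind,
        lapply(list.files(pattern = "^df\\d{2}\\.csv$"), function(i){
                 d <- read.csv(i)
                 # sort the columns by name so every file has the same order
                 res <- d[,sort(colnames(d))]
                 # keep track of which file each row came from
                 cbind(res, FileName = i)
               }))
# output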
#    A B C  name FileName
# 1  1 4 0   one df14.csv
# 2  2 2 1   two df14.csv
# 3  3 1 1 three df14.csv
# 4  3 8 0   one df15.csv
# 5  1 5 0   two df15.csv
# 6  1 5 1 three df15.csv
# 7  0 4 1   one df16.csv
# 8  1 2 2   two df16.csv
# 9  1 1 3 three df16.csv
# 10 0 8 3   one df17.csv
# 11 0 5 1   two df17.csv
# 12 1 5 1 three df17.csv
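
If a numeric year is more convenient than the file name, it can be derived from the FileName column afterwards. A small sketch, assuming file names of the form dfNN.csv; the data frame `res` here is a cut-down stand-in, not the full result:

```r
# Stand-in for the combined result: only the FileName column matters here.
res <- data.frame(A = c(1, 3, 0, 0),
                  FileName = c("df14.csv", "df15.csv", "df16.csv", "df17.csv"))

# Pull out the two digits before ".csv" and turn them into a four-digit year.
res$year <- 2000 + as.integer(sub("^df(\\d{2})\\.csv$", "\\1", res$FileName))
```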


