Add empty columns to multiple data frames

Time: 2022-12-13 22:58:19

I want to add multiple empty columns to multiple data frames. I know that for a single data frame the code is df[,namevector] <- NA (other question), where namevector is a vector containing the names of the empty variables to add. I have a list of several data frames, so I thought the following code would do the trick.

a <- data.frame(x = 1:10, y = 21:30)
b <- data.frame(x = 1:10, y = 31:40)
c <- list(a,b)
namevector <- c("z","w")     

EmptyVariables <- function(df) {df[,namevector] <- NA}
sapply(X = c, FUN = EmptyVariables)

I don't get an error message, but these 2 lines of code also don't add the empty columns.

1 solution

#1


In principle the solution is there in the comments from BondedDust, but maybe some additional explanations might help.

Why did your original code not work? There are two things to be said about this:

  • As BondedDust mentioned, the assignment inside EmptyVariables happens in the function's own environment. Only a local copy of the data frame df is changed, not the data frame in the global environment. Calling EmptyVariables(a) therefore leaves a unchanged.
  • A function returns the value of the last expression it evaluates. The last expression in EmptyVariables is an assignment, and in R the value of an assignment is the assigned value (here NA), returned invisibly. The function therefore returns NA rather than the modified data frame, which is why your call to sapply simply gives NA twice. BondedDust has already pointed out the fix: the function body should be {df[,namevector] <- NA;df}. The modified data frame is then returned as the result of the function.
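Both points can be checked directly. This is a minimal sketch that reuses a and namevector from the question; the names a_before and out are only illustrative.

```r
a <- data.frame(x = 1:10, y = 21:30)
namevector <- c("z", "w")

# Original function body from the question: the assignment is the last expression
EmptyVariables <- function(df) {df[, namevector] <- NA}

a_before <- a
out <- EmptyVariables(a)   # the value is returned invisibly, so capture it

identical(a, a_before)     # TRUE: the global a was not modified
out                        # NA: the value of the assignment, not a data frame
```

Because the return value is invisible, calling EmptyVariables(a) at the console prints nothing, which makes the problem easy to miss.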

Also a comment regarding sapply: this function tries to simplify its result to a vector or matrix. Your list of data frames cannot reasonably be simplified in this way, so you should use lapply instead.
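To see the difference, here is a small sketch that applies both functions to the corrected function body. The names dfs, s and l are illustrative and not part of the question.

```r
a <- data.frame(x = 1:10, y = 21:30)
b <- data.frame(x = 1:10, y = 31:40)
dfs <- list(a, b)
namevector <- c("z", "w")

# Corrected body: the modified data frame is the last expression
EmptyVariables <- function(df) {df[, namevector] <- NA; df}

s <- sapply(dfs, EmptyVariables)  # sapply tries to simplify the result
l <- lapply(dfs, EmptyVariables)  # lapply always returns a list

is.matrix(s)           # TRUE: the data frames were collapsed into a matrix of lists
is.data.frame(l[[1]])  # TRUE: each element is still a data frame
```

The result of sapply still holds the column values, but it is no longer a list of data frames, so lapply is the more natural choice here.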

Finally, this is the code that should do what you want:

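# Add the columns to the local copy, then return the modified data frame
EmptyVariables <- function(df) {df[, namevector] <- NA; df}
# lapply keeps the result as a list of data frames
res <- lapply(X = c, FUN = EmptyVariables)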

res will be a list containing two data frames. Thus, res[[1]] and res[[2]] will give you a and b with the empty columns added, respectively.
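If the original objects should themselves be updated, or the function should not rely on a global namevector, one possible variation is sketched below. The cols argument and the list name dfs are assumptions introduced for illustration; they are not part of the original answer.

```r
a <- data.frame(x = 1:10, y = 21:30)
b <- data.frame(x = 1:10, y = 31:40)
dfs <- list(a = a, b = b)   # a named list keeps track of which result is which

# Hypothetical variant: the column names are passed in explicitly
EmptyVariables <- function(df, cols) {
  df[, cols] <- NA
  df
}

# Extra arguments to lapply are forwarded to the function
res <- lapply(dfs, EmptyVariables, cols = c("z", "w"))

# Overwrite the originals with the modified copies
a <- res$a
b <- res$b
names(a)
```

Assigning the results back is needed because, as explained above, the function only ever modifies its local copy.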
