Populating a data frame in a loop in R

Time: 2022-02-07 22:55:28

I am trying to populate a data frame from within a for loop in R. The names of the columns are generated dynamically within the loop, and the values of some of the loop variables are used as the values while populating the data frame. For instance, the name of the current column could be some variable name held as a string in the loop, and the column could take the value of the current iterator as its value in the data frame.

I tried to create an empty data frame outside the loop, like this:

d = data.frame()

But I can't really do anything with it. The moment I try to populate it, I run into an error:

 d[1] = c(1,2)
Error in `[<-.data.frame`(`*tmp*`, 1, value = c(1, 2)) : 
  replacement has 2 rows, data has 0

What would be a good way to achieve what I am looking to do? Please let me know if I wasn't clear.

3 answers

#1


29  

You could do it like this:

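 iterations <- 10
 variables <- 2

 # preallocate a matrix of the final size (filled with NA)
 output <- matrix(ncol = variables, nrow = iterations)

 for (i in seq_len(iterations)) {
   output[i, ] <- runif(variables)  # fill row i with random numbers
 }

 output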

and then turn it into a data.frame:

 output <- data.frame(output)
 class(output)

What this does:

  1. Create a matrix with rows and columns according to the expected growth.
  2. Insert 2 random numbers into each row of the matrix.
  3. Convert this into a data.frame after the loop has finished.
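
Since the question asks for column names generated inside the loop, here is a minimal sketch of the same preallocation idea filled column by column. The `var_` prefix and the `col_names` vector are made up for illustration; any names built as strings in the loop would work the same way.

```r
iterations <- 10
variables <- 3

# preallocate the matrix, as in the answer above
output <- matrix(NA_real_, nrow = iterations, ncol = variables)
col_names <- character(variables)     # hypothetical holder for the generated names

for (j in seq_len(variables)) {
  col_names[j] <- paste0("var_", j)   # column name generated inside the loop
  output[, j] <- rep(j, iterations)   # the column takes the iterator as its value
}

# convert once, after the loop, and attach the generated names
output <- as.data.frame(output)
names(output) <- col_names
str(output)
```

The names are attached only at the end, so the loop itself works on a plain numeric matrix.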

#2


37  

It is often preferable to avoid loops and use vectorized functions. If that is not possible there are two approaches:

  1. Preallocate your data.frame. This is not recommended because indexing is slow for data.frames.
  2. Use another data structure in the loop and transform into a data.frame afterwards. A list is very useful here.

Example to illustrate the general approach:

mylist <- list() #create an empty list

for (i in 1:5) {
  vec <- numeric(5) #preallocate a numeric vector
  for (j in 1:5) { #fill the vector
    vec[j] <- i^j 
  }
  mylist[[i]] <- vec #put all vectors in the list
}
df <- do.call("rbind",mylist) #combine all vectors into a matrix
df <- as.data.frame(df) #convert the matrix into a data.frame

In this example it is not necessary to use a list; you could preallocate a matrix instead. However, if you do not know how many iterations your loop will need, you should use a list.
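
To make the unknown-number-of-iterations case concrete, here is a small sketch under made-up assumptions: the stopping rule (`total < 3`) and the column names `iteration`, `value` and `running_total` are hypothetical and only serve the illustration.

```r
set.seed(1)            # only to make the sketch reproducible
results <- list()      # grows by one element per iteration
total <- 0
i <- 0

# hypothetical stopping rule: the number of iterations is not known in advance
while (total < 3) {
  i <- i + 1
  x <- runif(1)
  total <- total + x
  # store one small data.frame per iteration in the list
  results[[i]] <- data.frame(iteration = i, value = x, running_total = total)
}

df <- do.call(rbind, results)   # a single rbind after the loop
df
```

Appending to a list is cheap, so the expensive step (building the data.frame) happens only once.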

Finally, here is a vectorized alternative to the example loop:

outer(1:5,1:5,function(i,j) i^j)

As you can see, it is simpler and also more efficient.
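
As a quick sanity check, the following sketch compares the loop version with the `outer` version and then names the columns programmatically. The `power_` prefix and the variable names `loop_result`, `outer_result`, `same` and `powers` are my own choices, not part of the original example.

```r
# loop version from the example above
mylist <- list()
for (i in 1:5) {
  vec <- numeric(5)
  for (j in 1:5) {
    vec[j] <- i^j
  }
  mylist[[i]] <- vec
}
loop_result <- do.call("rbind", mylist)

# vectorized version
outer_result <- outer(1:5, 1:5, function(i, j) i^j)

# both approaches should give the same 5 x 5 matrix
same <- isTRUE(all.equal(loop_result, outer_result))
same

# name the columns programmatically, then convert to a data.frame
colnames(outer_result) <- paste0("power_", 1:5)
powers <- as.data.frame(outer_result)
powers
```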

#3


0  

I had a case where I needed to use a data frame within a for loop. In this case it was "efficient"; however, keep in mind that the database was small and the iterations in the loop were very simple. Maybe the code will be useful for someone with similar conditions.

The purpose of the for loop was to run the raster extract function over five locations (Tokyo, New York, São Paulo, Seoul and Mexico City), each with its own raster grids. I had a spatial point database with more than 1000 observations spread across the 5 locations, and I needed to extract information from 10 different raster grids (two grids per location). For the subsequent analysis, I needed not only the raster values but also the unique ID of each observation.

Preparing the spatial data included the following tasks:

  1. Import the points shapefile with the readOGR function (rgdal package)
  2. Import the raster files with the raster function (raster package)
  3. Stack the grids from the same location into one object with the stack function (raster package)

Here is the for loop code using a data frame:

1. Add stacked rasters per location into a list

raslist <- list(LOC1,LOC2,LOC3,LOC4,LOC5)

2. Create an empty data frame; this will hold the output

TB <- data.frame(VAR1=double(),VAR2=double(),ID=character())

3. Set up the for loop

L1 <- seq(1,5,1) # the location ID is a numeric variable with values from 1 to 5 

for (i in seq_along(L1)) {
  dat <- subset(points, LOCATION == L1[i]) # select corresponding points for location [i]
  # run extract (raster package) with points & raster stack for location [i]
  tmp <- data.frame(extract(raslist[[i]], dat), dat$ID)
  names(tmp) <- c("VAR1","VAR2","ID")
  TB <- rbind(TB, tmp) # append this location's rows to the output
}
}
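
For larger data, calling rbind inside the loop can get slow, because the data frame is copied on each iteration. Below is a sketch of the same pattern that collects the pieces in a list and binds them once. It runs without any spatial packages: `pts` is a toy stand-in for the `points` object and `fake_extract` is a hypothetical placeholder for the raster extract call, so treat it as an outline rather than the actual workflow.

```r
# toy stand-ins (assumptions): a plain data frame instead of spatial points,
# and a fake extract function instead of the raster package's extract
pts <- data.frame(ID = paste0("P", 1:6),
                  LOCATION = rep(1:3, each = 2),
                  stringsAsFactors = FALSE)
fake_extract <- function(loc, n) {
  cbind(rep(loc * 10, n), rep(loc * 100, n))   # two made-up "raster" values per point
}

locations <- sort(unique(pts$LOCATION))
pieces <- vector("list", length(locations))    # one slot per location

for (k in seq_along(locations)) {
  dat <- subset(pts, LOCATION == locations[k]) # points for this location
  piece <- data.frame(fake_extract(locations[k], nrow(dat)), dat$ID,
                      stringsAsFactors = FALSE)
  names(piece) <- c("VAR1", "VAR2", "ID")
  pieces[[k]] <- piece                         # store the piece, no rbind yet
}

TB <- do.call(rbind, pieces)                   # bind once, after the loop
TB
```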
