R: operating on a subset of dataframe columns using ddply

Time: 2023-01-05 22:57:31

I have a large-ish dataframe (40000 observations of 800 variables) and wish to operate on a range of columns of every observation with something akin to a dot product. This is how I implemented it:

matrixattempt <- as.matrix(dframe)
takerow <- function(k) {as.vector(matrixattempt[k,])}
takedot0 <- function(k) {sqrt(sum(data0averrow * takerow(k)[2:785]))}

for (k in 1:40000){
print(k)
dframe$dot0aver[k]<-takedot0(k)
}

The print is just to keep track of what's going on. data0averrow is a numeric vector, same size as takerow(k)[2:785], that has been pre-defined.

This runs, and a few spot checks suggest it produces correct results, but it is very slow.

I searched for dot product for a subset of columns, and found this question, but could not figure out how to apply it to my setup. ddply sounds like it should work faster (although I do not want to do splitting and would have to use the same define-id trick that the referenced questioner did). Any insight/hints?

2 solutions

#1


2  

Try this:

sqrt(colSums(t(matrixattempt[, 2:785])  * data0averrow))

or equivalently:

sqrt(matrixattempt[, 2:785] %*% data0averrow)
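
To check that both expressions agree with the loop from the question, here is a minimal sketch on small synthetic data. The sizes (5 rows, column range 2:4), the values, and the helper names `cols`, `loop_result`, `form1` and `form2` are made up for illustration; only `matrixattempt` and `data0averrow` come from the question.

```r
# Hypothetical stand-in data: 5 observations, one id column plus 3 numeric columns.
set.seed(1)
matrixattempt <- cbind(id = 1:5, matrix(runif(15), nrow = 5))
data0averrow  <- c(0.5, 1, 2)   # same length as the column range below
cols <- 2:4                     # plays the role of 2:785 in the question

# Row-by-row loop, as in the question, but writing into a preallocated vector.
loop_result <- numeric(nrow(matrixattempt))
for (k in seq_len(nrow(matrixattempt))) {
  loop_result[k] <- sqrt(sum(data0averrow * matrixattempt[k, cols]))
}

# Form 1: after t(), each column is one observation, so data0averrow
# is recycled down every column before colSums adds the products up.
form1 <- sqrt(colSums(t(matrixattempt[, cols]) * data0averrow))

# Form 2: a matrix-vector product; drop() turns the n x 1 result into a vector.
form2 <- drop(sqrt(matrixattempt[, cols] %*% data0averrow))

print(all.equal(loop_result, form1))
print(all.equal(loop_result, form2))
```

Note that the second expression returns a one-column matrix rather than a plain vector, which is why the sketch wraps it in `drop()`.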

#2


2  

Use matrix multiplication and rowSums on the result:

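# Select columns 2:785 (not rows) so every observation gets a value;
# rowSums collapses the n x 1 product into a plain vector.
dframe$dot0aver <- sqrt( rowSums( 
                              matrixattempt[, 2:785] %*% data0averrow ))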

It's the sqrt of the dot product of data0averrow with each row, restricted to the column range 2:785.
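
As a small worked example of storing the result as a new column, here is a sketch on a toy data frame. The data frame contents, the weights and the name `cols` are assumptions chosen so the arithmetic is easy to follow by hand; they are not from the question.

```r
# Hypothetical toy data: an id column plus two numeric columns.
dframe <- data.frame(id = 1:4, a = c(1, 4, 9, 16), b = c(0, 0, 0, 9))
data0averrow <- c(1, 1)   # one weight per selected column
cols <- 2:3               # plays the role of 2:785 in the question

# Convert only the selected columns to a matrix, multiply by the weights,
# and flatten the n x 1 result before storing it as a column.
m <- as.matrix(dframe[, cols])
dframe$dot0aver <- as.vector(sqrt(m %*% data0averrow))

print(dframe$dot0aver)   # 1 2 3 5
```

Converting only the selected columns is a design choice worth considering: `as.matrix` on a data frame that contains a character or factor column yields a character matrix, which would make the multiplication fail.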
