Using each() with reshape2::dcast to aggregate data

Time: 2023-01-29 16:58:03

I usually use the reshape package to aggregate data (duh), typically together with plyr because of its uber-awesome function each. Recently I was advised to switch to reshape2 and try it out, and now I can't seem to use the each wizardry anymore.

reshape

> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> cast(m, am + vs ~ variable, each(min, max, mean, sd))
  am vs hp_min hp_max   hp_mean    hp_sd
1  0  0    150    245 194.16667 33.35984
2  0  1     62    123 102.14286 20.93186
3  1  0     91    335 180.83333 98.81582
4  1  1     52    113  80.57143 24.14441

reshape2

> require(plyr)
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> dcast(m, am + vs ~ variable, each(min, max, mean, sd))
Error in structure(ordered, dim = ns) : 
  dims [product 4] do not match the length of object [16]
In addition: Warning messages:
1: In fs[[i]](x, ...) : no non-missing arguments to min; returning Inf
2: In fs[[i]](x, ...) : no non-missing arguments to max; returning -Inf

I wasn't in the mood to dig into this, as my previous code works like a charm with reshape, but I'd really like to know:

  1. Is it possible to use each with dcast?
  2. Is it advisable to use reshape2 at all? Is reshape deprecated?

1 solution

#1


The answer to your first question appears to be no. Quoting from ?reshape2:::dcast:

If the combination of variables you supply does not uniquely identify one row in the original data set, you will need to supply an aggregating function, fun.aggregate. This function should take a vector of numbers and return a single summary statistic.
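To illustrate the single-value requirement quoted above, here is a minimal sketch, assuming reshape2 is installed. Each function passed as fun.aggregate returns one number per group, so every statistic gets its own dcast call and the results are merged afterwards. The names hp_mean, hp_sd, and res are illustrative only and do not appear in the original post.

```r
library(reshape2)

m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# One summary statistic per call: each function returns a single number per group
hp_mean <- dcast(m, am + vs ~ variable, fun.aggregate = mean)
hp_sd   <- dcast(m, am + vs ~ variable, fun.aggregate = sd)

# Rename the value column so the two results can be told apart after merging
names(hp_mean)[names(hp_mean) == "hp"] <- "hp_mean"
names(hp_sd)[names(hp_sd) == "hp"]     <- "hp_sd"

# Combine the per-statistic results on the id columns
res <- merge(hp_mean, hp_sd, by = c("am", "vs"))
print(res)
```

This becomes verbose as the number of statistics grows, which is one reason a plyr-based summary may be more convenient.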

A look at Hadley's GitHub page for reshape2 suggests that he knows this functionality was removed, but seems to think it's better done in plyr, presumably with something like:

ddply(m,.(am,vs),summarise,min = min(value),
                           max = max(value),
                           mean = mean(value),
                           sd = sd(value))

or if you really want to keep using each:

ddply(m,.(am,vs),function(x){each(min,max,mean,sd)(x$value)})
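If the hp_min, hp_max, ... column names produced by the old reshape cast call matter downstream, they can probably be restored by prefixing the statistic columns after the ddply call. This is only a sketch, assuming plyr and reshape2 are installed; stats and stat_cols are hypothetical names introduced here for illustration.

```r
library(reshape2)
library(plyr)

m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# Apply all four summaries per (am, vs) group using each()
stats <- ddply(m, .(am, vs), function(x) each(min, max, mean, sd)(x$value))

# Prefix the statistic columns with the variable name, mimicking reshape::cast output
stat_cols <- setdiff(names(stats), c("am", "vs"))
names(stats)[names(stats) %in% stat_cols] <- paste("hp", stat_cols, sep = "_")
print(stats)
```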
