Aggregate() by id in R, with columns set to the number of values rather than the values themselves

I have a dataset that has household ID ('id') and fuel economy of vehicles owned by the household ('mpg'). This is in long form, with only the two columns 'id' and 'mpg'.


I am trying to use either the aggregate() function or ddply() to apply the following function to the data:


logratio <- function(data=x, mpg=mpg)
{
    if (length(data[mpg])>1) {
        ratio <- log(max(data[mpg])/min(data[mpg]))
        return(ratio)
    }
    else return(0)
}

I have tried the following:


mpgdf <- aggregate(mpg~id, FUN=logratio, data=mpgdata)

and


df <- ddply(mpgdata,~id,logratio)

Neither works.


The key here is that my theoretical wide format would be an 'id' column with one row for each id, followed by one column for the mpg of each vehicle, up to the maximum number of vehicles (i.e. if the household with the most vehicles has three vehicles: 'mpg1', 'mpg2', 'mpg3'). I would like to find the natural log of the ratio of the highest fuel economy to the lowest, returning 0 (the log of 1) if there is only one vehicle.


I'm starting to get a bit frustrated as both plyr and reshape seem to want to set columns as the values of the extant 'mpg' column, whereas I would like them as explained above.


I would like this to be returned as a dataframe with two columns: 'id', with each household ID appearing a single time, set against 'mpglogratio', so that I can then merge it back into a larger dataset I have.


Any help would be greatly appreciated!


Thanks.


1 solution

#1

With plyr you can try this:


logratio <- function(x)
        log(max(x)/min(x))

require(plyr)
mtcars <- mtcars[,c("cyl", "mpg")]
mtcars <- rbind(mtcars, c(5, 30))

ddply(mtcars, .(cyl), summarise, mpglogratio = logratio(mpg))
##   cyl mpglogratio
## 1   4     0.46002
## 2   5     0.00000
## 3   6     0.18419
## 4   8     0.61310

Just replace cyl with id and mtcars with your actual data to make it work with your data. There is also no need to test for the length: if the mpg for a group has length one, then max == min, so max/min == 1 and you end up with log(1), which is 0.
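Since the question also asked about aggregate(), here is a minimal base R sketch of the same idea. The toy mpgdata below is made up for illustration (the real data is not shown in the question). With the formula interface, aggregate() passes FUN a plain numeric vector of mpg values for each id, not a data frame, so a one-argument function like the logratio above is enough.

```r
# Hypothetical toy data standing in for the asker's mpgdata
mpgdata <- data.frame(
  id  = c(1, 1, 1, 2, 3, 3),
  mpg = c(20, 30, 25, 22, 18, 36)
)

# Same one-argument helper as in the plyr answer:
# log of the highest mpg over the lowest mpg within a group
logratio <- function(x) log(max(x) / min(x))

# One row per id; FUN is applied to the vector of mpg values for that id
mpgdf <- aggregate(mpg ~ id, data = mpgdata, FUN = logratio)

# aggregate() keeps the name 'mpg' for the result column, so rename it
names(mpgdf)[2] <- "mpglogratio"
mpgdf
```

Household 2 has a single vehicle in this toy data, so its ratio comes out as 0 without any explicit length check.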


A final note: if you want to merge it back quickly, use transform instead of summarise, which adds the result as a new column on every row of the original data, like this:


ddply(mtcars, .(cyl), transform, mpglogratio = logratio(mpg))
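If the goal is instead a two-column summary joined onto a separate, larger dataset (as described in the question), a base R merge() on id is one way to do it. This is only a sketch: the households data frame and its income column are invented here for illustration, and the toy mpgdata stands in for the real data.

```r
# Hypothetical long-format vehicle data (one row per vehicle)
mpgdata <- data.frame(
  id  = c(1, 1, 1, 2, 3, 3),
  mpg = c(20, 30, 25, 22, 18, 36)
)

# Hypothetical larger household-level dataset; id 4 owns no vehicles
households <- data.frame(
  id     = c(1, 2, 3, 4),
  income = c(50, 60, 70, 80)
)

# Two-column summary: one row per id with the log ratio of max to min mpg
ratios <- aggregate(mpg ~ id, data = mpgdata,
                    FUN = function(x) log(max(x) / min(x)))
names(ratios)[2] <- "mpglogratio"

# Left join: keep every household, NA where no vehicles were recorded
merged <- merge(households, ratios, by = "id", all.x = TRUE)
merged
```

Using all.x = TRUE keeps households that have no rows in the vehicle data; drop it if only matched ids are wanted.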
