Fill a new variable in the original data frame from ddply in R

Time: 2023-01-12 12:42:10

I have a data.frame which looks like this (in reality 1M rows):

> df

             R.DMA.NAMES quarter     daypart allpersons.imp rate                    station  spot.id
1 Wilkes.Barre.Scranton.Hztn  Q22014   afternoon            0.0   30                       WSWB 13048713
2                  Nashville  Q12014   primetime            0.0   50              COM NASHVILLE 11969260
3             Seattle.Tacoma  Q12014   primetime            6.1   51 ESPN SEATTLE, EVERETT ZONE 11898905
4               Jacksonville  Q42013 late fringe            2.3  130          Jacksonville WAWS 11617447
5                    Detroit  Q22014   overnight            0.0    0                       WKBD 12571421
6         South.Bend.Elkhart  Q42013   primetime           11.5  325                       WBND 11741171

dput(df)

structure(list(R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", 
"Nashville", "Seattle.Tacoma", "Jacksonville", "Detroit", "South.Bend.Elkhart"
), quarter = structure(c(3L, 1L, 1L, 6L, 3L, 6L), .Label = c("Q12014", 
"Q22013", "Q22014", "Q32013", "Q32014", "Q42013"), class = "factor"), 
    daypart = c("afternoon", "primetime", "primetime", "late fringe", 
    "overnight", "primetime"), allpersons.imp = c(0, 0, 6.1, 
    2.3, 0, 11.5), rate = c(30, 50, 51, 130, 0, 325), station = c("WSWB", 
    "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE", "Jacksonville WAWS", 
    "WKBD", "WBND"), spot.id = c(13048713L, 11969260L, 11898905L, 
    11617447L, 12571421L, 11741171L)), .Names = c("R.DMA.NAMES", 
"quarter", "daypart", "allpersons.imp", "rate", "station", "spot.id"
), row.names = c(NA, -6L), class = "data.frame")

I am using ddply to perform a per-group calculation:

library(plyr)
ddply(df, .(R.DMA.NAMES, station, quarter), function(x) {
  # x is the subset of df for one R.DMA.NAMES/station/quarter combination
  sum(x$rate) / sum(x$allpersons.imp)
})

This creates a new data.frame which looks like this:

                 R.DMA.NAMES                    station quarter        V1
1                    Detroit                       WKBD  Q22014       NaN
2               Jacksonville          Jacksonville WAWS  Q42013 56.521739
3                  Nashville              COM NASHVILLE  Q12014       Inf
4             Seattle.Tacoma ESPN SEATTLE, EVERETT ZONE  Q12014  8.360656
5         South.Bend.Elkhart                       WBND  Q42013 28.260870
6 Wilkes.Barre.Scranton.Hztn                       WSWB  Q22014       Inf
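The NaN and Inf entries in this output come from R's floating-point division rather than from ddply itself: Detroit has rate 0 and allpersons.imp 0 (0/0 is NaN), while Nashville and Wilkes.Barre.Scranton.Hztn have a positive rate over zero impressions (a positive number divided by 0 is Inf). A minimal sketch using the rate and allpersons.imp values from the dput() output; because each of the six sample rows forms its own group, a row-wise division reproduces the V1 column.

```r
# Relevant columns from the question's dput() output
df <- data.frame(
  R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", "Nashville", "Seattle.Tacoma",
                  "Jacksonville", "Detroit", "South.Bend.Elkhart"),
  allpersons.imp = c(0, 0, 6.1, 2.3, 0, 11.5),
  rate = c(30, 50, 51, 130, 0, 325),
  stringsAsFactors = FALSE
)

# Each sample row is its own group, so the row-wise ratio equals
# sum(rate) / sum(allpersons.imp) for that group
df$cpi <- df$rate / df$allpersons.imp

# 0 / 0 gives NaN (Detroit); a positive value / 0 gives Inf
print(df[, c("R.DMA.NAMES", "cpi")])
```

If NaN or Inf are not wanted in the final column, they would have to be handled explicitly (for example with is.nan() or is.infinite()) after the division.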

What I'd like to do is create a new column called "cpi" in my original df, i.e. the applicable "cpi" value should appear against each particular row. Of course, the same value will repeat many times, e.g. 8.36 will appear for every row which contains "Seattle.Tacoma" for R.DMA.NAMES, "ESPN SEATTLE, EVERETT ZONE" for station and Q12014 for quarter. I tried several things, including:

transform(df, cpi = ddply(df, .(R.DMA.NAMES, station, quarter), function(x) {
  sum(x$rate) / sum(x$allpersons.imp)
}))

But this didn't work! Can someone explain why?
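The attempt cannot work as written because the inner ddply() call returns a summary data frame with one row per R.DMA.NAMES/station/quarter group (the three grouping columns plus V1), not a vector with one value per row of df, so its rows do not line up with df's rows. What the attempt is reaching for is a join of that per-group summary back onto df. Below is a minimal base-R sketch of that idea, using aggregate() and merge() in place of plyr; it assumes the column names from the dput() output and appends one hypothetical duplicate of the Seattle.Tacoma row so that one group has more than one row.

```r
# Sample data from the question's dput() (spot.id and daypart omitted)
df <- data.frame(
  R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", "Nashville", "Seattle.Tacoma",
                  "Jacksonville", "Detroit", "South.Bend.Elkhart"),
  quarter = factor(c("Q22014", "Q12014", "Q12014", "Q42013", "Q22014", "Q42013")),
  allpersons.imp = c(0, 0, 6.1, 2.3, 0, 11.5),
  rate = c(30, 50, 51, 130, 0, 325),
  station = c("WSWB", "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE",
              "Jacksonville WAWS", "WKBD", "WBND"),
  stringsAsFactors = FALSE
)
df <- rbind(df, df[3, ])  # hypothetical extra row in the Seattle.Tacoma group

keys <- c("R.DMA.NAMES", "station", "quarter")

# One row per group, like the ddply() summary: 6 groups for 7 rows of df
agg <- aggregate(cbind(rate, allpersons.imp) ~ R.DMA.NAMES + station + quarter,
                 data = df, FUN = sum)
agg$cpi <- agg$rate / agg$allpersons.imp
print(nrow(agg))

# Join the per-group cpi back onto every original row of df
out <- merge(df, agg[, c(keys, "cpi")], by = keys, all.x = TRUE)
print(out[, c(keys, "cpi")])
```

With the default sort = TRUE, merge() returns the rows sorted by the key columns rather than in df's original order.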

1 solution

#1


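Use transform within ddply. ddply passes each group's subset of df to transform, which evaluates sum(rate) / sum(allpersons.imp) within that subset and adds the result as a cpi column, so the value is repeated on every row of the group: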

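library(plyr)
# transform() runs on each group's subset; the scalar cpi is recycled over its rows
df <- ddply(df, .(R.DMA.NAMES, station, quarter),
            transform, cpi = sum(rate) / sum(allpersons.imp))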
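The ddply() result is a new data frame whose rows are grouped by the split variables rather than kept in df's original order. If the original order matters, one base-R option (a swapped-in technique, not part of the answer above) is ave(), which returns a vector as long as its input with each element replaced by FUN applied to that element's group. A minimal sketch, assuming the column names from the question's dput(); sum_rate and sum_imp are illustrative names.

```r
# Sample data from the question's dput() (spot.id and daypart omitted)
df <- data.frame(
  R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", "Nashville", "Seattle.Tacoma",
                  "Jacksonville", "Detroit", "South.Bend.Elkhart"),
  quarter = factor(c("Q22014", "Q12014", "Q12014", "Q42013", "Q22014", "Q42013")),
  allpersons.imp = c(0, 0, 6.1, 2.3, 0, 11.5),
  rate = c(30, 50, 51, 130, 0, 325),
  station = c("WSWB", "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE",
              "Jacksonville WAWS", "WKBD", "WBND"),
  stringsAsFactors = FALSE
)

# Group totals, aligned element-for-element with the rows of df
sum_rate <- ave(df$rate, df$R.DMA.NAMES, df$station, df$quarter, FUN = sum)
sum_imp  <- ave(df$allpersons.imp, df$R.DMA.NAMES, df$station, df$quarter, FUN = sum)

# Same per-group ratio as the ddply() answer, added in place without reordering df
df$cpi <- sum_rate / sum_imp
print(df[, c("R.DMA.NAMES", "station", "quarter", "cpi")])
```

Because ave() keeps the input's length and order, the new column can be assigned directly with $<-, which is what the transform(df, cpi = ...) attempt in the question was trying to achieve.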
