I am very new to R and this is my first Stack Overflow question, so I expect this may be a little rough. I have a data frame (from a .csv) with the following structure:
FeatureName Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
I am trying to figure out how to make a new data frame in which the unique values of FeatureName (the factor levels ClickHeadline, ChangeSetting and SomethingElse) become variables (columns) holding the sum of Count for each Uuid. So the new data frame I want would be:
Uuid ClickHeadline ChangeSetting SomethingElse
ABC1 17 3 0
CBA2 5 7 5
I feel like I should be able to do this with the aggregate function, but I can't figure out how to tell it to sum the counts by a variable. I know I'm in way over my head, but can anybody help me figure this out?
1 Answer
There are many possibilities.
If you need the counts summed, you can use the dcast function from the reshape2 package:
# recreate the example data from the question
df <- read.table(header = TRUE, text = '
FeatureName Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
')
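library(reshape2)
# one row per Uuid, one column per FeatureName; sum adds up repeated pairs and fills absent ones with 0
dcast(df, Uuid ~ FeatureName, value.var = "Count", fun.aggregate = sum)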
Uuid ChangeSetting ClickHeadline SomethingElse
1 ABC1 3 17 0
2 CBA2 7 5 5
If your dataset is limited to the scope you provided (no Uuid/FeatureName pair appears more than once, so nothing needs summing), you can just use the base reshape function:
out <- reshape(df, idvar = "Uuid", timevar = "FeatureName", v.names = "Count", direction = "wide")
# pairs that never occur come out as NA; replace them with 0
out[is.na(out)] <- 0
out
Uuid Count.ClickHeadline Count.ChangeSetting Count.SomethingElse
1 ABC1 17 3 0
3 CBA2 5 7 5
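The restriction to the scope you provided matters because reshape does not sum: when a Uuid/FeatureName pair appears more than once, it keeps only the first value and warns ("multiple rows match ... first taken"). A minimal base R sketch for that case is to collapse duplicates with aggregate first (the function mentioned in the question) and then reshape. The extra "ClickHeadline ABC1 2" row below is a hypothetical duplicate added only for illustration.

```r
# Example data from the question plus one HYPOTHETICAL duplicate row
# (a second ClickHeadline entry for ABC1) to show where summing matters.
df <- read.table(header = TRUE, text = '
FeatureName Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
ClickHeadline ABC1 2
')

# Step 1: sum Count within each Uuid/FeatureName pair (one row per pair)
agg <- aggregate(Count ~ Uuid + FeatureName, data = df, FUN = sum)

# Step 2: spread FeatureName into Count.<FeatureName> columns, one row per Uuid
out <- reshape(agg, idvar = "Uuid", timevar = "FeatureName",
               v.names = "Count", direction = "wide")

# Pairs that never occur come out as NA; treat them as 0
out[is.na(out)] <- 0
out
```

Here ABC1 gets Count.ClickHeadline = 19 (17 + 2) instead of only the first value, and no warning is raised because each pair is unique after aggregate.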
Another base R alternative is xtabs, which sums the counts and needs no NA clean-up:
xtabs(Count ~ Uuid+FeatureName, df)
FeatureName
Uuid ChangeSetting ClickHeadline SomethingElse
ABC1 3 17 0
CBA2 7 5 5
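Note that xtabs returns a contingency table (class "xtabs"/"table"), not a data frame. If you need a data frame with Uuid as an ordinary column, one possible sketch is to convert with as.data.frame.matrix (plain as.data.frame on a table would turn it back into long format) and then move the row names into a column.

```r
# recreate the example data from the question
df <- read.table(header = TRUE, text = '
FeatureName Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
')

tab <- xtabs(Count ~ Uuid + FeatureName, data = df)

# keep the wide layout: rows are Uuid values, columns are FeatureName values
wide <- as.data.frame.matrix(tab)

# the Uuid values are stored as row names; turn them into a regular column
wide <- data.frame(Uuid = rownames(wide), wide, row.names = NULL)
wide
```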
A tidyr package solution with spread:
library(tidyr)
spread(df, key=FeatureName, value=Count, fill=0)
Uuid ChangeSetting ClickHeadline SomethingElse
1 ABC1 3 17 0
2 CBA2 7 5 5
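In tidyr 1.0.0 and later, spread is superseded by pivot_wider, and spread has no aggregation step, so duplicate Uuid/FeatureName pairs make it stop with an error. A sketch with pivot_wider, assuming tidyr 1.1.0 or later (where values_fill accepts a plain value and values_fn a plain function), sums any duplicates and fills missing combinations with 0:

```r
library(tidyr)

# recreate the example data from the question
df <- read.table(header = TRUE, text = '
FeatureName Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
')

# one column per FeatureName; values_fn = sum adds up repeated pairs,
# values_fill = 0 is used where a Uuid has no row for a FeatureName
wide <- pivot_wider(df, names_from = FeatureName, values_from = Count,
                    values_fill = 0, values_fn = sum)
wide
```

The result is a tibble whose FeatureName columns appear in order of first occurrence (ClickHeadline, ChangeSetting, SomethingElse) rather than alphabetically.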