Fill all rows with the mean of the corresponding group (ddply?)

Time: 2022-04-11 09:16:15

This is probably a silly question about a simple task for ddply, but strangely enough I could not find a solution. Say I have a data frame of respondents within countries, along with the number of jobs each respondent has held in his or her career:

mydata <- structure(list(country = structure(c(11L, 6L, 7L, 12L, 12L, 3L, 
7L, 10L, 6L, 4L, 5L, 12L, 3L, 1L, 4L, 13L, 2L, 4L, 7L, 3L), contrasts = structure(c(1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, -1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1), .Dim = c(13L, 
12L), .Dimnames = list(c("Austria", "Germany", "Sweden", "Netherlands", 
"Spain", "Italy", "France", "Denmark", "Greece", "Switzerland", 
"Belgium", "Czechia", "Poland"), c("AT", "DE", "SE", "NL", "ES", 
"IT", "FR", "DK", "GR", "CH", "BE", "CZ"))), .Label = c("Austria", 
"Germany", "Sweden", "Netherlands", "Spain", "Italy", "France", 
"Denmark", "Greece", "Switzerland", "Belgium", "Czechia", "Poland"
), class = "factor"), njobs = c(2, 2, 3, 2, 1, 2, 4, 2, 1, 3, 
2, 3, 3, 2, 8, 3, 1, 2, 9, 3)), .Names = c("country", "njobs"
), class = "data.frame", row.names = c(NA, -20L))

I would like to add a third column containing the average number of jobs per career in each respondent's country. This is easy to do in two lines:

library(plyr)  # provides ddply() and .()
ctry.means <- ddply(mydata, .(country), summarize, avejobs = mean(njobs))
result <- merge(mydata, ctry.means, by = "country")

However, this is such a simple and frequently used operation that I feel there must be a simpler way to do it in one step, some trick with ddply. More generally, this is about combining group-level and case-level variables in a single summarize or mutate statement.

1 solution

#1


If you're happy with a simple base solution,

mydata$new = ave(mydata$njobs, mydata$country)

will do that too.
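
To make the behaviour concrete, here is a minimal sketch on a made-up data frame (`toy` and its values are assumptions for illustration, not the question's data). It relies on `ave()` defaulting to `FUN = mean` and returning a vector aligned with the original rows. The guarded part shows what I believe is the one-step plyr equivalent, passing `transform` to `ddply`; it only runs if plyr is installed.

```r
# Hypothetical toy data, not the asker's mydata
toy <- data.frame(country = c("A", "B", "A", "B", "C"),
                  njobs   = c(2, 4, 6, 8, 5))

# Base R: each row receives the mean of its own country group;
# FUN = mean is the default, spelled out here for clarity
toy$avejobs <- ave(toy$njobs, toy$country, FUN = mean)
print(toy)

# One-step plyr variant (assumes plyr is available):
# transform is applied per group, so mean(njobs) is recycled within each group
if (requireNamespace("plyr", quietly = TRUE)) {
  one_step <- plyr::ddply(toy[, c("country", "njobs")], "country",
                          transform, avejobs = mean(njobs))
  print(one_step)
}
```

One difference to keep in mind: `ave()` leaves the rows in their original order, while `ddply` returns them grouped by country.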
