I have a long-format data frame, dogs, that I'm trying to reshape to wide format using the reshape() function. It currently looks like this:
dogid month year trainingtype home school timeincomp
12345 1 2014 1 1 1 340
12345 2 2014 1 1 1 360
31323 12 2015 2 7 3 440
31323 1 2014 1 7 3 500
31323 2 2014 1 7 3 520
The dogid column holds one id per dog. month ranges from 1 to 12, and year from 2014 to 2015. trainingtype is either 1 or 2. Each dog has a timeincomp value for every month-year-trainingtype combination, so 48 entries per dog. home and school range from 1 to 8 and are constant per dog (every entry for the same dog has the same school and home). timeincomp is my response variable.
I would like my table to look like so:
dogid home school month1year2014trainingtype1 month2year2014trainingtype1
12345 1 1 340 360
31323 7 3 500 520
etc. (with columns for each month-year-trainingtype combination)
What parameters should I use in reshape to achieve this?
3 Answers
#1
5
You can use the function dcast from the package reshape2. It's easier to understand. The variables on the left side of the formula stay as identifier columns, while the variables on the right side are spread out into wide columns.
fun.aggregate is the function applied when a cell contains more than one value. If you're sure you don't have repeated cases, you can use mean or sum, since either simply returns the single value.
library(reshape2)
# left of ~ : identifier columns; right of ~ : variables spread into columns
dcast(data, formula = dogid + home + school ~ month + year + trainingtype,
      value.var = "timeincomp",
      fun.aggregate = sum)
I hope it works:
dogid home school 1_2014_1 2_2014_1 12_2015_2
1 12345 1 1 340 360 0
2 31323 7 3 500 520 440
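Note that the 0 for dog 12345 under 12_2015_2 is not an observed value. As far as I understand the reshape2 documentation, dcast fills a missing combination with the result of fun.aggregate on a zero-length vector, which for sum is 0. The sketch below shows this using the sample rows from the question (the objects empty_sum, empty_mean and wide are names chosen for illustration). Assuming there are no repeated cases, dropping fun.aggregate should leave the missing cell as NA instead.

```r
# What an aggregate returns for a cell with no observations
empty_sum  <- sum(numeric(0))   # 0
empty_mean <- mean(numeric(0))  # NaN

# Sample rows copied from the question
dogs <- data.frame(
  dogid        = c(12345, 12345, 31323, 31323, 31323),
  month        = c(1, 2, 12, 1, 2),
  year         = c(2014, 2014, 2015, 2014, 2014),
  trainingtype = c(1, 1, 2, 1, 1),
  home         = c(1, 1, 7, 7, 7),
  school       = c(1, 1, 3, 3, 3),
  timeincomp   = c(340, 360, 440, 500, 520)
)

# Only runs if reshape2 is installed; with no duplicates and no
# fun.aggregate, missing combinations are left as NA rather than 0
if (requireNamespace("reshape2", quietly = TRUE)) {
  wide <- reshape2::dcast(
    dogs,
    dogid + home + school ~ month + year + trainingtype,
    value.var = "timeincomp"
  )
  print(wide)
}
```

If a real 0 and a missing measurement need to be told apart later, keeping NA is probably the safer choice.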
#2
4
In this case, using base reshape(), you essentially want an interaction() of the three time variables to define your wide columns, so:
idvars <- c("dogid","home","school")
grpvars <- c("year","month","trainingtype")
outvar <- "timeincomp"
time <- interaction(dat[grpvars])
reshape(
cbind(dat[c(idvars,outvar)],time),
idvar=idvars,
timevar="time",
direction="wide"
)
# dogid home school timeincomp.2014.1.1 timeincomp.2014.2.1 timeincomp.2015.12.2
#1 12345 1 1 340 360 NA
#3 31323 7 3 500 520 440
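The question asked for column names such as month1year2014trainingtype1. One way to get them with base reshape(), sketched here on the sample rows from the question, is to build the key with paste0() instead of interaction() and then strip the "timeincomp." prefix that reshape() adds. The column name key and the object wide are names chosen for illustration.

```r
# Sample rows copied from the question
dogs <- data.frame(
  dogid        = c(12345, 12345, 31323, 31323, 31323),
  month        = c(1, 2, 12, 1, 2),
  year         = c(2014, 2014, 2015, 2014, 2014),
  trainingtype = c(1, 1, 2, 1, 1),
  home         = c(1, 1, 7, 7, 7),
  school       = c(1, 1, 3, 3, 3),
  timeincomp   = c(340, 360, 440, 500, 520)
)

# One key per month-year-trainingtype combination
dogs$key <- with(dogs, paste0("month", month, "year", year,
                              "trainingtype", trainingtype))

wide <- reshape(
  dogs[c("dogid", "home", "school", "key", "timeincomp")],
  idvar     = c("dogid", "home", "school"),
  timevar   = "key",
  direction = "wide"
)

# reshape() names the new columns "timeincomp.<key>"; drop the prefix
names(wide) <- sub("^timeincomp\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

Combinations that a dog has no row for, such as month 12 of 2015 for dog 12345 in this sample, come out as NA.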
#3
3
You can do the same thing with tidyr, the newer replacement for reshape2:
library(tidyr)
library(dplyr)
data %>% unite(newcol, c(year, month, trainingtype)) %>%
spread(newcol, timeincomp)
dogid home school 2014_1_1 2014_2_1 2015_12_2
1 12345 1 1 340 360 NA
2 31323 7 3 500 520 440
First, we unite the year, month and trainingtype columns into a new column called newcol, then we spread the data with timeincomp as our value variable.
The NA appears because there is no value for that combination. You can fill it with something else by changing the fill argument of spread(), which defaults to fill = NA.
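As a small illustration of the fill argument, here is the same pipeline on the sample rows from the question with fill = 0. The value 0 is only an example, and the data frame is rebuilt here so the snippet runs on its own; it assumes tidyr is installed.

```r
library(tidyr)  # provides unite(), spread() and re-exports %>%

# Sample rows copied from the question
dogs <- data.frame(
  dogid        = c(12345, 12345, 31323, 31323, 31323),
  month        = c(1, 2, 12, 1, 2),
  year         = c(2014, 2014, 2015, 2014, 2014),
  trainingtype = c(1, 1, 2, 1, 1),
  home         = c(1, 1, 7, 7, 7),
  school       = c(1, 1, 3, 3, 3),
  timeincomp   = c(340, 360, 440, 500, 520)
)

wide <- dogs %>%
  unite(newcol, c(year, month, trainingtype)) %>%  # e.g. "2014_1_1"
  spread(newcol, timeincomp, fill = 0)             # missing cells become 0

print(wide)
```

Whether 0 is a sensible stand-in depends on the analysis; for a response variable like timeincomp, leaving the default NA may be less misleading.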