I am working on a project where I have a file in long format with 131170 observations and 4 variables; not all of the values are numeric. I have been trying to use the dcast function from reshape2, but however I call it, I get the message "Aggregation function missing: defaulting to length". I do not want my data to be changed; I simply want to change the format of the file.
This is the call I have written:
W_data1 <- dcast(L_data1, formula = ID + Date ~ Metric, value.var = "Value")
This is an example of what my file looks like:
ID Date Metric Value
1003 3/5/2001 Age 74
1003 3/5/2001 Age 74
1003 3/5/2001 Age 74
1003 3/5/2001 Age 74
1003 3/5/2001 Sex F
1003 3/5/2001 Sex F
1003 3/5/2001 Sex F
1003 3/5/2001 Sex F
1003 3/5/2001 Dx MM
1003 3/5/2001 Dx MM
1003 3/5/2001 Dx MM
1003 3/5/2001 Dx MM
1003 3/5/2001 ISS.Stage 1
The wide format should look like this:
ID    Age  Sex  Dx  Date      ISS Stage  Heavy Chain Isotype
1003  74   F    MM  3/5/2001  1          IgA
1003  74   F    MM  3/5/2001  1          IgA
1003  74   F    MM  3/5/2001  1          IgA
1003  74   F    MM  3/5/2001  1          IgA
1004  79   F    MM  1/1/1997  Unknown    N/A
There are multiple records for each ID; some IDs have 4 sets of data and others just one. The IDs repeat because the same variables have different values on different dates for the same ID.
3 answers
#1
1
There are duplicated values in the combination of your LHS and RHS variables. If you don't want dcast to fall back to length, you need to add an indicator variable that distinguishes the duplicated rows.
Try:
L_data1$ind <- ave(1:nrow(L_data1), L_data1[1:3], FUN = seq_along)
dcast(L_data1, ID + Date ~ Metric + ind, value.var = "Value")
# ID Date Age_1 Age_2 Age_3 Age_4 Dx_1 Dx_2 Dx_3 Dx_4 ISS.Stage_1
# 1 1003 3/5/2001 74 74 74 74 MM MM MM MM 1
# Sex_1 Sex_2 Sex_3 Sex_4
# 1 F F F F
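If the goal is the layout shown in the question, with one row per repeated record rather than columns such as Age_1 to Age_4, the same indicator can go on the left-hand side of the formula instead. The following is a minimal sketch, assuming the reshape2 package is installed; the small L_data1 built here is a hypothetical stand-in that mirrors the sample rows in the question, and W_rows is just an illustrative name.

```r
library(reshape2)

# Hypothetical stand-in for the real data, mirroring the sample rows in the question
L_data1 <- data.frame(
  ID     = 1003,
  Date   = "3/5/2001",
  Metric = c(rep(c("Age", "Sex", "Dx"), each = 4), "ISS.Stage"),
  Value  = c(rep(c("74", "F", "MM"), each = 4), "1"),
  stringsAsFactors = FALSE
)

# Number the repeats within each ID/Date/Metric group
L_data1$ind <- ave(seq_len(nrow(L_data1)),
                   L_data1$ID, L_data1$Date, L_data1$Metric,
                   FUN = seq_along)

# Indicator on the row side: one wide row per repeat, so every cell holds one value
W_rows <- dcast(L_data1, ID + Date + ind ~ Metric, value.var = "Value")
W_rows
```

Metrics that occur fewer times than the others (ISS.Stage in this sample) come out as NA in the extra rows.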
#2
0
You have constructed a situation where more than one item has to go into a single cell, so dcast is asking how to combine them. If you want just the first one, supply a function that does that as the "aggregation function":
W_data1 <- dcast(L_data1, formula = ID + Date ~ Metric,
                 fun.aggregate = function(x) as.character(x)[1],  # keep the first value in each cell
                 value.var = "Value")
W_data1
#     ID     Date Age Dx ISS.Stage Sex
# 1 1003 3/5/2001  74 MM         1   F
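Taking the first value only avoids losing data when every duplicate in a cell is identical. A quick base R check, sketched here on a hypothetical stand-in for L_data1 that mirrors the sample rows in the question, counts the rows and the distinct values in each ID/Date/Metric cell (the names check and conflicting are illustrative):

```r
# Hypothetical stand-in for the real data, mirroring the sample rows in the question
L_data1 <- data.frame(
  ID     = 1003,
  Date   = "3/5/2001",
  Metric = c(rep(c("Age", "Sex", "Dx"), each = 4), "ISS.Stage"),
  Value  = c(rep(c("74", "F", "MM"), each = 4), "1"),
  stringsAsFactors = FALSE
)

# Rows per cell: anything above 1 is what makes dcast ask for an aggregation function
n_rows <- aggregate(Value ~ ID + Date + Metric, data = L_data1, FUN = length)

# Distinct values per cell: anything above 1 means the duplicates disagree
n_distinct <- aggregate(Value ~ ID + Date + Metric, data = L_data1,
                        FUN = function(x) length(unique(x)))

check <- merge(n_rows, n_distinct, by = c("ID", "Date", "Metric"),
               suffixes = c(".rows", ".distinct"))

# Cells where keeping only the first value would drop a different value
conflicting <- check[check$Value.distinct > 1, ]
check
conflicting
```

If conflicting is empty, as it is for this sample, keeping the first value per cell discards nothing but exact copies.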
#3
0
You could also try reshape:
L_data1$Metric <- with(L_data1,
paste0(Metric,ave(seq_along(Metric), ID, Date, Metric, FUN = seq_along)))
res <- reshape(L_data1, timevar="Metric", idvar=c("ID", "Date"), direction="wide")
colnames(res) <- gsub("^[[:alpha:]]+\\.","",colnames(res))
res
# ID Date Age1 Age2 Age3 Age4 Sex1 Sex2 Sex3 Sex4 Dx1 Dx2 Dx3 Dx4
#1 1003 3/5/2001 74 74 74 74 F F F F MM MM MM MM
# ISS.Stage1
#1 1