长到宽格式，有几个重复。 Circumvent与独特的列组合

I have a dataset similar to this (real one is way bigger). It is in long format and I need to change it to wide format with one row per id. My problem is that there are a lot of different combinations of time, drug, unit and admin. Only a combination of time, drug, unit and admin will be unique and should only occur once pr id. I could not find a solution to this. I would like R to create unique combinations of columns so the data can be transformed to wide format. I have tried

我有一个类似于此的数据集(真正的一个更大)。它是长格式的,我需要将其更改为宽格式,每个id一行。我的问题是时间,药物,单位和管理员有很多不同的组合。只有时间,药物,单位和管理员的组合才是唯一的,并且应该只发生一次。我无法找到解决方案。我希望R创建唯一的列组合,以便将数据转换为宽格式。我试过了

melt.data.table(df, id.vars=c(id,time,drug,unit,admin), measure.vars = c(dose), na.rm=F)

melt.data.table(df,id.vars = c(id,time,drug,unit,admin),measure.vars = c(dose),na.rm = F)

and also a combination with

还有一个组合

%>% expand(nesting(time, drug, unit, admin, dose), id)

but it doesn't work. Here is mock data:

但它不起作用。这是模拟数据:

id<-c(1492,1492,1492,1492,1493,1493)
time<-c("Pre-bypass","Post-bypass","Total","Post-bypass","Pre-OP","Pre-OP")
drug<-c("ACE","LEVO","LEVO","MIL","BB","BC")
unit<-c(NA,"ml/hr","ml","mg",NA,NA)
admin<-c(NA, "IV","IV","Inhale",NA,NA)
dose<-c(NA,50,40,5,NA,NA)
df<-rbind(id,time,drug,unit,admin,dose)
df<-t(df)
df<-as.data.table(df)

I would like my output to be something like this (the reason for the TRUE in Pre.bypass.Ace.unitNA.adminNA and Pre.OP columns is that dose and unit is missing here but because it is listed it is given in standard dose and unit:

我希望我的输出是这样的(在Pre.bypass.Ace.unitNA.adminNA和Pre.OP列中TRUE的原因是这里缺少剂量和单位但是因为它被列出它以标准剂量给出和单位:

id.new<-c(1492,1493)
Pre.OP.BB.unitNA.adminNA<-c(NA,TRUE)
Pre.OP.BC.unitNA.adminNA<-c(NA,TRUE)
Total.LEVO.ml.h.IV<-c(40,NA)
Pre.bypass.Ace.unitNA.adminNA<-c(TRUE,NA)
Post.bypass.LEVO.ml.h.IV<-c(50,NA)
Post.bypass.MIL.ml.h.IV<-c(5,NA)
df.new<-rbind(id.new,Post.bypass.MIL.ml.h.IV,Pre.OP.BB.unitNA.adminNA,Pre.OP.BC.unitNA.adminNA,Total.LEVO.ml.h.IV,Pre.bypass.Ace.unitNA.adminNA,Post.bypass.LEVO.ml.h.IV)
df.new<-t(df.new)

2 个解决方案

#1

I agree with the comments that long format is usually the better way to go. If you have to use wide format the using the tidyr package you can do the following:

我同意这些评论,即长格式通常是更好的方式。如果您必须使用宽格式使用tidyr包,您可以执行以下操作:

library(tidyr)
df %>% 
  unite(combination, time, drug, unit, admin) %>% 
  spread(key = combination, value  = dose)

#2

library(data.table)
id <- c(1492, 1492, 1492, 1492, 1493, 1493)
time <- c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP")
drug <- c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC")
unit <- c(NA, "ml/hr", "ml", "mg", NA, NA)
admin <- c(NA, "IV", "IV", "Inhale", NA, NA)
dose <- c(NA, 50, 40, 5, NA, NA)
df <- rbind(id, time, drug, unit, admin, dose)
df <- t(df)
df <- as.data.table(df)
df
#>      id        time drug  unit  admin dose
#> 1: 1492  Pre-bypass  ACE    NA     NA   NA
#> 2: 1492 Post-bypass LEVO ml/hr     IV   50
#> 3: 1492       Total LEVO    ml     IV   40
#> 4: 1492 Post-bypass  MIL    mg Inhale    5
#> 5: 1493      Pre-OP   BB    NA     NA   NA
#> 6: 1493      Pre-OP   BC    NA     NA   NA

Convert to wide with data.table package function dcast

使用data.table包函数dcast转换为wide

data.table::dcast(df, id ~ ..., value.var = "dose")
#>      id Post-bypass_LEVO_ml/hr_IV Post-bypass_MIL_mg_Inhale
#> 1: 1492                        50                         5
#> 2: 1493                        NA                        NA
#>    Pre-OP_BB_NA_NA Pre-OP_BC_NA_NA Pre-bypass_ACE_NA_NA Total_LEVO_ml_IV
#> 1:              NA              NA                   NA               40
#> 2:              NA              NA                   NA               NA

#1