How to reshape a data.table (long to wide) without applying a function like sum or mean?

Date: 2021-07-07 20:09:22

How can I reshape a data.table (long to wide) without applying an aggregate function such as sum or mean? I looked at dcast, melt, reshape, and so on, but I don't get the desired result.

This is my data:

library(data.table)
DT <- data.table(id = c("1","1","2","3"), score = c("5", "4", "5", "6"))

Original format:

> DT
id score
1  5 
1  4 
2  5 
3  6 

Desired format:

id score1 score2
1  5      4
2  5      NA
3  6      NA 

I currently work around this with:

DT <- DT[, list(list(score)), by=id]

But then the first cell of the new column V1 contains a character vector:

c("5", "4")

And I need to split it (I use the package splitstackshape):

DT <- cSplit(DT, "V1", ",")

This is probably not the most efficient method... What is a better way?

1 solution

#1

You can use getanID from splitstackshape to add a .id column that numbers the rows within each value of the grouping variable id. Then reshape with dcast.data.table (or simply dcast from data.table 1.9.5 onward) and, if needed, rename the columns with setnames.

 library(splitstackshape)
 res <- dcast(getanID(DT, 'id'), id~.id,value.var='score')
 setnames(res, 2:3, paste0('score', 1:2))[]
 #    id score1 score2
 #1:  1      5      4
 #2:  2      5     NA
 #3:  3      6     NA

Or, using only data.table:

 # := adds the .id column (score1, score2, ... within each id) to DT by reference
 dcast(DT[, .id:=paste0('score', 1:.N), by=id],
       id~.id, value.var='score')
 #   id score1 score2
 #1:  1      5      4
 #2:  2      5     NA
 #3:  3      6     NA
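
A variant of the data.table-only approach that leaves DT untouched: with a reasonably recent data.table (rowid was, as far as I recall, added around version 1.9.8) the per-group counter can be built directly inside the dcast formula. This is only a sketch on the example data from the question, not something the original answer spelled out:

```r
library(data.table)

# Example data from the question
DT <- data.table(id = c("1","1","2","3"), score = c("5", "4", "5", "6"))

# rowid(id) numbers the rows within each id (1, 2, ...);
# prefix = "score" turns those counters into the names score1, score2, ...
# No helper column is added to DT.
res <- dcast(DT, id ~ rowid(id, prefix = "score"), value.var = "score")
res
```

Because each id/counter pair is unique, dcast has nothing to aggregate, so no sum or mean is involved; ids with fewer scores are padded with NA.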

Or, building on the code you were using (fewer characters):

cSplit(DT[, toString(score), by=id], 'V1', ',')
#   id V1_1 V1_2
#1:  1    5    4
#2:  2    5   NA
#3:  3    6   NA
