Pivot table in R with binary output [duplicate]

Time: 2022-02-24 20:10:59

This question already has an answer here:

I have the following dataset

#dataset

id  attributes  value
1   a,b,c        1
2   c,d          0
3   b,e          1

I want to build a pivot table from it with one binary column per attribute: 1 if the attribute is present in that row, 0 otherwise. My ideal output is the following:

#output

id  a   b   c   d   e   value
1   1   1   1   0   0   1
2   0   0   1   1   0   0
3   0   1   0   0   1   1

Any tips would be really appreciated.

2 Solutions

#1


We split the 'attributes' column on ',', tabulate the pieces of each row with mtabulate from qdapTools, and cbind the result with the first and third columns.

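library(qdapTools)
# example data from the question; attributes must be character, not factor
df1 <- data.frame(id = 1:3, attributes = c("a,b,c", "c,d", "b,e"),
                  value = c(1, 0, 1), stringsAsFactors = FALSE)
# split each string, count the pieces per row, then re-attach id and value
cbind(df1[1], mtabulate(strsplit(df1$attributes, ",")), df1[3])
#  id a b c d e value
#1  1 1 1 1 0 0     1
#2  2 0 0 1 1 0     0
#3  3 0 1 0 0 1     1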
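
If installing qdapTools is not an option, the same split-and-tabulate idea can be sketched in base R. This is only an illustration of the approach, not what mtabulate does internally; the names `parts`, `all_attrs`, `indicators` and `result` are made up for the example, and it assumes each attribute appears at most once per row.

```r
# Example data from the question (df1 as in the answer above)
df1 <- data.frame(id = 1:3,
                  attributes = c("a,b,c", "c,d", "b,e"),
                  value = c(1, 0, 1),
                  stringsAsFactors = FALSE)

# One character vector of attributes per row
parts <- strsplit(df1$attributes, ",", fixed = TRUE)

# Every distinct attribute, sorted, becomes one output column
all_attrs <- sort(unique(unlist(parts)))

# For each row, mark which of the distinct attributes it contains (1/0)
indicators <- do.call(rbind, lapply(parts, function(p) {
  as.integer(all_attrs %in% p)
}))
colnames(indicators) <- all_attrs

# Re-attach the id and value columns around the indicator matrix
result <- cbind(df1["id"], indicators, df1["value"])
print(result)
```

Building the matrix with `do.call(rbind, ...)` keeps one row per input row even when there is only a single distinct attribute.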

#2


With base R:

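# example data from the question
df <- data.frame(id = 1:3, attributes = c("a,b,c", "c,d", "b,e"),
                 value = c(1, 0, 1), stringsAsFactors = FALSE)
# all distinct attributes, sorted; these become the new columns
attributes <- sort(unique(unlist(strsplit(as.character(df$attributes), split=','))))
# one zero-filled column per attribute
cols <- as.data.frame(matrix(rep(0, nrow(df)*length(attributes)), ncol=length(attributes)))
names(cols) <- attributes
df <- cbind.data.frame(df, cols)
# row by row, set the columns named in 'attributes' to 1, then reorder the columns
df <- as.data.frame(t(apply(df, 1, function(x){attributes <- strsplit(x['attributes'], split=','); x[unlist(attributes)] <- 1;x})))[c('id', attributes, 'value')]
df
#  id a b c d e value
#1  1 1 1 1 0 0     1
#2  2 0 0 1 1 0     0
#3  3 0 1 0 0 1     1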
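
One thing to be aware of with this approach: apply() turns each row into a character vector, so the columns of the result hold strings (or factors in older R versions) rather than numbers, even though they print as 0 and 1. A minimal sketch of converting them back, using a small hypothetical stand-in `res` for the result above:

```r
# Hypothetical stand-in for the result above: every column holds strings
res <- data.frame(id = c("1", "2", "3"),
                  a = c("1", "0", "0"),
                  value = c("1", "0", "1"),
                  stringsAsFactors = FALSE)

# as.character() first, so the conversion is also safe for factor columns
res[] <- lapply(res, function(col) as.numeric(as.character(col)))
str(res)
```

Assigning to `res[]` keeps the data frame structure and column names while replacing the contents of each column.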
