Reshaping a long-format data.table into a wide format using data.table functionality?

Time: 2022-05-03 20:12:05
> library(data.table)
> A <- data.table(x = c(1,1,2,2), y = c(1,2,1,2), v = c(0.1,0.2,0.3,0.4))
> A
   x y   v
1: 1 1 0.1
2: 1 2 0.2
3: 2 1 0.3
4: 2 2 0.4
> B <- dcast(A, x~y)
Using v as value column: use value.var to override.
> B
  x   1   2
1 1 0.1 0.2
2 2 0.3 0.4

Apparently I can reshape a data.table from long to wide using, e.g., dcast from the reshape2 package. But data.table comes with an overloaded bracket operator offering parameters like 'by' and 'group', which makes me wonder whether it is possible to achieve this using that data.table-specific functionality.

Just one random example from the manual:

DT[,lapply(.SD,sum),by=x]

That looks awesome - but I don't fully understand the usage yet.
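
For readers unfamiliar with this idiom, here is a minimal sketch of what it does. The table `DT` and its columns `v1` and `v2` are made up for illustration and are not from the manual.

```r
library(data.table)

# Hypothetical table: one grouping column and two numeric columns.
DT <- data.table(x = c("a", "a", "b"), v1 = c(1, 2, 4), v2 = c(10, 20, 40))

# .SD is the subset of DT for the current group, without the by column.
# lapply() returns a list, and each list element becomes a column.
res <- DT[, lapply(.SD, sum), by = x]
print(res)
```

The result has one row per value of x and one summed column per remaining column of DT.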

I found neither a way nor an example for this, so maybe it is just not possible, or maybe it isn't even supposed to be. A definite "no, it is not possible because ..." is then of course also a valid answer.

3 solutions

#1


15  

I'll pick an example with unequal groups so that it's easier to illustrate for the general case:

A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
> A
   x y   v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0

The first step is to get the number of elements/entries for each group of "x" to be the same. Here, for x=1 there are 3 values of y, but only 2 for x=2. So, we'll have to fix that first with NA for x=2, y=3.

setkey(A, x, y)
A[CJ(unique(x), unique(y))]
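
To make the effect of this join visible, here is a minimal sketch that builds the full grid of (x, y) combinations and joins A against it. It uses the `on=` argument instead of `setkey()` and names the CJ() columns explicitly; both are choices made for this illustration, as are the names `grid` and `filled`.

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)

# All combinations of the distinct x and y values (6 rows here).
grid <- CJ(x = unique(A$x), y = unique(A$y))

# Look up every grid row in A: combinations missing from A get v = NA.
filled <- A[grid, on = .(x, y)]
print(filled)
```

After this step every x has exactly three rows, with v missing for x=2, y=3.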

Now, to get it to wide format, we should group by "x" and use as.list on v as follows:

out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
   x  V1  V2  V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
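
The reason as.list matters here is that a list returned from j is spread across columns, while a plain vector stays long. A small sketch of the difference, starting from an already balanced table (the name `filled` is only used for this illustration):

```r
library(data.table)

# Balanced long table: every x has the same number of rows.
filled <- data.table(x = c(1, 1, 1, 2, 2, 2),
                     y = c(1, 2, 3, 1, 2, 3),
                     v = c(0.2, 0.4, 0.6, 0.8, 1.0, NA))

# A plain vector in j stays long: one row per value.
long <- filled[, v, by = x]

# A list in j gives one row per group and one column per list element.
wide <- filled[, as.list(v), by = x]
print(long)
print(wide)
```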

Now you can set the names of the reshaped columns by reference with setnames, as follows:

setnames(out, c("x", as.character(unique(A$y))))

   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
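
The names passed to setnames() have to be in the same order as the columns produced by the grouping step. One way to keep the two in sync, sketched below, is to sort the y values once and reuse that vector for both the grid and the labels. The variable `ys` and the use of `on=` are my own choices here, not part of the steps above.

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)

# Sort once and reuse: this vector defines both the grid and the labels.
ys <- sort(unique(A$y))

out <- A[CJ(x = unique(A$x), y = ys), on = .(x, y)][, as.list(v), by = x]

# Rename V1..Vn by reference; no copy of `out` is made.
setnames(out, c("x", as.character(ys)))
print(out)
```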

#2


10  

Use dcast() (now a default data.table method, from version 1.9.5; earlier versions use dcast.data.table) as in

> dcast(A,x~y)
Using 'v' as value column. Use 'value.var' to override
   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
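
As the message suggests, the value column can be named explicitly. A short sketch, which also shows the `fill` argument for replacing the NA of missing combinations (the names `B` and `B0` are only for illustration):

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)

# Name the value column explicitly; this also avoids the message.
B <- dcast(A, x ~ y, value.var = "v")

# Optionally fill missing (x, y) combinations with a value other than NA.
B0 <- dcast(A, x ~ y, value.var = "v", fill = 0)
print(B)
print(B0)
```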

This is fast and removes the need for setnames().

It is also especially helpful when y in the above example is a factor variable with character levels -- e.g. 'Low', 'Medium', 'High' -- because CJ() may not return the wide data with variables in the order that setnames() expects, and you can end up with your data mislabeled badly.
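
Here is a small sketch of how that mislabeling can happen, using a character y for simplicity. The data and the names `out` and `wide` are made up for this illustration, and the join uses `on=` rather than a key.

```r
library(data.table)

# Same shape as the earlier example, but y holds labels instead of numbers.
A <- data.table(x = c(1, 1, 1, 2, 2),
                y = c("Low", "Medium", "High", "Low", "Medium"),
                v = (1:5) / 5)

# CJ() sorts its inputs, so within each x the rows come back as
# High, Low, Medium, while unique(A$y) is Low, Medium, High.
out <- A[CJ(x = unique(A$x), y = unique(A$y)), on = .(x, y)][, as.list(v), by = x]
setnames(out, c("x", unique(A$y)))
print(out)   # the "Low" column now holds the values recorded for "High"

# dcast() derives each column name from the data, so labels stay correct.
# Making y a factor puts the columns in level order.
A[, y := factor(y, levels = c("Low", "Medium", "High"))]
wide <- dcast(A, x ~ y, value.var = "v")
print(wide)
```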

#3


2  

(with credits to Arun)

A[, setattr(as.list(v), 'names', y), by=x]
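
This one-liner relies on every x group having the same y values in the same order. For a table with gaps, such as the unequal-groups A from the first answer, one option is to fill in the missing combinations first. A sketch under that assumption; the `on=` join, the `as.character()` conversion, and the name `filled` are my additions:

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)

# Fill in missing (x, y) combinations so every group has the same y values.
filled <- A[CJ(x = unique(A$x), y = unique(A$y)), on = .(x, y)]

# Name each list element after its y value; the names become column names.
out <- filled[, setattr(as.list(v), "names", as.character(y)), by = x]
print(out)
```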
