I have a dataset structured as follows:
library(data.table)
data <- data.table(ID = 1:10, Tenure = c(2, 3, 4, 2, 1, 1, 3, 4, 5, 2), Var = rnorm(10))
ID Tenure Var
1: 1 2 -0.72892371
2: 2 3 -1.73534591
3: 3 4 0.47007030
4: 4 2 1.33173044
5: 5 1 -0.07900914
6: 6 1 0.63493316
7: 7 3 -0.62710577
8: 8 4 -1.69238758
9: 9 5 -0.85709328
10: 10 2 0.10716830
I need to replicate each row N = Tenure times. For example, I need to replicate the first row 2 times (since Tenure = 2).
I need my transformed dataset to look like the following, which is what my current approach produces:
setkey(data,ID)
print(data[,.(ID=rep(ID,Tenure))][data][, Indx := 1:.N, by=ID])
ID Tenure Var Indx
1: 1 2 -0.7289237 1
2: 1 2 -0.7289237 2
3: 2 3 -1.7353459 1
4: 2 3 -1.7353459 2
5: 2 3 -1.7353459 3
6: 3 4 0.4700703 1
...
...
Is there a more efficient way (a more data.table way) to do this? My way is pretty slow. I was thinking there should be a way to do this using a by-without-by merge using .EACHI?
2 Solutions
#1
16
I don't think using a key/merge is helpful here. Just expand by passing a vector of row indices:
DT <- data[rep(1:.N,Tenure)][,Indx:=1:.N,by=ID]
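To make the row-index trick concrete, here is a minimal sketch on a small, deterministic stand-in for the question's data (the three rows and their Var values are made up for illustration). It also shows a variant that I believe avoids the grouped step by using base R's sequence(); that variant assumes the copies stay in the original row order, which is what rep() produces here.

```r
library(data.table)

# Small deterministic stand-in for the question's data (values are illustrative)
data <- data.table(ID = 1:3, Tenure = c(2, 3, 1), Var = c(0.5, -1.2, 0.3))

# rep(seq_len(.N), Tenure) yields the row indices 1 1 2 2 2 3,
# i.e. each row index repeated Tenure times
expanded <- data[rep(seq_len(.N), Tenure)]

# Number the copies within each ID (same idea as the answer above)
expanded[, Indx := seq_len(.N), by = ID]

# Alternative without grouping: sequence() builds 1..n for each count in turn
expanded[, Indx2 := sequence(data$Tenure)]

print(expanded)
```

Both index columns should agree; the sequence() form may be worth trying on large tables because it skips the by = ID pass, though I have not benchmarked it.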
#2
3
You could try:
library(splitstackshape)
expandRows(data, "Tenure", drop = FALSE)[,Indx:=1:.N,by=ID][]
Or
library(dplyr)
library(splitstackshape)
expandRows(data, "Tenure", drop = FALSE) %>%
  group_by(ID) %>%
  mutate(Indx = row_number())
Which gives:
ID Tenure Var Indx
1: 1 2 -0.8808717 1
2: 1 2 -0.8808717 2
3: 2 3 0.5962590 1
4: 2 3 0.5962590 2
5: 2 3 0.5962590 3
6: 3 4 0.1197176 1
7: 3 4 0.1197176 2
8: 3 4 0.1197176 3
9: 3 4 0.1197176 4
10: 4 2 -0.2821739 1
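If you are already in the tidyverse, a related option (not part of the answer above, just a sketch) is tidyr's uncount(), which repeats rows by a count column and can add the within-row counter itself through its .id argument. The example uses a plain data.frame with made-up values rather than the question's data.table.

```r
library(tidyr)

# Illustrative data, not the values from the question
df <- data.frame(ID = 1:3, Tenure = c(2, 3, 1), Var = c(0.5, -1.2, 0.3))

# uncount() repeats each row Tenure times;
# .remove = FALSE keeps the Tenure column,
# .id = "Indx" adds a counter 1..Tenure for the copies of each original row
expanded <- uncount(df, Tenure, .remove = FALSE, .id = "Indx")

print(expanded)
```

This assumes a tidyr version that provides uncount() with the .id argument.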