如何根据r中的另一个键/ID列将列转换为行?

时间:2022-06-18 23:11:17

I have a the following dataset :

我有以下数据集:

> head(Dataset)

    IDCLIENT NUMDOSS PROV_200112 PROV_200212 PROV_200312  ....

      H44879 0104957           0           0           0  ....
     Z353025 0105289           0           0           0  ....
     B416771 0105309           0           0           0  ....  
       B5043 0105366           0           0           0  ....
     A725471 0105370           0           0           0  ....
    BE406627 0105371           0           0           0  ....

This a table of 125 columns and 250 000 obs

这是一个由125列和25万obs组成的表格

We have a table with two ID columns ID1, ID2 and the other columns register the values of a variables called Prov in a certain period of time from 2003 to 2017.

我们有一个表,其中有两个ID列ID1、ID2和其他列,它们在2003年至2017年的某一段时间内记录一个名为Prov的变量的值。

What I am trying to do is to transform this table so that I will have only 4 columns ( ID1, ID2, Date(or period) and Prov ) instead of 125 ones. Something like this :

我要做的是转换这个表,这样我将只有4列(ID1、ID2、Date(或period)和Prov),而不是125列。是这样的:

>head(DF)
    IDClient      IDDossier     Date             Prov     
    B416771        0104957     PROV_200110         5    
    B416771        0104957     PROV_200111         0          
    B416771        0104957     PROV_200112        99  
    B416771        0104957     PROV_200212        1,23          
    H44879         0105289     PROV_200212       36,1
    Z353025        0105309     PROV_200312      10436,175
    BE4410         0105366     PROV_200406      10438,9    

to do so I tried the following code, but it's really time consuming and I also got some errors/ warnings message from R:

为此,我尝试了下面的代码,但这真的很耗时,而且我还收到了一些来自R的错误/警告消息:

# N= dim(ProvTab)[1]*dim(ProvTab)[2]
# DF=data.frame(IDClient=rep("",N),IDDossier=rep("",N),
          # Date=rep("",N), Prov=rep("",N), stringsAsFactors = F)
DF=NULL
periodd=as.data.frame(colnames(ProvTab))
start.time= Sys.time() # to count running time !
for (k in 3:ncol(ProvTab)) 
     {
  for (j in 1:nrow(ProvTab))
       {
  DF=rbind(DF,
           data.frame(IDClient=ProvTab[j,1], IDDossier=ProvTab[j,2],
                         Date = periodd[k,1] , Prov=ProvTab[j,k]))
       }
     }

end.time= Sys.time()
end.time- start.time

I really tried to look for other solutions but failed.

我真的试图寻找其他的解决方法,但失败了。

PS: R or SQL codes are welcomed !

欢迎输入R或SQL代码!

Any suggestions ?

有什么建议吗?

1 个解决方案

#1


2  

I think this is a pretty simple wide to long transformation. The reshape2 package is great for this.

我认为这是一个相当简单的宽变换到长变换。reshape2包很适合这个。

require(reshape2)

DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))

names(DF) <- c("IDClient", "IDDossier", "Date", "Prov")

If that's slow, you can try with data.table instead which is probably faster.

如果很慢,你可以尝试数据。而表格可能更快。

require(data.table) 
setDT(Dataset)

DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))

#1


2  

I think this is a pretty simple wide to long transformation. The reshape2 package is great for this.

我认为这是一个相当简单的宽变换到长变换。reshape2包很适合这个。

require(reshape2)

DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))

names(DF) <- c("IDClient", "IDDossier", "Date", "Prov")

If that's slow, you can try with data.table instead which is probably faster.

如果很慢,你可以尝试数据。而表格可能更快。

require(data.table) 
setDT(Dataset)

DF <- melt(Dataset, id.vars= c("IDCLIENT", "NUMDOSS"))