Get the last value in a data frame by factor category

Time: 2022-12-16 22:51:52

I have a data frame like this:


a=c("A","A","A","A","B","B","C","C","C","D","D")
b=c(1,2,3,4,1,2,1,2,3,1,2)
c=c(1345,645,75,8,95,678,598,95,75,4,53)
mydf <- data.frame(a,b,c) # build with data.frame() directly; data.frame(cbind(a,b,c)) would first make a character matrix, so b and c would lose their numeric type

My aim is to add an extra column to the new data frame that takes the last value of column "c" within each level of the factor in column "a", and is 0 on every other row. More specifically, in this example the end result looks like this:


   a b    c   d
1  A 1 1345   0
2  A 2  645   0
3  A 3   75   0
4  A 4    8   8
5  B 1   95   0
6  B 2  678 678
7  C 1  598   0
8  C 2   95   0
9  C 3   75  75
10 D 1    4   0
11 D 2   53  53
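For comparison, here is a minimal base R sketch of the same task that needs no extra packages. It assumes the rows are already in the intended order, so "last" means the last row for each value of `a`; `duplicated(..., fromLast = TRUE)` flags every row that still has a later row in the same group.

```r
# Sample data from the question
mydf <- data.frame(
  a = c("A","A","A","A","B","B","C","C","C","D","D"),
  b = c(1,2,3,4,1,2,1,2,3,1,2),
  c = c(1345,645,75,8,95,678,598,95,75,4,53)
)

# duplicated(fromLast = TRUE) is TRUE when a later row has the same `a`;
# negating it marks the last row of each group (groups need not be contiguous)
is_last <- !duplicated(mydf$a, fromLast = TRUE)

# Keep `c` on the last row of each group and 0 everywhere else
mydf$d <- ifelse(is_last, mydf$c, 0)
mydf
```

Because this relies on row order rather than on column `b`, sort the data first if the rows are not already in the order that defines "last".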

2 solutions

#1



If you don't need your variables to be factors, there's a nice solution with dplyr:


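library(dplyr)

df <- data.frame(a = c("A","A","A","A","B","B","C","C","C","D","D"),
                 b = c(1,2,3,4,1,2,1,2,3,1,2),
                 c = c(1345,645,75,8,95,678,598,95,75,4,53),
                 stringsAsFactors = FALSE)

df <- as_tibble(df)  # tbl_df() is deprecated; as_tibble() does the same job

# Within each group of `a`, keep `c` on the row where `b` is largest, else 0
df %>% group_by(a) %>%
  mutate(d = ifelse(b == max(b), c[which(b == max(b))], 0))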



# A tibble: 11 x 4
# Groups:   a [4]
       a     b     c     d
   <chr> <dbl> <dbl> <dbl>
 1     A     1  1345     0
 2     A     2   645     0
 3     A     3    75     0
 4     A     4     8     8
 5     B     1    95     0
 6     B     2   678   678
 7     C     1   598     0
 8     C     2    95     0
 9     C     3    75    75
10     D     1     4     0
11     D     2    53    53
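The `ifelse()` above picks the row where `b` equals its group maximum, which is the last row here only because `b` counts upward within each group. A minimal sketch that keys on row position instead, assuming the rows are already in the desired order, uses dplyr's `row_number()` and `n()`:

```r
library(dplyr)

df <- tibble(
  a = c("A","A","A","A","B","B","C","C","C","D","D"),
  b = c(1,2,3,4,1,2,1,2,3,1,2),
  c = c(1345,645,75,8,95,678,598,95,75,4,53)
)

result <- df %>%
  group_by(a) %>%
  # row_number() == n() is TRUE only on the last row of each group
  mutate(d = if_else(row_number() == n(), c, 0)) %>%
  ungroup()
result
```

`if_else()` is dplyr's type-strict variant of `ifelse()`, so `c` and `0` must both be doubles, which they are here.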

#2



Using data.table:

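library(data.table)
df <- data.frame(a, b, c)        # a, b, c are the vectors from the question
setDT(df)                        # convert to a data.table by reference
df[, idx := .N, by = a]          # group size
df[, id := seq_len(.N), by = a]  # position of each row within its group
df[id == idx, d := c]            # copy c onto the last row of each group; other rows get NA
df[, c("id", "idx") := NULL]     # drop the helper columns
df[is.na(d), d := 0]             # replace the remaining NAs in d with 0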

    a b    c   d
 1: A 1 1345   0
 2: A 2  645   0
 3: A 3   75   0
 4: A 4    8   8
 5: B 1   95   0
 6: B 2  678 678
 7: C 1  598   0
 8: C 2   95   0
 9: C 3   75  75
10: D 1    4   0
11: D 2   53  53
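The helper columns `idx` and `id` are not strictly needed. A more compact sketch, assuming a data.table version recent enough to provide `fifelse()`, compares each row's position in its group with the group size `.N` in a single grouped assignment:

```r
library(data.table)

dt <- data.table(
  a = c("A","A","A","A","B","B","C","C","C","D","D"),
  b = c(1,2,3,4,1,2,1,2,3,1,2),
  c = c(1345,645,75,8,95,678,598,95,75,4,53)
)

# Within each group, .N is the group size, so seq_len(.N) == .N is TRUE
# only on the group's last row; fifelse() picks c there and 0 elsewhere
dt[, d := fifelse(seq_len(.N) == .N, c, 0), by = a]
dt
```

Like the original, this updates the table by reference, and it never creates NA values that need filling afterwards.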
