Group data by summarizing some columns and keeping the rest

I have a data frame structured like this:

exdataframe <- data.frame(c(rep("ma1",4),rep("ma2",3),rep("ma3",2),rep("ma4",1)),
                          c(rep("1",4),rep("2",3),rep("3",2),rep("1",1)),
                          c(rep("xxx",4),rep("yyyy",3),rep("zz",2),rep("xxx",1)),
                          c("2018-05-27","2018-06-24", "2018-07-01" ,"2018-07-08","2018-06-24", "2018-07-01" ,"2018-07-08","2018-05-27","2018-06-24", "2018-07-01"),
                          c(112,1,3,0,0,0,3,19,45,9),
                          c(1000,0,0,0,200,300,8,90.9,0,1))
colnames(exdataframe) <- c("ID","classid","classname","date","x","y")

I want to group this data frame by the column "ID", summing the columns x and y while keeping all of the other columns. When I run:

exdataframe_gr <- exdataframe %>% group_by(ID) %>% filter(x == sum(x),y == sum(y))

I get a data frame with only one row, which corresponds to a single entry in the original data frame. The output that I want is:

ID  ClassID Classname   Date                X   Y
ma1   1      xxx       "could be anything"  116 1000
ma2   2      yyyy      "could be anything"  3   508
ma3   3       zz       "could be anything"  64  90.9
ma4   1      xxx       "could be anything"  9   1

The date column could be anything; I don't care about its value. My original data is much bigger than this: 2000 rows, 45 columns.

I searched the internet and this site but could not find a similar example. Any help is appreciated, as I cannot find a solution.

2 solutions

#1

library(tidyverse)
exdataframe %>% group_by(ID)%>% mutate_if(is.factor,as.character) %>% nest() %>%
                mutate(classid = map_chr(data,function(x) as.character(x[,'classid'][1,])),
                       classname = map_chr(data,function(x) as.character(x[,'classname'][1,])),
                       date = map_chr(data, function(x) paste(x[,'date'][1], collapse = " | ")),
                       x = map_dbl(data,function(x)sum(x[,'x'])),
                       y = map_dbl(data,function(x)sum(x[,'y']))) %>% 
               select(-data)



    # A tibble: 4 x 6
      ID    classid classname date                                             x      y
      <fct> <chr>   <chr>     <chr>                                        <dbl>  <dbl>
    1 ma1   1       xxx       "c(\"2018-05-27\", \"2018-06-24\", \"2018-~ 116    1.00e3
    2 ma2   2       yyyy      "c(\"2018-06-24\", \"2018-07-01\", \"2018-~   3.00 5.08e2
    3 ma3   3       zz        "c(\"2018-05-27\", \"2018-06-24\")"          64.0  9.09e1
    4 ma4   1       xxx       2018-07-01                                    9.00 1.00e0

#2

Tell me if this works for you. Unfortunately there's no date column, but since you said it "could be anything", I suppose you don't need it.

exdataframe %>% 
  group_by(ID, classid, classname) %>% 
  summarise(x = sum(x),y=sum(y))

# A tibble: 4 x 5
# Groups:   ID, classid [?]
  ID    classid classname     x      y
  <fct> <fct>   <fct>     <dbl>  <dbl>
1 ma1   1       xxx         116 1000  
2 ma2   2       yyyy          3  508  
3 ma3   3       zz           64   90.9
4 ma4   1       xxx           9    1

Solution which would keep all columns:

exdataframe_gr <- exdataframe %>% 
  group_by(ID) %>% 
  mutate(x = sum(x),y=sum(y)) %>%
  ungroup() %>%
  distinct(ID, .keep_all = TRUE)

# A tibble: 4 x 6
  ID    classid classname date           x      y
  <fct> <fct>   <fct>     <fct>      <dbl>  <dbl>
1 ma1   1       xxx       2018-05-27   116 1000  
2 ma2   2       yyyy      2018-06-24     3  508  
3 ma3   3       zz        2018-05-27    64   90.9
4 ma4   1       xxx       2018-07-01     9    1
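
With 45 columns, the same idea can also be written as a single summarise() that does not list the kept columns by hand. The following is only a sketch under a few assumptions: dplyr 1.0 or later (for across()), a helper vector named sum_cols that is made up here for illustration, and stringsAsFactors = FALSE so the text columns stay character. It sums x and y per ID and keeps the first value of every other column, which seems acceptable since the date "could be anything".

```r
library(dplyr)

# Same example data as in the question. stringsAsFactors = FALSE is an
# assumption made here so the text columns stay character, not factor.
exdataframe <- data.frame(
  ID        = c(rep("ma1", 4), rep("ma2", 3), rep("ma3", 2), "ma4"),
  classid   = c(rep("1", 4), rep("2", 3), rep("3", 2), "1"),
  classname = c(rep("xxx", 4), rep("yyyy", 3), rep("zz", 2), "xxx"),
  date      = c("2018-05-27", "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-05-27", "2018-06-24", "2018-07-01"),
  x         = c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  y         = c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the columns to add up.
# Adjust this for the real 45-column data.
sum_cols <- c("x", "y")

exdataframe_gr <- exdataframe %>%
  group_by(ID) %>%
  summarise(
    across(all_of(sum_cols), sum),      # per-ID totals
    across(!all_of(sum_cols), first),   # first value of every other column
    .groups = "drop"                    # return an ungrouped result
  ) %>%
  select(all_of(names(exdataframe)))    # restore the original column order

exdataframe_gr
```

Taking the first value is an arbitrary choice. If a kept column varies within an ID and a specific value matters, replace first with whatever rule fits, such as max or last.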
