R: How to combine rows of a data frame with the same id and take the newest non-NA value?

Posted: 2021-02-15 09:11:06

Example data frame

date       name     speed  acceleration
1/1/17     bob      5      NA
1/1/15     george   5      NA
1/1/15     bob      NA     4
1/1/17     bob      4      NA

I want to condense all rows with the same name into one row, keeping the newest non-NA value in the speed and acceleration columns.

Desired output

date       name     speed  acceleration
1/1/17     bob      5      4
1/1/15     george   5      NA
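Why the date column has to be parsed before "newest" means anything: it is plain text, and comparing those strings is not the same as comparing dates. A minimal base-R illustration, assuming month/day/year strings (the same assumption the answers make with mdy()); the two dates in d are hypothetical values in the question's format, not rows from the example data.

```r
# Hypothetical dates written in the question's m/d/y text format
d <- c("12/1/15", "1/1/17")

# As text, the "larger" value is decided character by character, not by calendar order
max(d)        # "12/1/15", even though it is the older date

# Parsed as month/day/year (what lubridate::mdy() assumes), the 2017 date is newest
parsed <- as.Date(d, format = "%m/%d/%y")
max(parsed)   # 2017-01-01

# The same text read as day/month/year gives different dates: 2015-01-12, 2017-01-01
as.Date(d, format = "%d/%m/%y")
```

The example rows happen to sort correctly as text, which is why converting first is easy to skip and easy to regret.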

2 answers

#1

You can do it this way:

library(dplyr)
library(lubridate)

input = read.table(text = 
 "date       name     speed  acceleration
  1/1/17     bob      5      NA
  1/1/15     george   5      NA
  1/1/15     bob      NA     4
  1/1/17     bob      4      NA",
  header = TRUE, stringsAsFactors = FALSE)

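output <- input %>%
  mutate(date = mdy(date)) %>% # or dmy(), depending on your date format
  group_by(name) %>%
  arrange(desc(date)) %>% # newest rows first
  # keep the first (i.e. newest) non-NA value of every non-grouping column;
  # funs() is deprecated in current dplyr, so use across() with a lambda instead
  summarise(across(everything(), ~ na.omit(.x)[1]))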

output
# # A tibble: 2 × 4
#     name       date speed acceleration
#    <chr>     <date> <int>        <int>
# 1    bob 2017-01-01     5            4
# 2 george 2015-01-01     5           NA
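The na.omit(.)[1] step is what picks the value: after arrange(desc(date)) each group's rows are newest first, so the first non-missing entry is the newest one. A small base-R check of that idiom, using a hypothetical helper name first_non_na:

```r
# Hypothetical helper wrapping the idiom used inside summarise() above
first_non_na <- function(x) na.omit(x)[1]

first_non_na(c(NA, 4, NA, 7))        # 4: NAs dropped, first remaining value kept
first_non_na(c(NA_real_, NA_real_))  # NA: [1] on an empty vector returns NA
```

The second case is why george keeps NA for acceleration: his only acceleration value is missing, so there is nothing left after dropping NAs.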

#2

Here is an option using data.table: convert the data.frame to a data.table with setDT(input), sort the rows newest first (after converting the 'date' strings to Date with mdy()), then, grouped by 'name', loop through the columns and take the first non-NA element of each.

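library(data.table)
library(lubridate)
# input is the example data frame read in the first answer
# i: rows sorted newest first; j: first non-NA value of each column in .SD; by: name
setDT(input)[order(-mdy(date)), lapply(.SD, function(x) x[!is.na(x)][1]), name]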
#     name   date speed acceleration
#1:    bob 1/1/17     5            4
#2: george 1/1/15     5           NA
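For comparison, the same idea without dplyr or data.table: a minimal base-R sketch, assuming month/day/year dates as in both answers. The helper first_non_na and the variable by_date are illustrative names, not part of either answer.

```r
# Example data from the question
input <- read.table(text =
 "date       name     speed  acceleration
  1/1/17     bob      5      NA
  1/1/15     george   5      NA
  1/1/15     bob      NA     4
  1/1/17     bob      4      NA",
  header = TRUE, stringsAsFactors = FALSE)

# Assumes m/d/y text dates (use "%d/%m/%y" for day/month/year)
input$date <- as.Date(input$date, format = "%m/%d/%y")

# Newest rows first; order() is stable, so rows sharing a date keep their input order
by_date <- input[order(input$date, decreasing = TRUE), ]

# Hypothetical helper: first non-NA value, or NA if the column is entirely NA
first_non_na <- function(x) x[!is.na(x)][1]

# Collapse each name's rows column by column, then stack the per-name results
output <- do.call(rbind, lapply(split(by_date, by_date$name), function(g) {
  as.data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE)
}))
rownames(output) <- NULL
output
```

split() returns the groups in sorted name order, so the rows come out as bob, then george. Because the sort is stable, bob's two 2017-01-01 rows stay in input order and his speed comes out as 5, matching the desired output.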
