I am trying to calculate the amount of time between trials (trialnumber
) for each individual (sq_id
). I have been able to figure out how to calculate the time difference between trials (time_gap
), but my output has all these duplicate rows that should not be there.
我试图计算每个人(sq_id)的试验(试验编号)之间的时间量。我已经能够弄清楚如何计算试验之间的时差(time_gap),但是我的输出具有所有这些不应该存在的重复行。
A subset of my data can be found here. For reproducability purposes, I have included the dataset (called export
) below:
我的数据的一个子集可以在这里找到。出于可重现性的目的,我在下面包含了数据集(称为导出):
sq_id ageclass sex cohort year age grid trialnumber trialdate trialtime
6244 A F 2000 2005 5 AG 1 05/24/05 0:00
10212 A M 2006 2008 2 KL 1 05/04/08 6:13
10212 A M 2006 2010 4 KL 4 05/20/10 6:12
10212 A M 2006 2009 3 KL 2 06/10/09 6:14
10212 A M 2006 2009 3 KL 3 07/01/09 6:15
23052 J F 2017 2017 0 SU 2 08/02/17 11:00
23052 J F 2017 2017 0 SU 1 07/20/17 10:51
23080 J M 2017 2017 0 KL 2 07/29/17 10:20
23080 J M 2017 2017 0 KL 1 07/07/17 8:35
The first thing I did was calculate the time between trials, like so:
我做的第一件事是计算试验之间的时间,如下:
#adding time between trials to data
trialdate<-as.POSIXct(data$trialdate,format="%m/%d/%y")
data$datetime=as.POSIXct(paste(trialdate, data$trialtime),format= '%Y-%m-%d',usetz=FALSE)
#calculates time btw first trial and all other trials
timebtw <- data %>% group_by(sq_id) %>%
select(sq_id, trialnumber, datetime) %>%
mutate(time_gap = (datetime - nth(datetime, which.min((datetime)))), time_gap=time_gap/86400) #time_gap units are in seconds, changed to days
I then join the timebtw
dataset to my original dataset (called export
):
然后我将timebtw数据集加入到我的原始数据集(称为导出):
new<-dplyr::left_join(export, timebtw, by = "sq_id")
The output I get looks like so:
我得到的输出看起来像这样:
> export
sq_id ageclass sex cohort year age grid trialnumber.x trialdate trialtime datetime time_gap trialnumber.y
6244 A F 2000 2005 5 AG 1 05/24/05 0:00 2005-05-24 0 secs 1
10212 A M 2006 2008 2 KL 1 05/04/08 6:13 2008-05-04 0 secs 1
10212 A M 2006 2008 2 KL 1 05/04/08 6:13 2008-05-04 746 secs 4
10212 A M 2006 2008 2 KL 1 05/04/08 6:13 2008-05-04 402 secs 2
10212 A M 2006 2008 2 KL 1 05/04/08 6:13 2008-05-04 423 secs 3
10212 A M 2006 2010 4 KL 4 05/20/10 6:12 2010-05-20 0 secs 1
10212 A M 2006 2010 4 KL 4 05/20/10 6:12 2010-05-20 746 secs 4
10212 A M 2006 2010 4 KL 4 05/20/10 6:12 2010-05-20 402 secs 2
10212 A M 2006 2010 4 KL 4 05/20/10 6:12 2010-05-20 423 secs 3
10212 A M 2006 2009 3 KL 2 06/10/09 6:14 2009-06-10 0 secs 1
10212 A M 2006 2009 3 KL 2 06/10/09 6:14 2009-06-10 746 secs 4
10212 A M 2006 2009 3 KL 2 06/10/09 6:14 2009-06-10 402 secs 2
10212 A M 2006 2009 3 KL 2 06/10/09 6:14 2009-06-10 423 secs 3
10212 A M 2006 2009 3 KL 3 07/01/09 6:15 2009-07-01 0 secs 1
10212 A M 2006 2009 3 KL 3 07/01/09 6:15 2009-07-01 746 secs 4
10212 A M 2006 2009 3 KL 3 07/01/09 6:15 2009-07-01 402 secs 2
10212 A M 2006 2009 3 KL 3 07/01/09 6:15 2009-07-01 423 secs 3
23052 J F 2017 2017 0 SU 2 08/02/17 11:00 2017-08-02 13 secs 2
23052 J F 2017 2017 0 SU 2 08/02/17 11:00 2017-08-02 0 secs 1
23052 J F 2017 2017 0 SU 1 07/20/17 10:51 2017-07-20 13 secs 2
23052 J F 2017 2017 0 SU 1 07/20/17 10:51 2017-07-20 0 secs 1
23080 J M 2017 2017 0 KL 2 07/29/17 10:20 2017-07-29 22 secs 2
23080 J M 2017 2017 0 KL 2 07/29/17 10:20 2017-07-29 0 secs 1
23080 J M 2017 2017 0 KL 1 07/07/17 8:35 2017-07-07 22 secs 2
23080 J M 2017 2017 0 KL 1 07/07/17 8:35 2017-07-07 0 secs 1
This is a problem. There should only be one time_gap
value per trialnumber
.
这是个问题。每个trialnumber只应该有一个time_gap值。
So, for example, for sq_id
10212, the output should look like this:
因此,例如,对于sq_id 10212,输出应如下所示:
sq_id ageclass sex cohort year age grid trialnumber.x trialdate trialtime datetime time_gap trialnumber.y
10212 A M 2006 2008 2 KL 1 05/04/08 6:13 2008-05-04 0 secs 1
10212 A M 2006 2010 4 KL 4 05/20/10 6:12 2010-05-20 746 secs 4
10212 A M 2006 2009 3 KL 2 06/10/09 6:14 2009-06-10 402 secs 2
10212 A M 2006 2009 3 KL 3 07/01/09 6:15 2009-07-01 423 secs 3
I need the trialnumber.x
and trialnumber.y
columns to match up, this way there are only as many lines as there are trials (i.e. sq_id
6244 would have 1 line, sq_id
10212 4 lines, sq_id
23052 2 lines, and sq_id
23080 2 lines).
我需要trialnumber.x和trialnumber.y列来匹配,这样只有与试验一样多的行(即sq_id 6244将有1行,sq_id 10212 4行,sq_id 23052 2行和sq_id 23080 2线)。
Does anyone know how I can modify my code to get this output?
有谁知道如何修改我的代码以获得此输出?
1 个解决方案
#1
0
library(dplyr)
dat <- readr::read_table(
"sq_id ageclass sex cohort year age grid trialnumber trialdate trialtime
6244 A F 2000 2005 5 AG 1 05/24/05 0:00
10212 A M 2006 2008 2 KL 1 05/04/08 6:13
10212 A M 2006 2010 4 KL 4 05/20/10 6:12
10212 A M 2006 2009 3 KL 2 06/10/09 6:14
10212 A M 2006 2009 3 KL 3 07/01/09 6:15
23052 J F 2017 2017 0 SU 2 08/02/17 11:00
23052 J F 2017 2017 0 SU 1 07/20/17 10:51
23080 J M 2017 2017 0 KL 2 07/29/17 10:20
23080 J M 2017 2017 0 KL 1 07/07/17 8:35")
glimpse(dat)
# Observations: 9
# Variables: 9
# $ sq_id <int> 6244, 10212, 10212, 10212, 10212, 23052, 23052, 23080, 23080
# $ ageclass <chr> "A", "A", "A", "A", "A", "J", "J", "J", "J"
# $ sex <chr> "F", "M", "M", "M", "M", "F", "F", "M", "M"
# $ `cohort year` <chr> "2000 2005", "2006 2008", "2006 2010", "2006 2009", "2006 2009", "2017 2017", "2017 2017",...
# $ age <int> 5, 2, 4, 3, 3, 0, 0, 0, 0
# $ grid <chr> "AG", "KL", "KL", "KL", "KL", "SU", "SU", "KL", "KL"
# $ trialnumber <int> 1, 1, 4, 2, 3, 2, 1, 2, 1
# $ trialdate <chr> "05/24/05", "05/04/08", "05/20/10", "06/10/09", "07/01/09", "08/02/17", "07/20/17", "07/29...
# $ trialtime <time> 00:00:00, 06:13:00, 06:12:00, 06:14:00, 06:15:00, 11:00:00, 10:51:00, 10:20:00, 08:35:00
There doesn't seem to be a need to compute time gaps separately, thus no need to join:
似乎没有必要单独计算时间间隔,因此无需加入:
dat %>%
mutate(trial_dt = lubridate::mdy_hms(paste(trialdate, trialtime))) %>%
group_by(sq_id) %>%
mutate(time_gap = difftime(trial_dt, min(trial_dt), units = "days"))
# # A tibble: 9 x 11
# # Groups: sq_id [4]
# sq_id ageclass sex `cohort year` age grid trialnumber trialdate trialtime trial_dt time_gap
# <int> <chr> <chr> <chr> <int> <chr> <int> <chr> <time> <dttm> <time>
# 1 6244 A F 2000 2005 5 AG 1 05/24/05 00:00 2005-05-24 00:00:00 0
# 2 10212 A M 2006 2008 2 KL 1 05/04/08 06:13 2008-05-04 06:13:00 0
# 3 10212 A M 2006 2010 4 KL 4 05/20/10 06:12 2010-05-20 06:12:00 745.999305555556
# 4 10212 A M 2006 2009 3 KL 2 06/10/09 06:14 2009-06-10 06:14:00 402.000694444444
# 5 10212 A M 2006 2009 3 KL 3 07/01/09 06:15 2009-07-01 06:15:00 423.001388888889
# 6 23052 J F 2017 2017 0 SU 2 08/02/17 11:00 2017-08-02 11:00:00 13.00625
# 7 23052 J F 2017 2017 0 SU 1 07/20/17 10:51 2017-07-20 10:51:00 0
# 8 23080 J M 2017 2017 0 KL 2 07/29/17 10:20 2017-07-29 10:20:00 22.0729166666667
# 9 23080 J M 2017 2017 0 KL 1 07/07/17 08:35 2017-07-07 08:35:00 0
#1
0
library(dplyr)
dat <- readr::read_table(
"sq_id ageclass sex cohort year age grid trialnumber trialdate trialtime
6244 A F 2000 2005 5 AG 1 05/24/05 0:00
10212 A M 2006 2008 2 KL 1 05/04/08 6:13
10212 A M 2006 2010 4 KL 4 05/20/10 6:12
10212 A M 2006 2009 3 KL 2 06/10/09 6:14
10212 A M 2006 2009 3 KL 3 07/01/09 6:15
23052 J F 2017 2017 0 SU 2 08/02/17 11:00
23052 J F 2017 2017 0 SU 1 07/20/17 10:51
23080 J M 2017 2017 0 KL 2 07/29/17 10:20
23080 J M 2017 2017 0 KL 1 07/07/17 8:35")
glimpse(dat)
# Observations: 9
# Variables: 9
# $ sq_id <int> 6244, 10212, 10212, 10212, 10212, 23052, 23052, 23080, 23080
# $ ageclass <chr> "A", "A", "A", "A", "A", "J", "J", "J", "J"
# $ sex <chr> "F", "M", "M", "M", "M", "F", "F", "M", "M"
# $ `cohort year` <chr> "2000 2005", "2006 2008", "2006 2010", "2006 2009", "2006 2009", "2017 2017", "2017 2017",...
# $ age <int> 5, 2, 4, 3, 3, 0, 0, 0, 0
# $ grid <chr> "AG", "KL", "KL", "KL", "KL", "SU", "SU", "KL", "KL"
# $ trialnumber <int> 1, 1, 4, 2, 3, 2, 1, 2, 1
# $ trialdate <chr> "05/24/05", "05/04/08", "05/20/10", "06/10/09", "07/01/09", "08/02/17", "07/20/17", "07/29...
# $ trialtime <time> 00:00:00, 06:13:00, 06:12:00, 06:14:00, 06:15:00, 11:00:00, 10:51:00, 10:20:00, 08:35:00
There doesn't seem to be a need to compute time gaps separately, thus no need to join:
似乎没有必要单独计算时间间隔,因此无需加入:
dat %>%
mutate(trial_dt = lubridate::mdy_hms(paste(trialdate, trialtime))) %>%
group_by(sq_id) %>%
mutate(time_gap = difftime(trial_dt, min(trial_dt), units = "days"))
# # A tibble: 9 x 11
# # Groups: sq_id [4]
# sq_id ageclass sex `cohort year` age grid trialnumber trialdate trialtime trial_dt time_gap
# <int> <chr> <chr> <chr> <int> <chr> <int> <chr> <time> <dttm> <time>
# 1 6244 A F 2000 2005 5 AG 1 05/24/05 00:00 2005-05-24 00:00:00 0
# 2 10212 A M 2006 2008 2 KL 1 05/04/08 06:13 2008-05-04 06:13:00 0
# 3 10212 A M 2006 2010 4 KL 4 05/20/10 06:12 2010-05-20 06:12:00 745.999305555556
# 4 10212 A M 2006 2009 3 KL 2 06/10/09 06:14 2009-06-10 06:14:00 402.000694444444
# 5 10212 A M 2006 2009 3 KL 3 07/01/09 06:15 2009-07-01 06:15:00 423.001388888889
# 6 23052 J F 2017 2017 0 SU 2 08/02/17 11:00 2017-08-02 11:00:00 13.00625
# 7 23052 J F 2017 2017 0 SU 1 07/20/17 10:51 2017-07-20 10:51:00 0
# 8 23080 J M 2017 2017 0 KL 2 07/29/17 10:20 2017-07-29 10:20:00 22.0729166666667
# 9 23080 J M 2017 2017 0 KL 1 07/07/17 08:35 2017-07-07 08:35:00 0