Aggregate hourly data into daily summaries

Time: 2022-11-21 16:57:21

I have hourly weather data in the following format:

Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
...
...
12/31/2000 23:00,25

What I need is a daily aggregate of the max, min and average, like this:

Date,MaxDBT,MinDBT,AveDBT
01/01/2000,36,23,28
01/02/2000,34,22,29
01/03/2000,32,25,30
...
...
12/31/2000,35,9,20

How can I do this in R?

4 solutions

#1


18  

1) This can be done compactly using zoo:

L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"

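library(zoo)
# Summary statistics to compute for each day
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
# format = "%m/%d/%Y" keeps only the date part of each timestamp, so readings
# from the same day share an index value and aggregate = stat combines them
z <- read.zoo(text = L, header = TRUE, sep = ",", format = "%m/%d/%Y", aggregate = stat)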

This gives:

> z
           min max     mean
2000-01-01  30  33 31.33333
2000-12-31  25  25 25.00000

2) Here is a solution that uses only base R:

DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat) # same stat as in zoo solution 

The last line gives:

> ag
        Date  DBT.min  DBT.max DBT.mean
1 2000-01-01 30.00000 33.00000 31.33333
2 2000-12-31 25.00000 25.00000 25.00000

EDIT: (1) Since this answer first appeared, the text= argument has been added to read.zoo in the zoo package. (2) Minor improvements.

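The question asks for columns named Date, MaxDBT, MinDBT and AveDBT, with dates in mm/dd/yyyy form. Below is a minimal base-R sketch that produces that exact layout from the sample data above. It uses tapply() rather than aggregate() so that each statistic ends up as an ordinary column instead of a matrix column. The output column names are taken from the question; the rest is an illustrative choice.

```r
# The question's sample readings (only four rows are shown in the question)
L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"

DF <- read.csv(text = L)
# "%m/%d/%Y" reads only the date part of each timestamp, dropping the hour
DF$Date <- as.Date(DF$Date, format = "%m/%d/%Y")

# tapply() groups by the sorted unique dates, so its results line up with days
days <- sort(unique(DF$Date))
out <- data.frame(
  Date   = format(days, "%m/%d/%Y"),
  MaxDBT = as.vector(tapply(DF$DBT, DF$Date, max)),
  MinDBT = as.vector(tapply(DF$DBT, DF$Date, min)),
  AveDBT = as.vector(tapply(DF$DBT, DF$Date, mean))
)

# Print as CSV to the console; pass file = "..." to write to disk instead
write.csv(out, row.names = FALSE)
```

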
#2


5  

Using strptime() and trunc() from base R, and ddply() from the plyr package:

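# Make the data
ZZ <- textConnection("Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25")
dataframe <- read.csv(ZZ, header = TRUE)
close(ZZ)

# Do the calculations
# strptime() and trunc() return POSIXlt; store the results as POSIXct, which
# data frames and plyr handle more reliably
dataframe$Date <- as.POSIXct(strptime(dataframe$Date, format = "%m/%d/%Y %H:%M"))
dataframe$day <- as.POSIXct(trunc(dataframe$Date, "days"))

library(plyr)

# Split by day, then compute the summaries within each day
ddply(dataframe, .(day),
      summarize,
      aveDBT = mean(DBT),
      maxDBT = max(DBT),
      minDBT = min(DBT)
)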

This gives:

         day   aveDBT maxDBT minDBT
1 2000-01-01 31.33333     33     30
2 2000-12-31 25.00000     25     25

To clarify:

strptime() converts the character strings to date-times according to the given format; see ?strptime for how to specify the format. trunc() then truncates these date-times to the specified unit, which in this case is the day.

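To see what these two steps do to a single value, here is a small sketch on one timestamp from the question. tz = "UTC" is an assumption made so the result does not depend on the machine's time zone; the answer itself uses the local time zone.

```r
# Parse a timestamp in the question's "month/day/year hour:minute" layout
x <- strptime("01/01/2000 03:00", format = "%m/%d/%Y %H:%M", tz = "UTC")

# Truncate to the start of the day: the hour and minute become 00:00
day <- trunc(x, "days")

format(x, "%Y-%m-%d %H:%M")    # "2000-01-01 03:00"
format(day, "%Y-%m-%d %H:%M")  # "2000-01-01 00:00"
```

Every reading from the same calendar day truncates to the same value, which is what makes `day` usable as a grouping variable.
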
ddply() splits the data frame by day and then evaluates summarize() within each piece. Everything after summarize is passed as an argument to summarize().

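For comparison, the split-apply-combine steps that ddply() performs can be mimicked in base R with split(), lapply() and do.call(rbind, ...). This sketch only illustrates the idea; it is not how plyr is implemented. It assumes the sample data from the question and computes the day with as.Date() instead of trunc().

```r
# The question's sample readings
DF <- data.frame(
  Date = c("01/01/2000 01:00", "01/01/2000 02:00",
           "01/01/2000 03:00", "12/31/2000 23:00"),
  DBT  = c(30, 31, 33, 25)
)
DF$day <- as.Date(DF$Date, format = "%m/%d/%Y")

# Split: one data frame per day
pieces <- split(DF, DF$day)

# Apply: summarise each piece into a one-row data frame
summaries <- lapply(pieces, function(d) {
  data.frame(day = d$day[1], aveDBT = mean(d$DBT),
             maxDBT = max(d$DBT), minDBT = min(d$DBT))
})

# Combine: stack the one-row results
result <- do.call(rbind, summaries)
rownames(result) <- NULL
result
```

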
#3


2  

There is also a nice package called hydroTSM. It works with zoo objects and can aggregate a time series to coarser time steps.

The function you need here is subdaily2daily(). You can choose whether the aggregation is based on min, max, mean, and so on.

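The answer does not show any hydroTSM code. As a rough stand-in that uses only zoo, the package whose objects hydroTSM works with, each daily statistic can be computed with aggregate() by passing as.Date as the grouping function. This is not subdaily2daily() itself. It is a sketch that assumes the sample readings from the question and UTC timestamps.

```r
library(zoo)

# The question's sample readings as a zoo series indexed by POSIXct times
times <- as.POSIXct(c("01/01/2000 01:00", "01/01/2000 02:00",
                      "01/01/2000 03:00", "12/31/2000 23:00"),
                    format = "%m/%d/%Y %H:%M", tz = "UTC")
z <- zoo(c(30, 31, 33, 25), order.by = times)

# aggregate() applies as.Date to the index, so all readings from the same
# calendar day form one group; FUN is then applied to each group.
# merge() uses the argument names as column names.
daily <- merge(
  max  = aggregate(z, as.Date, max),
  min  = aggregate(z, as.Date, min),
  mean = aggregate(z, as.Date, mean)
)
daily
```

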
#4


0  

You can use the tidyquant package for this. The process involves using the tq_transmute() function, which applies the xts aggregation function apply.daily() and returns the result as a data frame. We'll apply a custom stat_fun that returns the min, max and mean. However, you can apply any vector function you like, such as quantile.

library(tidyquant)

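# Example data: the question's readings with a POSIXct Date column
df <- tibble::tibble(
  Date = as.POSIXct(c("2000-01-01 01:00", "2000-01-01 02:00",
                      "2000-01-01 03:00", "2000-12-31 23:00"), tz = "UTC"),
  DBT  = c(30, 31, 33, 25)
)
df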
#> # A tibble: 4 x 2
#>                  Date   DBT
#>                <dttm> <dbl>
#> 1 2000-01-01 01:00:00    30
#> 2 2000-01-01 02:00:00    31
#> 3 2000-01-01 03:00:00    33
#> 4 2000-12-31 23:00:00    25

stat_fun <- function(x) c(min = min(x), max = max(x), mean = mean(x))

df %>%
    tq_transmute(select     = DBT,
                 mutate_fun = apply.daily,
                 FUN        = stat_fun)
#> # A tibble: 2 x 4
#>                 Date   min   max     mean
#>                <dttm> <dbl> <dbl>    <dbl>
#> 1 2000-01-01 03:00:00    30    33 31.33333
#> 2 2000-12-31 23:00:00    25    25 25.00000

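Because apply.daily() comes from xts, the suggestion to use another vector function such as quantile can also be tried on an xts object directly, without tidyquant. The result is then an xts object rather than a tibble. Below is a minimal sketch, assuming the xts package is installed and UTC timestamps. The probabilities 0.25, 0.5 and 0.75 are arbitrary illustrative choices.

```r
library(xts)

# The question's four readings as an xts series indexed by POSIXct times
times <- as.POSIXct(c("2000-01-01 01:00", "2000-01-01 02:00",
                      "2000-01-01 03:00", "2000-12-31 23:00"), tz = "UTC")
x <- xts(c(30, 31, 33, 25), order.by = times)

# apply.daily() calls FUN once per calendar day; a FUN that returns several
# values yields one output column per value, stamped with each day's last time
daily_q <- apply.daily(x, function(v) quantile(as.numeric(v), probs = c(0.25, 0.5, 0.75)))
daily_q
```

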