How do I convert a character datetime into a usable format with dplyr and RPostgreSQL?

Time: 2022-10-10 23:09:04

I have a timestamp column, Timelocal, in my data that's formatted as follows:

2015-08-24T00:02:03.000Z

Normally, I use the following line to convert this format to a date-time format I can use.

timestamp2 = "2015-08-24T00:02:03.000Z"
timestamp2_formatted = strptime(timestamp2, "%Y-%m-%dT%H:%M:%S", tz = "UTC")
# also works for dataframes (my main use of it)
df$TimeNew = strptime(df$TimeLocal, "%Y-%m-%dT%H:%M:%S", tz = "UTC")
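
For reference, here is a minimal local sketch of the same conversion that stores the result as POSIXct rather than the POSIXlt that strptime() returns, since POSIXct is usually easier to keep in a data frame column. The second timestamp and the small data frame are made up for illustration; only the first value and the format string come from the question.

```r
# Sample value copied from the question
timestamp2 <- "2015-08-24T00:02:03.000Z"

# strptime() parses as far as the format requires and ignores the trailing
# ".000Z"; as.POSIXct() converts the POSIXlt result to a numeric-backed type.
parsed <- as.POSIXct(strptime(timestamp2, "%Y-%m-%dT%H:%M:%S", tz = "UTC"))

# Hypothetical local data frame standing in for `df`
df <- data.frame(
  TimeLocal = c(timestamp2, "2015-08-24T13:45:10.000Z"),
  stringsAsFactors = FALSE
)
df$TimeNew <- as.POSIXct(
  strptime(df$TimeLocal, "%Y-%m-%dT%H:%M:%S", tz = "UTC")
)

str(df)
```

This only applies to data that is already in local memory; it does not address the remote Redshift table.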

This works fine on my machine. The problem is, I'm now working with a much bigger dataframe. It's on a Redshift cluster and I am accessing it using the RPostgreSQL package. I'm using dplyr to manipulate data as the documentation online indicates that it plays nicely with RPostgreSQL.

It does seem to, except for converting the date format. I'd like to convert the character format to a time format. Timelocal was read into Redshift as "varchar", so R interprets it as a character field.

I've tried the following:

library(dplyr)
library(RPostgreSQL)
library(lubridate)

try 1 - using easy dplyr syntax

mutate(elevate, timelocalnew = fast_strptime(timelocal, "%Y-%m-%dT%H:%M:%S",tz="UTC")) 

try 2 - using dplyr syntax from another online reference code

elevate %>% 
  mutate(timelocalnew = timelocal %>% fast_strptime("%Y-%m-%dT%H:%M:%S",tz="UTC") %>% as.character()) %>%
  filter(!is.na(timelocalnew))

try 3 - using strptime instead of fast_strptime

elevate %>% 
  mutate(timelocalnew = timelocal %>% strptime("%Y-%m-%dT%H:%M:%S",tz="UTC") %>% as.character()) %>%
  filter(!is.na(timelocalnew))

I am trying to adapt code from here: http://www.markhneedham.com/blog/2014/12/08/r-dplyr-mutate-with-strptime-incompatible-sizewrong-result-size/

My attempts fail with the following error:

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  syntax error at or near "AS"
LINE 1: ...CAST(STRPTIME("timelocal", '%YSuccess2048568264T%H%M�����', 'UTC' AS "tz") A...
                                                             ^
)
In addition: Warning messages:
1: In postgresqlQuickSQL(conn, statement, ...) :
  Could not create executeSELECT count(*) FROM (SELECT "timelocal", "timeutc", "zipcode", "otherdata", "country", CAST(STRPTIME("timelocal", '%Y%m%dT%H%M%S', 'UTC' AS "tz") AS TEXT) AS "timelocalnew"
FROM "data") AS "master"
2: Named arguments ignored for SQL STRPTIME 

It would seem that strptime is incompatible with RPostgreSQL. Is this the right interpretation? If so, does this mean there is no means of handling date formats within R if the data is on Redshift? I checked the RPostgreSQL package documentation and did not see anything related to specifying time formats.

I would appreciate any advice on getting date-time columns formatted correctly with dplyr and RPostgreSQL.

2 solutions

#1


Does the following work?

as.Date(strptime(timelocal, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC"))

#2


Traditional R functions will not work here, because dplyr translates the mutate() call to SQL and runs it on the database.
You should use the SQL translation, which has been evolving in recent versions of dplyr and dbplyr.
The following worked for me:

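library(dplyr)
library(dbplyr)
# to_date() is not an R function; dbplyr passes it through to Redshift as SQL
elevate %>% mutate(date = to_date(timelocal, 'YYYY-MM-DD'))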

Note, I am using AWS Redshift.
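
To see what this approach sends to the database without connecting to a cluster, the sketch below builds the query against dbplyr's simulated Redshift backend and prints the generated SQL. It assumes dplyr and dbplyr are installed; `elevate_sim` is a hypothetical stand-in for the real remote table, with the `timelocal` column name taken from the question.

```r
library(dplyr)
library(dbplyr)

# Lazy table on a simulated Redshift connection; no database is contacted.
# `elevate_sim` is a hypothetical stand-in for the remote table.
elevate_sim <- lazy_frame(timelocal = character(), con = simulate_redshift())

# dbplyr does not know to_date(), so it passes the call through to the SQL
# unchanged, leaving the database to evaluate it.
query <- elevate_sim %>%
  mutate(date = to_date(timelocal, 'YYYY-MM-DD'))

# Print the SQL that would be sent to the database
show_query(query)
```

Because the format string covers only the date part, this yields a date without the time of day; keeping the time would need a different Redshift function or format, which is not covered here.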
