Return query ID from COPY command

Posted: 2022-05-21 02:00:57

I have a Python script that uses psycopg2 to execute a COPY command that copies data from S3 to Redshift; it runs fine on a cron schedule.

Now I want to check that the data has loaded properly each time, and to do that I want to query the STL_LOAD_COMMITS and STL_LOAD_ERRORS tables.

Does anyone know whether there is a way to get the query ID of the COPY command, so it can be used to query the tables above and retrieve the relevant log records?

I don't believe COPY returns anything at all, but if someone has come across a clever way of checking loads in code, I'd be interested.

EDIT: Perhaps the right way to do this is to query by filename instead of by query ID, since I know the names of the files I've loaded.

select *
from STL_LOAD_COMMITS
where filename in ('s3://bucket/4f737c05-8f16-4ba7-8f50-30423369c389.csv.gz',
's3://bucket/5fe4fea9-a9e4-4622-b9f6-ed3f98f7d1e2.csv.gz')
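Since the file names are known, the filename lookup can be run from the same Python script. This is a minimal sketch, assuming a psycopg2 (or other DB-API) cursor on the Redshift connection; the helper names `load_commits_for_files` and `missing_files` are hypothetical. It relies on psycopg2 adapting a Python tuple into a parenthesised list suitable for `IN`, and it trims `filename` defensively in case the column is blank-padded.

```python
from typing import Any, Sequence

# Hypothetical query: STL_LOAD_COMMITS rows for a set of S3 file names.
# trim() is defensive, in case the filename column is blank-padded.
LOAD_COMMITS_BY_FILE_SQL = """
    select query, trim(filename) as filename, lines_scanned, status, curtime
    from stl_load_commits
    where trim(filename) in %s
    order by curtime
"""


def load_commits_for_files(cursor: Any, filenames: Sequence[str]) -> list:
    """Return STL_LOAD_COMMITS rows whose filename is one of `filenames`.

    `cursor` is assumed to be a psycopg2 cursor on a Redshift connection.
    """
    if not filenames:
        # "IN ()" is not valid SQL, so skip the query for an empty list.
        return []
    # psycopg2 renders a Python tuple as "(a, b, ...)", which suits IN.
    cursor.execute(LOAD_COMMITS_BY_FILE_SQL, (tuple(filenames),))
    return cursor.fetchall()


def missing_files(rows: Sequence[tuple], filenames: Sequence[str]) -> set:
    """File names with no commit record (column 1 holds the trimmed filename)."""
    committed = {row[1] for row in rows}
    return set(filenames) - committed
```

A non-empty result from `missing_files` would flag a file that never produced a commit record, which is a reasonable thing for the cron job to alert on.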

1 answer

#1

As its name suggests, PG_LAST_COPY_ID() returns the query ID of the most recently run COPY command in the current session, so call it on the same connection that ran the COPY.

Source: AWS Redshift documentation, PG_LAST_COPY_ID()
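Putting this together with the STL tables from the question, a minimal sketch of the check might look like the following. It assumes a psycopg2 cursor (any DB-API cursor works); `run_copy` and `load_report` are hypothetical helper names, and the selected columns are only a reasonable subset of what those system tables hold. The -1 check reflects the documented behaviour when no COPY has run in the session.

```python
from typing import Any, Dict, List


def run_copy(cursor: Any, copy_sql: str) -> int:
    """Run a COPY and return its query ID via PG_LAST_COPY_ID().

    PG_LAST_COPY_ID() is session-scoped, so it is read on the same cursor
    (and therefore the same connection) that ran the COPY.
    """
    cursor.execute(copy_sql)
    cursor.execute("select pg_last_copy_id()")
    query_id = cursor.fetchone()[0]
    if query_id == -1:  # -1 means no COPY has run in this session
        raise RuntimeError("PG_LAST_COPY_ID() returned -1")
    return query_id


def load_report(cursor: Any, query_id: int) -> Dict[str, List[tuple]]:
    """Collect STL_LOAD_COMMITS and STL_LOAD_ERRORS rows for one COPY."""
    cursor.execute(
        "select trim(filename), lines_scanned, status "
        "from stl_load_commits where query = %s",
        (query_id,),
    )
    commits = cursor.fetchall()
    cursor.execute(
        "select trim(filename), line_number, trim(colname), err_code, "
        "trim(err_reason) from stl_load_errors where query = %s",
        (query_id,),
    )
    errors = cursor.fetchall()
    return {"commits": commits, "errors": errors}
```

In the cron job this could be used as `qid = run_copy(cur, copy_sql)`, then `conn.commit()`, then `report = load_report(cur, qid)`. An empty `commits` list or a non-empty `errors` list is worth alerting on; errors alongside a successful COPY would typically mean a MAXERROR setting let some rows be skipped.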
