Querying distinct timestamps in a column in Google BigQuery

Time: 2021-07-24 14:56:46

I am trying to select data from a price database. The rows I want are the ones that fall exactly on a whole minute, with one row per minute. So, if a minute has two prices, I want the first price.

Here's what the data looks like with this query:

SELECT price, timestamp FROM [database] WHERE stock="appl" AND second(timestamp) = 0 ORDER BY timestamp

Result:

Row price timestamp
1 0.097947 2018-02-14 03:42:00.000 UTC
2 0.09796 2018-02-14 03:43:00.000 UTC
3 0.097959 2018-02-14 03:45:00.000 UTC
4 0.097969 2018-02-14 03:46:00.000 UTC
5 0.097984 2018-02-14 03:47:00.000 UTC
6 0.097986 2018-02-14 03:47:00.000 UTC (Duplicate time ^)
7 0.097899 2018-02-14 03:48:00.000 UTC
8 0.097927 2018-02-14 03:49:00.000 UTC
9 0.097984 2018-02-14 03:50:00.000 UTC
10 0.097995 2018-02-14 03:51:00.000 UTC
11 0.097972 2018-02-14 03:52:00.000 UTC
12 0.097924 2018-02-14 03:53:00.000 UTC
13 0.097935 2018-02-14 03:54:00.000 UTC

When I add "GROUP BY price, timestamp", the result does not change.
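
For readers wondering why that GROUP BY has no effect: the two 03:47 rows carry different prices, so (price, timestamp) is a different key for each and nothing collapses. Below is a minimal sketch of that behaviour, run through Python's sqlite3 module as a local stand-in for BigQuery. The table name `prices`, the column name `ts`, and the three sample rows (copied from the result above) are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample: the duplicated 03:47 minute from the question plus 03:48.
rows = [
    (0.097984, "2018-02-14 03:47:00"),
    (0.097986, "2018-02-14 03:47:00"),
    (0.097899, "2018-02-14 03:48:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (price REAL, ts TEXT)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", rows)

# Grouping on (price, ts): the two 03:47 rows have different prices,
# so they fall into different groups and both survive.
by_price_and_ts = conn.execute(
    "SELECT price, ts FROM prices GROUP BY price, ts ORDER BY ts, price"
).fetchall()

# Grouping on ts alone: one group per timestamp, and an aggregate
# (MIN here) decides which price represents the group.
by_ts = conn.execute(
    "SELECT MIN(price), ts FROM prices GROUP BY ts ORDER BY ts"
).fetchall()

print(len(by_price_and_ts), len(by_ts))
```

Grouping on the timestamp alone gives one row per timestamp, but the price then has to come from an aggregate, and MIN is only one possible choice.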

I want distinct timestamps. So, for this case the result should be:

Row price timestamp
1 0.097947 2018-02-14 03:42:00.000 UTC
2 0.09796 2018-02-14 03:43:00.000 UTC
3 0.097959 2018-02-14 03:45:00.000 UTC
4 0.097969 2018-02-14 03:46:00.000 UTC
5 0.097984 2018-02-14 03:47:00.000 UTC
6 0.097899 2018-02-14 03:48:00.000 UTC
7 0.097927 2018-02-14 03:49:00.000 UTC
8 0.097984 2018-02-14 03:50:00.000 UTC
9 0.097995 2018-02-14 03:51:00.000 UTC
10 0.097972 2018-02-14 03:52:00.000 UTC
11 0.097924 2018-02-14 03:53:00.000 UTC
12 0.097935 2018-02-14 03:54:00.000 UTC

3 Solutions

#1


1  

The query below is for BigQuery Standard SQL and assumes your ts field is of type TIMESTAMP.

SELECT 
  ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] price,
  TIMESTAMP_TRUNC(ts, MINUTE) ts 
FROM `yourproject.yourdataset.yourtable`
WHERE stock = 'appl'
GROUP BY 2
ORDER BY 2  

Note: I use ts instead of timestamp because I prefer not to use keywords as column names.

#2


1  

There is no such thing as a "first" price, unless another column specifies that value. You can get one price per timestamp with something like this:

SELECT MIN(price), timestamp
FROM [database]
WHERE stock = 'appl' AND second(timestamp) = 0
GROUP BY timestamp;

If you do have another column that defines the ordering, then you can use ARRAY_AGG and take the first value.
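
As a rough illustration of the "another column with the ordering" case, here is a sketch that picks the first row per timestamp with ROW_NUMBER() instead of ARRAY_AGG (a swapped-in but equivalent technique). It runs against SQLite through Python's sqlite3 module purely so it can be tried locally; window functions need SQLite 3.25 or later. The `seq` column, the `prices` table, and the sample rows are hypothetical, since the question does not show any ordering column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `seq` is a hypothetical column recording arrival order;
# the table in the question does not show one.
conn.execute("CREATE TABLE prices (seq INTEGER, price REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, 0.097984, "2018-02-14 03:47:00"),
        (2, 0.097986, "2018-02-14 03:47:00"),
        (3, 0.097899, "2018-02-14 03:48:00"),
    ],
)

# Number the rows inside each timestamp by seq, then keep only row 1.
first_per_ts = conn.execute(
    """
    SELECT price, ts
    FROM (
        SELECT price, ts,
               ROW_NUMBER() OVER (PARTITION BY ts ORDER BY seq) AS rn
        FROM prices
    )
    WHERE rn = 1
    ORDER BY ts
    """
).fetchall()

for price, ts in first_per_ts:
    print(ts, price)
```

With these sample rows the first price for 03:47 happens to equal the minimum, so MIN(price) would return the same rows; the two approaches diverge whenever a later row in the same timestamp has a lower price.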

#3


0  

SELECT MIN(price), timestamp
FROM [database]
WHERE stock = 'appl' AND second(timestamp) = 0
GROUP BY timestamp
ORDER BY timestamp
