Referencing the current row in the FILTER clause of a window function

Time: 2022-02-16 22:58:18

In PostgreSQL 9.4, window functions have a new FILTER option that selects a subset of the window frame for processing. The documentation mentions it but provides no example. An online search turns up a few examples, including one from 2ndQuadrant, but all the ones I found were rather trivial and used constant expressions. What I am looking for is a filter expression that includes the value of the current row.

Assume I have a table with a bunch of columns, one of which is of date type:

col1 | col2 |     dt
------------------------
  1  |  a   | 2015-07-01
  2  |  b   | 2015-07-03
  3  |  c   | 2015-07-10
  4  |  d   | 2015-07-11
  5  |  e   | 2015-07-11
  6  |  f   | 2015-07-13
...

A window definition for processing on the date over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)

I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:

col1 | col2 |     dt     | count
--------------------------------
  1  |  a   | 2015-07-01 |   1
  2  |  b   | 2015-07-03 |   2
  3  |  c   | 2015-07-10 |   1
  4  |  d   | 2015-07-11 |   3
  5  |  e   | 2015-07-11 |   3
  6  |  f   | 2015-07-13 |   4
...

The FILTER clause of the window functions seems like the obvious choice:

count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win

But how do I specify current_row.dt (for lack of a better syntax)? Is this even possible?

If this is not possible, are there other ways of selecting date ranges in a window frame? The frame specification is no help as it is all row-based.

I am not interested in alternative solutions using sub-queries; the solution has to be based on window processing.

2 Answers

#1


5  

You are not actually aggregating rows, so the new aggregate FILTER clause is not the right tool. A window function is closer to what you need, but a problem remains: in PostgreSQL 9.4 the frame definition of a window cannot depend on values of the current row. It can only span a given number of rows preceding or following, using the ROWS clause.
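
For readers on newer releases: as far as I know, PostgreSQL 11 added offset-based RANGE frames, which can express this kind of date range directly in the frame. The sketch below is not part of the original answer and does not run on 9.4. The inline tbl data just repeats the sample rows from the question, and the alias ct is my own.

```sql
-- Assumption: PostgreSQL 11 or later (RANGE with an offset is not available in 9.4).
-- The CTE stands in for the real table so the statement runs on its own.
WITH tbl (col1, col2, dt) AS (
   VALUES (1, 'a', DATE '2015-07-01')
        , (2, 'b', DATE '2015-07-03')
        , (3, 'c', DATE '2015-07-10')
        , (4, 'd', DATE '2015-07-11')
        , (5, 'e', DATE '2015-07-11')
        , (6, 'f', DATE '2015-07-13')
)
SELECT col1, col2, dt
       -- Frame: every row whose dt lies within the 4 days up to and
       -- including the current row's dt (peers of the current row included).
     , count(*) OVER (ORDER BY dt
                      RANGE BETWEEN INTERVAL '4 days' PRECEDING
                                AND CURRENT ROW) AS ct
FROM   tbl
ORDER  BY dt, col1;
```

With the sample rows this should give 1, 2, 1, 3, 3, 4, which is the output asked for in the question.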

To make that work, aggregate counts per day and LEFT JOIN to a full set of days in range. Then you can apply a window function:

SELECT t.*, ct.ct_last4days
FROM  (
   SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
   FROM  (
      SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
      FROM   tbl t1
      ) d
   LEFT   JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
   ) ct
JOIN  tbl t USING (dt);

Omitting ORDER BY dt in the window frame definition usually works, since the order is carried over from generate_series() in the subquery. But the SQL standard gives no guarantees without an explicit ORDER BY, and it might break in more complex queries.
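
To try the query, a minimal setup could look like the following. The table name tbl comes from the query itself; the column types and the use of a temporary table are my assumptions, since the question only shows the data.

```sql
-- Hypothetical setup for the sample data from the question.
-- Column types are assumed; only dt is stated to be a date.
CREATE TEMP TABLE tbl (
   col1 int
 , col2 text
 , dt   date
);

INSERT INTO tbl (col1, col2, dt) VALUES
   (1, 'a', '2015-07-01')
 , (2, 'b', '2015-07-03')
 , (3, 'c', '2015-07-10')
 , (4, 'd', '2015-07-11')
 , (5, 'e', '2015-07-11')
 , (6, 'f', '2015-07-13');
```

Run against these six rows, the query should return ct_last4days values of 1, 2, 1, 3, 3, 4 for col1 = 1 through 6. The days without rows (for example 2015-07-02) contribute a NULL count, which sum() ignores.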


#2


1  

I don't think there is any syntax that means "current row" in an expression. The gram.y file for Postgres makes a filter clause take just an a_expr, which is an ordinary expression. Nothing in an expression is specific to window functions or filter clauses. As far as I can find, the only notion of a current row in a window clause is the one used to specify the window frame boundaries. I don't think this gets you what you want.
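
To illustrate what that means in practice: the filter expression is evaluated against each row of the frame, so a column reference such as dt always means the frame row being tested, never the row the result is computed for. The following is only a sketch; the inline tbl data repeats the sample rows from the question, and the cutoff date and the alias ct_from_july_10 are made up for the example.

```sql
-- The CTE stands in for the real table so the statement runs on its own.
WITH tbl (col1, col2, dt) AS (
   VALUES (1, 'a', DATE '2015-07-01')
        , (2, 'b', DATE '2015-07-03')
        , (3, 'c', DATE '2015-07-10')
        , (4, 'd', DATE '2015-07-11')
        , (5, 'e', DATE '2015-07-11')
        , (6, 'f', DATE '2015-07-13')
)
SELECT col1, col2, dt
       -- dt inside FILTER is the dt of each frame row, compared to a constant.
       -- There is no way here to name the dt of the current row.
     , count(*) FILTER (WHERE dt >= DATE '2015-07-10') OVER win AS ct_from_july_10
FROM   tbl
WINDOW win AS (ORDER BY dt)
ORDER  BY dt, col1;
-- Expected with the default frame (start of partition up to the current
-- row and its peers): 0, 0, 1, 3, 3, 4
```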

It's possible that you could get some traction from an enclosing query:

http://www.postgresql.org/docs/current/static/sql-expressions.html

When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query.

but it's not obvious to me how.
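
For comparison only, here is the plain correlated-subquery form, where the row of the outer query plays the role of the "current row". This is not the mechanism the quoted passage describes, and it is exactly the kind of sub-query solution the question rules out, so take it as a reference point rather than an answer. The inline tbl data repeats the sample rows from the question; the aliases t, t2 and ct are my own.

```sql
-- The CTE stands in for the real table so the statement runs on its own.
WITH tbl (col1, col2, dt) AS (
   VALUES (1, 'a', DATE '2015-07-01')
        , (2, 'b', DATE '2015-07-03')
        , (3, 'c', DATE '2015-07-10')
        , (4, 'd', DATE '2015-07-11')
        , (5, 'e', DATE '2015-07-11')
        , (6, 'f', DATE '2015-07-13')
)
SELECT t.col1, t.col2, t.dt
       -- t.dt is the outer ("current") row; t2 ranges over the whole table.
       -- date - integer yields a date, so t.dt - 4 is four days earlier.
     , (SELECT count(*)
        FROM   tbl t2
        WHERE  t2.dt BETWEEN t.dt - 4 AND t.dt) AS ct
FROM   tbl t
ORDER  BY t.dt, t.col1;
```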

