在同一分区上应用多个窗口函数

时间:2022-06-01 16:34:13

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)

是否可以将多个窗口函数应用于同一分区? (如果我没有使用正确的词汇,请纠正我)

For example you can do

例如,你可以做到

SELECT name, first_value() over (partition by name order by date) from table1

But is there a way to do something like:

但有没有办法做一些事情:

SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1

Where we are applying two functions onto the same window?

我们在同一个窗口上应用两个函数的位置?

Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html

参考:http://postgresql.ro/docs/8.4/static/tutorial-window.html

2 个解决方案

#1


13  

Can you not just use the window per selection

你能不能只根据选择使用窗口

Something like

就像是

SELECT  name, 
        first_value() OVER (partition by name order by date) as f, 
        last_value() OVER (partition by name order by date) as l 
from table1

Also from your reference you can do it like this

另外从你的参考,你可以这样做

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)

#2


13  

Warning : I don't delete this answer since it seems technically correct and therefore may be helpful, but beware that PARTITION BY bar ORDER BY foo is probably not what you want to do anyway. Indeed, aggregate functions won't compute the partition elements as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see proof at the end of the answer).

警告:我不删除这个答案,因为它似乎在技术上是正确的,因此可能会有所帮助,但要注意PARTITION BY bar ORDER BY foo可能不是你想要做的。实际上,聚合函数不会整体计算分区元素。也就是说,SELECT avg(foo)OVER(PARTITION BY bar ORDER BY foo)不等同于SELECT avg(foo)OVER(PARTITION BY bar)(请参阅答案末尾的证明)。

Though it doesn't improve performance per se, if you use multiple times the same partition, you probably want to use the second syntax proposed by astander, and not only because it's cheaper to write. Here is why.

虽然它本身并没有提高性能,但如果你多次使用相同的分区,你可能想要使用astander提出的第二种语法,而不仅仅是因为它的编写成本更低。这就是原因。

Consider the following query :

请考虑以下查询:

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

Since in principle the ordering has no effect on the computation of the average, you might be tempted to use the following query instead (no ordering on the second partition) :

因为原则上排序对平均值的计算没有影响,所以您可能会尝试使用以下查询(在第二个分区上没有排序):

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

This is a big mistake, as it will take much longer. Proof :

这是一个很大的错误,因为它需要更长的时间。证明:

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 lignes)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 lignes)

Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have ten times or more the same partition and you are updating it over days, it is quite easy to forget to add the ORDER BY clause on a partition which doesn't need it by itself.

现在,如果你知道这个问题,你当然会在任何地方使用相同的分区。但是当你有十倍或更多相同的分区并且你要在几天内更新它时,很容易忘记在一个不需要它的分区上添加ORDER BY子句。

Here comes the WINDOW syntax, which will prevent you from such careless mistakes (provided, of course, you're aware it's better to minimize the number of different window functions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE) to the first query :

这里有WINDOW语法,它可以防止你出现这种粗心的错误(前提是,你知道最好尽量减少不同窗口函数的数量)。以下是第一个查询严格等同(据我可以从EXPLAIN ANALYZE告诉):

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY bar)

Post-warning update :

I understand the statement that "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example :

我理解“SELECT avg(foo)OVER(PARTITION BY bar ORDER BY foo)不等同于SELECT avg(foo)OVER(PARTITION BY bar)”的说法“似乎有问题,所以这是一个例子:

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
 (4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 lines)

#1


13  

Can you not just use the window per selection

你能不能只根据选择使用窗口

Something like

就像是

SELECT  name, 
        first_value() OVER (partition by name order by date) as f, 
        last_value() OVER (partition by name order by date) as l 
from table1

Also from your reference you can do it like this

另外从你的参考,你可以这样做

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)

#2


13  

Warning : I don't delete this answer since it seems technically correct and therefore may be helpful, but beware that PARTITION BY bar ORDER BY foo is probably not what you want to do anyway. Indeed, aggregate functions won't compute the partition elements as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see proof at the end of the answer).

警告:我不删除这个答案,因为它似乎在技术上是正确的,因此可能会有所帮助,但要注意PARTITION BY bar ORDER BY foo可能不是你想要做的。实际上,聚合函数不会整体计算分区元素。也就是说,SELECT avg(foo)OVER(PARTITION BY bar ORDER BY foo)不等同于SELECT avg(foo)OVER(PARTITION BY bar)(请参阅答案末尾的证明)。

Though it doesn't improve performance per se, if you use multiple times the same partition, you probably want to use the second syntax proposed by astander, and not only because it's cheaper to write. Here is why.

虽然它本身并没有提高性能,但如果你多次使用相同的分区,你可能想要使用astander提出的第二种语法,而不仅仅是因为它的编写成本更低。这就是原因。

Consider the following query :

请考虑以下查询:

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

Since in principle the ordering has no effect on the computation of the average, you might be tempted to use the following query instead (no ordering on the second partition) :

因为原则上排序对平均值的计算没有影响,所以您可能会尝试使用以下查询(在第二个分区上没有排序):

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

This is a big mistake, as it will take much longer. Proof :

这是一个很大的错误,因为它需要更长的时间。证明:

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 lignes)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 lignes)

Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have ten times or more the same partition and you are updating it over days, it is quite easy to forget to add the ORDER BY clause on a partition which doesn't need it by itself.

现在,如果你知道这个问题,你当然会在任何地方使用相同的分区。但是当你有十倍或更多相同的分区并且你要在几天内更新它时,很容易忘记在一个不需要它的分区上添加ORDER BY子句。

Here comes the WINDOW syntax, which will prevent you from such careless mistakes (provided, of course, you're aware it's better to minimize the number of different window functions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE) to the first query :

这里有WINDOW语法,它可以防止你出现这种粗心的错误(前提是,你知道最好尽量减少不同窗口函数的数量)。以下是第一个查询严格等同(据我可以从EXPLAIN ANALYZE告诉):

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY bar)

Post-warning update :

I understand the statement that "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example :

我理解“SELECT avg(foo)OVER(PARTITION BY bar ORDER BY foo)不等同于SELECT avg(foo)OVER(PARTITION BY bar)”的说法“似乎有问题,所以这是一个例子:

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
 (4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 lines)