How to use SQL window functions to calculate percentages of an aggregate

Time: 2021-07-19 22:59:10

I need to calculate percentages across various dimensions in a table. I'd like to simplify things by using window functions to calculate the denominator, but I'm running into a problem because the numerator has to be an aggregate as well.

As a simple example, take the following table:

create temp table test (d1 text, d2 text, v numeric);
insert into test values ('a','x',5), ('a','y',5), ('a','y',10), ('b','x',20);

If I just want each individual row's share of its d1 total, then window functions work fine:

select d1, d2, v/sum(v) over (partition by d1)
from test;

"b";"x";1.00
"a";"x";0.25
"a";"y";0.25
"a";"y";0.50

However, what I need is the share of each (d1, d2) group's summed v within its d1 total. The output I am looking for is this:

"b";"x";1.00
"a";"x";0.25
"a";"y";0.75

So I tried this:

select d1, d2, sum(v)/sum(v) over (partition by d1)
from test
group by d1, d2;

However, now I get an error:

ERROR:  column "test.v" must appear in the GROUP BY clause or be used in an aggregate function

I assume it is complaining that the window function is not accounted for in the GROUP BY clause; however, window functions cannot be put in the GROUP BY clause anyway.

This is using Greenplum 4.1, which is a fork of PostgreSQL 8.4 and shares the same window functions. Note that Greenplum cannot do correlated subqueries.

2 Solutions

#1


19  

I think what you are actually looking for is this:

SELECT d1, d2, sum(v)/sum(sum(v)) OVER (PARTITION BY d1) AS share
FROM   test
GROUP  BY d1, d2;

This produces the requested result.

Window functions are applied after aggregate functions. The outer sum() in sum(sum(v)) is a window function in this example and is attached to the OVER ... clause, while the inner sum() is an aggregate.
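
To make the evaluation order concrete, here is a minimal Python sketch of the two stages, using the sample rows from the question. It is only an illustration of the idea, not how the database implements it; the function name `grouped_shares` is made up for this example.

```python
from collections import defaultdict
from fractions import Fraction

# Sample rows from the question: (d1, d2, v)
rows = [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)]


def grouped_shares(rows):
    """Hypothetical helper mimicking sum(v)/sum(sum(v)) OVER (PARTITION BY d1)."""
    # Stage 1 (aggregate): the inner sum(v) with GROUP BY d1, d2
    group_sum = defaultdict(int)
    for d1, d2, v in rows:
        group_sum[(d1, d2)] += v

    # Stage 2 (window): the outer sum(...) OVER (PARTITION BY d1),
    # computed over the grouped rows, not over the raw table rows
    partition_sum = defaultdict(int)
    for (d1, _d2), sv in group_sum.items():
        partition_sum[d1] += sv

    # Final projection: group total divided by its partition total
    return {
        key: Fraction(sv, partition_sum[key[0]])
        for key, sv in group_sum.items()
    }


shares = grouped_shares(rows)
for (d1, d2), share in sorted(shares.items()):
    print(d1, d2, float(share))
```

By the time the window stage runs, only the grouped columns and the aggregate results exist, which is why a bare v inside the window function is rejected while sum(v) is accepted.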

This is effectively the same as:

WITH x AS (
    SELECT d1, d2, sum(v) AS sv
    FROM   test
    GROUP  BY d1, d2
    )
SELECT d1, d2, sv/sum(sv) OVER (PARTITION BY d1) AS share
FROM   x;

Or, without a CTE:

SELECT d1, d2, sv/sum(sv) OVER (PARTITION BY d1) AS share
FROM   (
    SELECT d1, d2, sum(v) AS sv
    FROM   test
    GROUP  BY d1, d2
    ) x;

Or @Mu's variant.
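
One way to check that the nested form and the subquery form agree is to run both against the sample data. The sketch below does this with Python's built-in sqlite3 module, purely as a stand-in: it assumes a SQLite build with window function support (3.25 or later) and says nothing about Greenplum itself. The column v is declared REAL here because SQLite would otherwise perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: v is REAL (not numeric as in the question) so that "/"
# is floating-point division in SQLite.
conn.execute("CREATE TABLE test (d1 TEXT, d2 TEXT, v REAL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)],
)

# Window function wrapped around an aggregate, as in the answer
nested = """
    SELECT d1, d2, sum(v) / sum(sum(v)) OVER (PARTITION BY d1) AS share
    FROM   test
    GROUP  BY d1, d2
    ORDER  BY d1, d2
"""

# Aggregate first in a subquery, then apply the window function
derived = """
    SELECT d1, d2, sv / sum(sv) OVER (PARTITION BY d1) AS share
    FROM  (
        SELECT d1, d2, sum(v) AS sv
        FROM   test
        GROUP  BY d1, d2
        ) x
    ORDER  BY d1, d2
"""

nested_rows = conn.execute(nested).fetchall()
derived_rows = conn.execute(derived).fetchall()

for row in nested_rows:
    print(row)
print("same result:", nested_rows == derived_rows)
```

The ORDER BY clauses are added only so the two result sets can be compared row by row; they are not part of the original queries.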

Aside: Greenplum introduced correlated subqueries with version 4.2. See release notes.

#2


1  

Do you need to do it all with window functions? It sounds like you just need to group the result you already have by d1 and d2 and then sum the per-row shares:

select d1, d2, sum(p)
from (
    select d1, d2, v/sum(v) over (partition by d1) as p
    from test
) as dt
group by d1, d2

That gives me this:

 d1 | d2 |          sum           
----+----+------------------------
 a  | x  | 0.25000000000000000000
 a  | y  | 0.75000000000000000000
 b  | x  | 1.00000000000000000000
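
This variant reverses the order of operations: the per-row shares are computed first, against the raw rows, and then summed per group. A small Python sketch of that order, again on the question's sample rows and using exact fractions, suggests why the totals come out the same. The variable names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

# Sample rows from the question: (d1, d2, v)
rows = [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)]

# Denominator: sum(v) OVER (PARTITION BY d1), taken over the raw rows
d1_total = defaultdict(int)
for d1, _d2, v in rows:
    d1_total[d1] += v

# Inner query: per-row share p = v / d1 total
# Outer query: sum(p) with GROUP BY d1, d2
summed_shares = defaultdict(Fraction)
for d1, d2, v in rows:
    summed_shares[(d1, d2)] += Fraction(v, d1_total[d1])

for (d1, d2), share in sorted(summed_shares.items()):
    print(d1, d2, float(share))
```

Every row in a group is divided by the same d1 total, so summing the shares gives the group's summed v divided by that total.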
