使用和不使用ORDER BY子句对分区进行分析计数

时间:2022-12-09 22:58:48

I don't understand why there are different results when using an ORDER BY clause in an analytic COUNT function.

我不明白为什么在分析COUNT函数中使用ORDER BY子句时会有不同的结果。

Using a simple example:

使用一个简单的例子:

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls) as cnt from req;

gives the the following result:

给出以下结果:

N   CLS CNT
2   A   2
1   A   2

Whereas, when adding an ORDER BY in the analytic clause, the result is different!

然而,当在分析子句中添加ORDER BY时,结果是不同的!

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n) as cnt from req;

CNT column changed:

CNT专栏改变了:

N   CLS CNT
1   A   1
2   A   2

Can someone explain please?

有人可以解释一下吗?

Thanks

2 个解决方案

#1


5  

First, a link to docs. It's somewhat obscure, however.

首先,链接到docs。然而,这有点模糊。

Analytic clause consists of query_partition_clause, order_by_clause and windowing_clause. And, a really important thing about windowing_clause is

Analytic子句由query_partition_clause,order_by_clause和windowing_clause组成。而且,关于windowing_clause的一个非常重要的事情是

You cannot specify this clause unless you have specified the order_by_clause. Some window boundaries defined by the RANGE clause let you specify only one expression in the order_by_clause. Refer to "Restrictions on the ORDER BY Clause".

除非已指定order_by_clause,否则不能指定此子句。 RANGE子句定义的某些窗口边界允许您在order_by_clause中仅指定一个表达式。请参阅“ORDER BY子句的限制”。

But not only can you not use windowing_clause without the order_by_clause, they are tied together.

但是,如果没有order_by_clause,你不仅可以使用windowing_clause,而且它们是捆绑在一起的。

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

如果您完全省略windowing_clause,则默认值为RANGE BETWEEN UNBOUNDED PRECEDING和CURRENT ROW。

The default windowing clause produces something like running total. COUNT returns 1 for first row, as there is only one row between the top of the window and the current row, 2 for the second row and so on.

默认的窗口子句产生类似于运行总计的内容。 COUNT为第一行返回1,因为窗口顶部和当前行之间只有一行,第二行有2行,依此类推。

So in your first query there is no windowing at all, but there is the default windowing in the second one.

因此,在您的第一个查询中根本没有窗口,但在第二个查询中有默认窗口。

And you can simulate the behavior of the first query by specifying fully unbounded window.

您可以通过指定完全*窗口来模拟第一个查询的行为。

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as cnt from req;

Yep

N   CLS CNT
1   A   2
2   A   2

#2


2  

The easiest way to think about this - leaving the ORDER BY out is equivalent to "ordering" in a way that all rows in the partition are "equal" to each other. Indeed, you can get the same effect by explicitly adding the ORDER BY clause like this: ORDER BY 0 (or "order by" any constant expression), or even, more emphatically, ORDER BY NULL.

考虑这一点的最简单方法 - 将ORDER BY排除在外,等同于“排序”,使得分区中的所有行彼此“相等”。实际上,您可以通过显式添加ORDER BY子句来获得相同的效果:ORDER BY 0(或“order by”任何常量表达式),甚至更重要的是ORDER BY NULL。

Why you get the COUNT() or SUM() etc. for the entire partition has to do with the default windowing clause: RANGE between unbounded preceding and current row. "Range" (as opposed to "ROWS") means all rows "tied" with the current row are also included, even if they don't precede it. Since all rows are tied, this means the entire partition is included, no matter which row is "current."

为什么获得整个分区的COUNT()或SUM()等与默认的窗口子句有关:*前行和当前行之间的RANGE。 “范围”(与“ROWS”相对)意味着所有与当前行“绑定”的行也包括在内,即使它们不在它之前。由于所有行都是绑定的,这意味着包括整个分区,无论哪一行是“当前的”。

#1


5  

First, a link to docs. It's somewhat obscure, however.

首先,链接到docs。然而,这有点模糊。

Analytic clause consists of query_partition_clause, order_by_clause and windowing_clause. And, a really important thing about windowing_clause is

Analytic子句由query_partition_clause,order_by_clause和windowing_clause组成。而且,关于windowing_clause的一个非常重要的事情是

You cannot specify this clause unless you have specified the order_by_clause. Some window boundaries defined by the RANGE clause let you specify only one expression in the order_by_clause. Refer to "Restrictions on the ORDER BY Clause".

除非已指定order_by_clause,否则不能指定此子句。 RANGE子句定义的某些窗口边界允许您在order_by_clause中仅指定一个表达式。请参阅“ORDER BY子句的限制”。

But not only can you not use windowing_clause without the order_by_clause, they are tied together.

但是,如果没有order_by_clause,你不仅可以使用windowing_clause,而且它们是捆绑在一起的。

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

如果您完全省略windowing_clause,则默认值为RANGE BETWEEN UNBOUNDED PRECEDING和CURRENT ROW。

The default windowing clause produces something like running total. COUNT returns 1 for first row, as there is only one row between the top of the window and the current row, 2 for the second row and so on.

默认的窗口子句产生类似于运行总计的内容。 COUNT为第一行返回1,因为窗口顶部和当前行之间只有一行,第二行有2行,依此类推。

So in your first query there is no windowing at all, but there is the default windowing in the second one.

因此,在您的第一个查询中根本没有窗口,但在第二个查询中有默认窗口。

And you can simulate the behavior of the first query by specifying fully unbounded window.

您可以通过指定完全*窗口来模拟第一个查询的行为。

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as cnt from req;

Yep

N   CLS CNT
1   A   2
2   A   2

#2


2  

The easiest way to think about this - leaving the ORDER BY out is equivalent to "ordering" in a way that all rows in the partition are "equal" to each other. Indeed, you can get the same effect by explicitly adding the ORDER BY clause like this: ORDER BY 0 (or "order by" any constant expression), or even, more emphatically, ORDER BY NULL.

考虑这一点的最简单方法 - 将ORDER BY排除在外,等同于“排序”,使得分区中的所有行彼此“相等”。实际上,您可以通过显式添加ORDER BY子句来获得相同的效果:ORDER BY 0(或“order by”任何常量表达式),甚至更重要的是ORDER BY NULL。

Why you get the COUNT() or SUM() etc. for the entire partition has to do with the default windowing clause: RANGE between unbounded preceding and current row. "Range" (as opposed to "ROWS") means all rows "tied" with the current row are also included, even if they don't precede it. Since all rows are tied, this means the entire partition is included, no matter which row is "current."

为什么获得整个分区的COUNT()或SUM()等与默认的窗口子句有关:*前行和当前行之间的RANGE。 “范围”(与“ROWS”相对)意味着所有与当前行“绑定”的行也包括在内,即使它们不在它之前。由于所有行都是绑定的,这意味着包括整个分区,无论哪一行是“当前的”。