What is the difference between rowsBetween and rangeBetween?

Time: 2022-01-10 01:44:43

From the PySpark docs, rangeBetween:

rangeBetween(start, end)

Defines the frame boundaries, from start (inclusive) to end (inclusive).

Both start and end are relative from the current row. For example, “0” means “current row”, while “-1” means one off before the current row, and “5” means the five off after the current row.

Parameters:

  • start – boundary start, inclusive. The frame is unbounded if this is -sys.maxsize (or lower).
  • end – boundary end, inclusive. The frame is unbounded if this is sys.maxsize (or higher).

New in version 1.4.

while rowsBetween:

rowsBetween(start, end)

Defines the frame boundaries, from start (inclusive) to end (inclusive).

Both start and end are relative positions from the current row. For example, “0” means “current row”, while “-1” means the row before the current row, and “5” means the fifth row after the current row.

Parameters:

  • start – boundary start, inclusive. The frame is unbounded if this is -sys.maxsize (or lower).
  • end – boundary end, inclusive. The frame is unbounded if this is sys.maxsize (or higher).

New in version 1.4.

For rangeBetween, how is "1 off" different from "1 row", for example?

1 solution

#1


It is simple:

  • ROWS BETWEEN doesn't care about the exact values. It cares only about the order of rows when computing the frame.
  • RANGE BETWEEN considers the values when computing the frame.

Let's take an example with two window definitions:

  • ORDER BY x ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  • ORDER BY x RANGE BETWEEN 2 PRECEDING AND CURRENT ROW

and the following data:

+---+
|  x|
+---+
| 10|
| 20|
| 30|
| 31|
+---+

Assuming the current row is the one with value 31, the first window includes the following rows (the current one and the two preceding ones):

+---+----------------------------------------------------+
|  x|ORDER BY x ROWS BETWEEN 2  PRECEDING AND CURRENT ROW|
+---+----------------------------------------------------+
| 10|                                               false|
| 20|                                                true|
| 30|                                                true|
| 31|                                                true|
+---+----------------------------------------------------+

and the second window includes the following rows (the current one and all preceding rows where x >= 31 - 2):

+---+-----------------------------------------------------+
|  x|ORDER BY x RANGE BETWEEN 2  PRECEDING AND CURRENT ROW|
+---+-----------------------------------------------------+
| 10|                                                false|
| 20|                                                false|
| 30|                                                 true|
| 31|                                                 true|
+---+-----------------------------------------------------+
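
The two tables above can be reproduced with a small pure-Python sketch. This is only an emulation of the frame rules, not Spark code: the helper names rows_frame and range_frame are made up for illustration, and the data is assumed to be sorted ascending already, as ORDER BY x would leave it. In PySpark, going by the docs quoted in the question, the same frames would be written as Window.orderBy("x").rowsBetween(-2, 0) and Window.orderBy("x").rangeBetween(-2, 0).

```python
# Pure-Python emulation of the two frames; no Spark required.
# rows_frame and range_frame are hypothetical helpers, not PySpark APIs.
xs = [10, 20, 30, 31]  # assumed already sorted ascending (ORDER BY x)


def rows_frame(values, i, preceding):
    """ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW: counts positions."""
    start = max(0, i - preceding)
    return [start <= j <= i for j in range(len(values))]


def range_frame(values, i, preceding):
    """RANGE BETWEEN `preceding` PRECEDING AND CURRENT ROW: compares values."""
    current = values[i]
    return [current - preceding <= v <= current for v in values]


def frame_sum(values, flags):
    """Sum the values whose flag is True, like SUM(x) OVER (...) for one row."""
    return sum(v for v, keep in zip(values, flags) if keep)


# Frame membership when the current row is the one with x = 31.
current = xs.index(31)
rows_flags = rows_frame(xs, current, 2)
range_flags = range_frame(xs, current, 2)
for x, in_rows, in_range in zip(xs, rows_flags, range_flags):
    print(x, in_rows, in_range)

# The aggregate over each row's frame differs accordingly.
rows_sums = [frame_sum(xs, rows_frame(xs, i, 2)) for i in range(len(xs))]
range_sums = [frame_sum(xs, range_frame(xs, i, 2)) for i in range(len(xs))]
print(rows_sums)   # [10, 30, 60, 81]
print(range_sums)  # [10, 20, 30, 61]
```

For the row with x = 31, the ROWS frame sums 20 + 30 + 31 because it takes the two physically preceding rows, while the RANGE frame sums only 30 + 31 because 20 falls outside the value interval [29, 31].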
