在SQL Server 2012中使用FIRST_VALUE()时出现意外结果

时间:2022-02-16 22:57:54

When I use FIRST_VALUE on a data set that I construct by hand I get one result, and when I use it on a data set that results from a left join, I get a different result - even though the data sets appear to me to contain the exact same data values. I've reproduced the issue with a simple data set below.

当我在手工构造的数据集上使用FIRST_VALUE时,我得到一个结果,当我在由左连接产生的数据集上使用它时,我得到了不同的结果 - 即使数据集在我看来包含完全相同的数据值。我用下面的简单数据集重现了这个问题。

Can someone tell me if I have misunderstood something?

有人能告诉我,如果我误解了什么吗?

This SQL produces the expected result, that FIRST_VALUE is NULL and LAST_VALUE is 30.

此SQL生成预期结果,FIRST_VALUE为NULL且LAST_VALUE为30。

SELECT
  agroup,
  aval,
  FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
  LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
(
  SELECT 1 agroup, 10 aval
  UNION ALL SELECT 1, NULL
  UNION ALL SELECT 1, 30
) T

This SQL uses a LEFT JOIN that results in the same data set as above, but FIRST_VALUE appears to ignore the NULL.

此SQL使用LEFT JOIN导致与上面相同的数据集,但FIRST_VALUE似乎忽略NULL。

SELECT 
  agroup,
  aval,
  FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
  LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
(
  SELECT 
    T1.agroup,
    T1.akey,
    T2.aval 
  FROM 
  (
    SELECT 1 agroup, 1 akey
    UNION ALL SELECT 1, 2
    UNION ALL SELECT 1, 3
  ) T1
  LEFT JOIN
  (
    SELECT 1 akey, 10 aval
    UNION ALL SELECT 3,30
  ) T2 ON T1.akey = T2.akey
) T

I can also show that the left join behavior is different when using a table variable vs. a CTE. When using a CTE to generate the data, FIRST_VALUE ignores the NULL. Using the exact same SQL but putting the results in a table variable or a temporary table results in the NULL being taken into account.

我还可以显示使用表变量与CTE时左连接行为是不同的。使用CTE生成数据时,FIRST_VALUE忽略NULL。使用完全相同的SQL但将结果放在表变量或临时表中会导致将NULL考虑在内。

With a CTE the SQL Server results don't include NULL in the FIRST_VALUE determination:

使用CTE,SQL Server结果在FIRST_VALUE确定中不包含NULL:

WITH T AS
(
  SELECT 
    T1.agroup,
    T1.akey,
    T2.aval 
  FROM 
  (
    SELECT 1 agroup, 1 akey
    UNION ALL SELECT 1, 2
    UNION ALL SELECT 1, 3
  ) T1
  LEFT JOIN
  (
    SELECT 1 akey, 10 aval
    UNION ALL SELECT 3,30
  ) T2 ON T1.akey = T2.akey
)

SELECT 
  agroup,
  aval,
  FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
  LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
 T

But with a table variable, it does:

但是使用表变量,它会:

DECLARE @T TABLE (agroup INT,akey INT,aval INT)

INSERT INTO
  @T
SELECT 
  T1.agroup,
  T1.akey,
  T2.aval 
FROM 
(
  SELECT 1 agroup, 1 akey
  UNION ALL SELECT 1, 2
  UNION ALL SELECT 1, 3
) T1
LEFT JOIN
(
  SELECT 1 akey, 10 aval
  UNION ALL SELECT 3,30
) T2 ON T1.akey = T2.akey


SELECT 
agroup,
aval,
FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
@T

2 个解决方案

#1


7  

The provided examples show very clearly that there is an inconsistency in the implementation of the FIRST_VALUE() analytic function.

提供的示例非常清楚地表明FIRST_VALUE()分析函数的实现存在不一致。

Depending on whether the underlying table in the FROM clause is a base table (or temporary or a table variable or even a derived table created on the fly) in one case and a derived table (or cte) created by the LEFT JOIN of two created on the fly tables in the second case, the results are different. Seems like the NULL values are ignored in the 2nd case or treated as high values.

取决于FROM子句中的基础表是一个基表(或临时或表变量,甚至是动态创建的派生表),还是由两个LEFT JOIN创建的派生表(或cte)在第二种情况下的飞行表上,结果是不同的。似乎在第二种情况下忽略NULL值或将其视为高值。

And they shouldn't be different, because the result of a SQL query should not depend on how the FROM clause gets the values of the table it provides to the SELECT clause and also because documentation of the OVER clause clearly states how NULL values should be treated:

并且它们不应该不同,因为SQL查询的结果不应该取决于FROM子句如何获取它为SELECT子句提供的表的值,还因为OVER子句的文档清楚地说明NULL值应该如何处理:

order_by_expression

order_by_expression

Specifies a column or expression on which to sort. order_by_expression can only refer to columns made available by the FROM clause. An integer cannot be specified to represent a column name or alias.

指定要排序的列或表达式。 order_by_expression只能引用FROM子句提供的列。不能指定整数来表示列名或别名。

...

...

ASC | DESC

ASC | DESC

Specifies that the values in the specified column should be sorted in ascending or descending order. ASC is the default sort order. Null values are treated as the lowest possible values.

指定指定列中的值应按升序或降序排序。 ASC是默认排序顺序。空值被视为可能的最低值。

So, the correct results - according to the SQL-Server documentation - are the ones that do not ignore NULL values. Any other result should not happen and since it does happen, it is a bug.

因此,根据SQL-Server文档,正确的结果是不忽略NULL值的结果。不应该发生任何其他结果,因为它确实发生了,这是一个错误。

I suggested you test in the most recent version (and not only in the RTM) as it may have been identified and corrected in some service pack or update and if it's still there (or if you don't have a newrer version available) to submit this as a bug in the Connect site.

我建议您在最新版本(而不仅仅是在RTM中)进行测试,因为它可能已经在某些服务包或更新中被识别和更正,如果它仍然存在(或者如果您没有可用的新版本)将此提交为Connect站点中的错误。


Update

更新

For future reference, the bug was submitted by the OP. The link is: Connect item and (our) @Aaron Bertrand has commented there that it also appears in most current SQL 2014 builds.

为了将来参考,该错误由OP提交。链接是:连接项目和(我们的)@Aaron Bertrand评论说它也出现在大多数当前的SQL 2014版本中。

#2


-1  

A bit of a late answer to this post, but one to share nevertheless.

对这篇文章有点迟到的回答,不过要分享一下。

You can use the order by flag to "demote" the null values.

您可以使用order by flag来“降级”空值。

So in your case ... you can use

所以在你的情况下...你可以使用

... FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY (iif(aval is null, 1,0)), aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv ...

... FIRST_VALUE(aval)OVER(分组为agroup ORDER BY(iif(aval为null,1,0)),aval ROWS在UNBOUNDED PRECEDING和UNBOUNDED FOLLOWING之间)fv ...

(Note that I use the value 1 for null values as it should sort the field ascending, so the non-null values will take precedence)

(请注意,我将值1用于空值,因为它应该对字段进行排序,因此非空值将优先)

Cheers - LA.

干杯 - 洛杉矶。

#1


7  

The provided examples show very clearly that there is an inconsistency in the implementation of the FIRST_VALUE() analytic function.

提供的示例非常清楚地表明FIRST_VALUE()分析函数的实现存在不一致。

Depending on whether the underlying table in the FROM clause is a base table (or temporary or a table variable or even a derived table created on the fly) in one case and a derived table (or cte) created by the LEFT JOIN of two created on the fly tables in the second case, the results are different. Seems like the NULL values are ignored in the 2nd case or treated as high values.

取决于FROM子句中的基础表是一个基表(或临时或表变量,甚至是动态创建的派生表),还是由两个LEFT JOIN创建的派生表(或cte)在第二种情况下的飞行表上,结果是不同的。似乎在第二种情况下忽略NULL值或将其视为高值。

And they shouldn't be different, because the result of a SQL query should not depend on how the FROM clause gets the values of the table it provides to the SELECT clause and also because documentation of the OVER clause clearly states how NULL values should be treated:

并且它们不应该不同,因为SQL查询的结果不应该取决于FROM子句如何获取它为SELECT子句提供的表的值,还因为OVER子句的文档清楚地说明NULL值应该如何处理:

order_by_expression

order_by_expression

Specifies a column or expression on which to sort. order_by_expression can only refer to columns made available by the FROM clause. An integer cannot be specified to represent a column name or alias.

指定要排序的列或表达式。 order_by_expression只能引用FROM子句提供的列。不能指定整数来表示列名或别名。

...

...

ASC | DESC

ASC | DESC

Specifies that the values in the specified column should be sorted in ascending or descending order. ASC is the default sort order. Null values are treated as the lowest possible values.

指定指定列中的值应按升序或降序排序。 ASC是默认排序顺序。空值被视为可能的最低值。

So, the correct results - according to the SQL-Server documentation - are the ones that do not ignore NULL values. Any other result should not happen and since it does happen, it is a bug.

因此,根据SQL-Server文档,正确的结果是不忽略NULL值的结果。不应该发生任何其他结果,因为它确实发生了,这是一个错误。

I suggested you test in the most recent version (and not only in the RTM) as it may have been identified and corrected in some service pack or update and if it's still there (or if you don't have a newrer version available) to submit this as a bug in the Connect site.

我建议您在最新版本(而不仅仅是在RTM中)进行测试,因为它可能已经在某些服务包或更新中被识别和更正,如果它仍然存在(或者如果您没有可用的新版本)将此提交为Connect站点中的错误。


Update

更新

For future reference, the bug was submitted by the OP. The link is: Connect item and (our) @Aaron Bertrand has commented there that it also appears in most current SQL 2014 builds.

为了将来参考,该错误由OP提交。链接是:连接项目和(我们的)@Aaron Bertrand评论说它也出现在大多数当前的SQL 2014版本中。

#2


-1  

A bit of a late answer to this post, but one to share nevertheless.

对这篇文章有点迟到的回答,不过要分享一下。

You can use the order by flag to "demote" the null values.

您可以使用order by flag来“降级”空值。

So in your case ... you can use

所以在你的情况下...你可以使用

... FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY (iif(aval is null, 1,0)), aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv ...

... FIRST_VALUE(aval)OVER(分组为agroup ORDER BY(iif(aval为null,1,0)),aval ROWS在UNBOUNDED PRECEDING和UNBOUNDED FOLLOWING之间)fv ...

(Note that I use the value 1 for null values as it should sort the field ascending, so the non-null values will take precedence)

(请注意,我将值1用于空值,因为它应该对字段进行排序,因此非空值将优先)

Cheers - LA.

干杯 - 洛杉矶。