SELECT TOP is slow, regardless of ORDER BY

Time: 2022-10-22 09:39:45

I have a fairly complex query in SQL Server running against a view, in the form:

SELECT *
   FROM myview, foo, bar 
   WHERE shared=1 AND [joins and other stuff]
   ORDER BY sortcode;

The plan for the query above shows a Sort operation just before the final SELECT, which is what I would expect. There are only 35 matching records, and the query takes well under 2 seconds.

But if I add TOP 30, the query takes almost 3 minutes! Using SET ROWCOUNT is just as slow.

Looking at the query plan, it now appears to sort all 2+ million records in myview before the joins and filters.

This "sorting" is shown on the query plan as an Index Scan on the sortcode index, a Clustered Index Seek on the main table, and a Nested Loop between them, all before the joins and filters.

How can I force SQL Server to SORT just before TOP, like it does when TOP isn't specified?

I don't think the construction of myview is the issue, but just in case, it is something like this:

CREATE VIEW myview AS
   SELECT columns..., sortcode, 0 as shared FROM mytable
   UNION ALL
   SELECT columns..., sortcode, 1 as shared FROM [anotherdb].dbo.mytable

The local mytable has a few thousand records, and mytable in the other database (in the same MSSQL instance) has a few million records. Both tables have an index on their sortcode column.

1 answer

#1

And so starts the unfortunate game of "trying to outsmart the optimizer (because it doesn't always know best)".

You can try putting the filtering portions into a subquery or CTE:

SELECT TOP 30 *
FROM
   (SELECT *
   FROM myview, foo, bar 
   WHERE shared=1 AND [joins and other stuff]) t
ORDER BY sortcode;

This may be enough to force it to filter first (but the optimizer gets "smarter" with each release, and can sometimes see through such shenanigans). Failing that, you might have to go as far as putting this code into a UDF. If you write the UDF as a multi-statement table-valued function with the filtering inside, and then query that UDF with your TOP x/ORDER BY, you've pretty well forced the querying order (because SQL Server is currently unable to optimize around multi-statement UDFs).
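
A rough sketch of what that multi-statement table-valued function might look like. The function name dbo.fn_shared_rows, the id column, the sortcode type, and the join predicates (foo.myview_id, bar.foo_id) are all made up for illustration, since the question only gives "[joins and other stuff]"; substitute your real columns and joins.

```sql
-- Hypothetical names throughout: dbo.fn_shared_rows, id, foo.myview_id, bar.foo_id.
-- The column list and join predicates stand in for the question's
-- "[joins and other stuff]" and must be replaced with the real ones.
CREATE FUNCTION dbo.fn_shared_rows ()
RETURNS @result TABLE
(
    id       int         NOT NULL,  -- assumed key column
    sortcode varchar(50) NOT NULL   -- assumed type; match the real column
)
AS
BEGIN
    -- Filtering and joins happen inside the function body, so the result
    -- is fully materialized in @result before the caller's TOP is applied.
    INSERT INTO @result (id, sortcode)
    SELECT v.id, v.sortcode
    FROM myview AS v
    JOIN foo AS f ON f.myview_id = v.id   -- placeholder join
    JOIN bar AS b ON b.foo_id = f.id      -- placeholder join
    WHERE v.shared = 1;

    RETURN;
END;
GO

-- The outer query only ever sees the small, already-filtered row set.
SELECT TOP 30 *
FROM dbo.fn_shared_rows()
ORDER BY sortcode;
```

The trade-off is that the optimizer also knows very little about how many rows the function returns, which is harmless here (about 35 rows) but can hurt if the function is later joined to large tables.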

Of course, thinking about it, introducing the UDF is just a way of hiding what we're really doing - create a temp table, use one query to populate it (based on WHERE filters), then another query to find the TOP x from the temp table.
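
Spelled out directly, the temp table version could look something like the following. As before, the id column and the join predicates are placeholders I've invented for "[joins and other stuff]", and #filtered is just an arbitrary temp table name.

```sql
-- Step 1: materialize only the rows that survive the filters and joins.
-- Columns are listed explicitly because SELECT * ... INTO fails when the
-- joined tables share column names. id, foo.myview_id and bar.foo_id are
-- hypothetical placeholders.
SELECT v.id, v.sortcode
INTO #filtered
FROM myview AS v
JOIN foo AS f ON f.myview_id = v.id   -- placeholder join
JOIN bar AS b ON b.foo_id = f.id      -- placeholder join
WHERE v.shared = 1;

-- Step 2: sort and limit the small intermediate result.
SELECT TOP 30 *
FROM #filtered
ORDER BY sortcode;

-- Clean up (the table would also be dropped when the session ends).
DROP TABLE #filtered;
```

Because the first statement has no TOP, it should get the same plan shape as the fast version of the query in the question, and the second statement only has a handful of rows to sort.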
