How to improve the performance of a Clustered Index Seek

Date: 2022-06-19 05:39:59

I'm trying to improve the performance of a query that is running very slowly. After going through the Actual Execution Plan, I found that a Clustered Index Seek was taking up 82% of the plan's cost. Is there any way for me to improve the performance of an Index Seek? Below is an image of the problem Index Seek from the execution plan, as well as the index and table it is using.

Image: http://img340.imageshack.us/img340/1346/seek.png

Index:

/****** Object:  Index [IX_Stu]    Script Date: 12/28/2009 11:11:43 ******/
CREATE CLUSTERED INDEX [IX_Stu] ON [dbo].[stu] 
(
 [StuKey] ASC
)WITH (PAD_INDEX  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

Table (some columns omitted for brevity):

CREATE TABLE [dbo].[stu](
 [StuCertKey] [int] IDENTITY(1,1) NOT NULL,
 [StuKey] [int] NULL,
 CONSTRAINT [PK_Stu] PRIMARY KEY NONCLUSTERED 
(
 [StuCertKey] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

9 Solutions

#1


17  

I'm generalizing here, but...

A clustered index seek is, for the most part, the best-case scenario. The only ways I can think of to improve performance would be:

  • Update the query to return fewer rows/columns, if possible;
  • Defragment or rebuild the index;
  • Partition the index across multiple disks/servers.

If it's only returning 138 rows, and it's that slow... maybe it's being blocked by some other process? Are you testing this in isolation, or are other users/processes online at the same time? Or maybe it's even a hardware problem, like a disk failure.
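One way to test the blocking theory is to look at the active requests from a second session while the slow query is running. This is only a sketch: it assumes SQL Server 2005 or later and a login with VIEW SERVER STATE permission, and the column aliases are my own.

```sql
-- Run from a second session while the slow query is executing.
-- Lists every request that is currently waiting on another session.
SELECT r.session_id,
       r.blocking_session_id,          -- session holding the resource
       r.wait_type,
       r.wait_time AS wait_time_ms,    -- milliseconds spent waiting so far
       r.status,
       t.text      AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

If this returns no rows while the query is slow, blocking is probably not the cause and the time is going somewhere else (I/O, CPU, or another part of the plan).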

#2


11  

Clustered index seeks often show up when a non-clustered index is used, and they aren't necessarily bad.

Consider the following query:

SELECT s.StuKey, s.Name, s.Address, s.City, s.State FROM stu s WHERE State='TX'

If there is only a clustered index on StuKey, then SQL Server has only one option: it must scan the entire table looking for rows where State='TX' and return those rows.

If you add a non-clustered index on State

CREATE INDEX IX_Stu_State on Stu (State)

Now SQL Server has a new option. It can choose to seek using the non-clustered index, which will produce the rows where State='TX'. However, in order to get the remaining columns to return in the SELECT, it has to look up those columns by doing a clustered index seek for each row.

If you want to reduce the clustered index seeks, then you can make your index "covering" by including extra columns in it.

CREATE INDEX IX_Stu_State2 on Stu (State) INCLUDE (Name, Address, City)

This index now contains all the columns needed to answer the query above (StuKey comes along automatically, because every non-clustered index row carries the clustered index key). The query will do an index seek to return only the rows where State='TX', and the additional columns can be pulled out of the non-clustered index, so the clustered index seeks go away.
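To see whether a covering index actually helps, one option is to compare logical reads before and after creating it. A rough sketch, assuming dbo.stu really has the Name, Address, City and State columns used in the example query (they are not in the posted table definition):

```sql
-- Report I/O and timing for each statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Same query as in the example; columns are assumed to exist.
SELECT s.StuKey, s.Name, s.Address, s.City, s.State
FROM dbo.stu AS s
WHERE s.State = 'TX';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Run it once with only IX_Stu_State in place and once with the covering index, and compare the "logical reads" figures reported for stu. If the lookups were the expensive part, the second number should be noticeably lower.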

#3


7  

A clustered index range seek that returns 138 rows is not your problem.

Technically you can improve the seek performance by making the clustered index narrower:

  • Evict all large variable-length columns into a separate allocation unit by setting 'large value types out of row' to 1 and recreating the table from scratch.
  • Enable page compression (SQL Server 2008 Enterprise Edition only).

Both can have quite a dramatic impact on range seek time, as they reduce the I/O and the need for physical reads. Of course, as usual, the result will vary with a large number of other factors, such as which columns you project (evicting a projected column into the BLOB allocation unit may actually have adverse effects on certain queries). As a side note, fragmentation will usually have only a marginal impact on such a short range scan. Again, it depends.
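Both options could be sketched roughly as follows. This assumes the table and index names from the question (dbo.stu, IX_Stu) and an edition that supports data compression; estimate the savings before rebuilding anything.

```sql
-- Store MAX-type values (varchar(max), nvarchar(max), varbinary(max), xml)
-- off-row. Existing values only move when they are next updated, which is
-- why the answer suggests recreating the table.
EXEC sp_tableoption N'dbo.stu', 'large value types out of row', '1';

-- Estimate what page compression would save on the clustered index
-- (index_id 1) before committing to a rebuild.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'stu',
     @index_id         = 1,
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- Rebuild the clustered index with page compression.
ALTER INDEX IX_Stu ON dbo.stu REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Compression trades CPU for I/O, so it is worth measuring the query again afterwards rather than assuming a win.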

But as I say, I highly doubt this is your true problem. You have only posted selected parts of the plan and the results of your own analysis. The true root cause may lie completely elsewhere.

#4


3  

Thoughts...

  • Why is IX_Stu clustered? What is the justification? Internally, SQL Server adds a 4-byte "uniquifier" to non-unique clustered indexes. This also bloats your PK, because every non-clustered index row carries the clustering key.

  • What is the actual query you are running?

  • Finally, why FILLFACTOR 80%?

Edit:

  • A "normal" FILLFACTOR would be 90%, but this is a rule of thumb only.

  • An 11-join query? That's most likely your problem. What are your JOINs, WHERE clauses, etc.? What is the full plan in text form?
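To check whether the uniquifier actually comes into play, you could first count duplicate StuKey values, and only then decide whether to change the index. A sketch under assumptions: the composite key below is just one possible way to make the clustered index unique, not necessarily the right design for this table.

```sql
-- How many StuKey values occur more than once? Rows returned here are
-- the ones that need a uniquifier in the current clustered index.
SELECT StuKey, COUNT(*) AS rows_per_key
FROM dbo.stu
GROUP BY StuKey
HAVING COUNT(*) > 1
ORDER BY rows_per_key DESC;

-- One possible alternative (an assumption, test before using):
-- append the identity column so the clustering key is unique by itself.
CREATE UNIQUE CLUSTERED INDEX IX_Stu
ON dbo.stu (StuKey ASC, StuCertKey ASC)
WITH (DROP_EXISTING = ON);
```

Note the trade-off: the composite key is 8 bytes wide in every row and in every non-clustered index, so this only makes sense if duplicates on StuKey are common.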

#5


1  

Have you tried some maintenance on this index, like defragmenting it? It seems really strange that it costs THAT much (120.381). An index seek is the fastest index operation and shouldn't take that long. Can you post the query?
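A quick way to see whether maintenance is even needed is to check fragmentation first. A minimal sketch, assuming the table and index names from the question and SQL Server 2005 or later:

```sql
-- Fragmentation and size of every index on dbo.stu.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.stu'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;

-- Lightweight defragmentation; always an online operation.
ALTER INDEX IX_Stu ON dbo.stu REORGANIZE;

-- Heavier alternative that rebuilds the index from scratch:
-- ALTER INDEX IX_Stu ON dbo.stu REBUILD;
```

If avg_fragmentation_in_percent is already low, defragmenting is unlikely to explain a cost of 120.381.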

#6


1  

Some general advice: when I have to do query optimization, I start by writing out what I think the execution plan should be.

Once I've decided what I think the execution plan should be, I try to make the actual query fit this plan. The techniques to do this are different for each DBMS, and do not necessarily transfer from one to the other, or even, sometimes, between different versions of the DBMS.

The thing to keep in mind is that the DBMS can only execute one join at a time: it starts with two initial tables, joins those, and then takes the result of that operation and joins it to the next table. The goal at each step is to minimize the number of rows in the intermediate result set (more correctly, to minimize the number of blocks that have to be read to produce the intermediate results, but this generally means fewest rows).

#7


1  

What happens if you hard-code your WHERE criteria, like this:

SELECT StuCertKey, StuKey FROM stu 
WHERE StuKey IN (/* list 50 values of StuKey here */)

If it's still very slow, you have an internal problem of some kind. If it's faster, then the index isn't your bottleneck; it's the JOINs that you're doing to create the WHERE filter.

Note that SELECT * can be very slow if there are many large columns, and especially if there are BLOBs.

#8


1  

Check the index statistics.

Recalculating the clustered index statistics may solve the problem.

In my case, I was looking for 30 records among 40M records. The execution plan said it was going through the clustered index, but it took about 200 ms, and the index wasn't fragmented. After recalculating its stats, it finishes in under 10 ms!
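Something along these lines would check how old the statistics are and then refresh them. It is a sketch that assumes the table and index names from the question (dbo.stu, IX_Stu):

```sql
-- When were the statistics on dbo.stu last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.stu');

-- Recalculate the clustered index statistics by reading every row.
-- FULLSCAN is the most accurate option but also the slowest on big tables.
UPDATE STATISTICS dbo.stu IX_Stu WITH FULLSCAN;
```

Updating statistics also invalidates cached plans that depend on them, so the next execution of the query gets a freshly compiled plan.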

#9


0  

Rebuild the index, and calculate stats?

The only other way that I can think of to speed it up is to partition the table, which may or may not be possible.
