视图聚集索引查找超过50万行需要7分钟

时间:2022-06-01 18:41:47

Take a look at this execution plan: http://sdrv.ms/1agLg7K
It’s not estimated, it’s actual. From an actual execution that took roughly 30 minutes.

看看这个执行计划:http://sdrv。ms/1agLg7K不是估计的,而是实际的。从实际执行开始,大约需要30分钟。

Select the second statement (takes 47.8% of the total execution time – roughly 15 minutes).
Look at the top operation in that statement – View Clustered Index Seek over _Security_Tuple4. The operation costs 51.2% of the statement – roughly 7 minutes.

选择第二个语句(占用总执行时间的47.8%——大约15分钟)。查看该语句的顶部操作——视图聚集索引寻找_Security_Tuple4。手术费用为报表的51.2%,大约7分钟。

The view contains about 0.5M rows (for reference, log2(0.5M) ~= 19 – a mere 19 steps given the index tree node size is two, which in reality is probably higher).
The result of that operator is zero rows (doesn’t match the estimate, but never mind that for now).
Actual executions – zero.

该视图包含大约0.5M行(供参考,log2(0.5M) ~= 19—给定索引树节点大小的仅19个步骤,实际上可能更高)。该运算符的结果是零行(不匹配估计值,但现在不必考虑)。实际执行——零。

So the question is: how the bleep could that take seven minutes?! (and of course, how do I fix it?)

所以问题是:bleep怎么可能需要7分钟?(当然,我该如何修复呢?)


EDIT: Some clarification on what I'm asking here.
I am not interested in general performance-related advice, such as "look at indexes", "look at sizes", "parameter sniffing", "different execution plans for different data", etc.
I know all that already, I can do all that kind of analysis myself.

编辑:澄清一下我的问题。我对与性能相关的一般建议不感兴趣,比如“查看索引”、“查看大小”、“参数嗅探”、“针对不同数据的不同执行计划”等等。

What I really need is to know what could cause that one particular clustered index seek to be so slow, and then what could I do to speed it up.

我真正需要的是知道是什么原因导致一个特定的聚集索引变得如此缓慢,然后我能做什么来加速它。

Not the whole query.
Not any part of the query.
Just that one particular index seek.
END EDIT

不是整个查询。不是查询的任何部分。这只是一个特定的指数。最后编辑


Also note how the second and third most expensive operations are seeks over _Security_Tuple3 and _Security_Tuple2 respectively, and they only take 7.5% and 3.7% of time. Meanwhile, _Security_Tuple3 contains roughly 2.8M rows, which is six times that of _Security_Tuple4.

还要注意,第二和第三个最昂贵的操作分别是如何查找_Security_Tuple3和_Security_Tuple2,它们只需要花费7.5%和3.7%的时间。同时,_Security_Tuple3大约包含2.8M行,是_Security_Tuple4的6倍。

Also, some background:

此外,一些背景:

  1. This is the only database from this project that misbehaves. There are a couple dozen other databases of the same schema, none of them exhibit this problem.
  2. 这是该项目中惟一一个出现错误行为的数据库。有几十个其他的相同模式的数据库,没有一个显示出这个问题。
  3. The first time this problem was discovered, it turned out that the indexes were 99% fragmented. Rebuilding the indexes did speed it up, but not significantly: the whole query took 45 minutes before rebuild and 30 minutes after.
  4. 第一次发现这个问题时,发现指数是99%碎片化的。重新构建索引确实加快了速度,但并不显著:整个查询在重新构建之前花费了45分钟,而在重构之后花费了30分钟。
  5. While playing with the database, I have noticed that simple queries like “select count(*) from _Security_Tuple4” take several minutes. WTF?!
  6. 在使用数据库时,我注意到像“从_Security_Tuple4中选择count(*)”这样的简单查询需要花费几分钟的时间。WTF ? !
  7. However, they only took several minutes on the first run, and after that they were instant.
  8. 然而,他们在第一次跑的时候只花了几分钟,之后他们就立刻跑了。
  9. The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.
  10. 问题没有连接到特定的服务器,也没有连接到特定的SQL服务器实例:如果我备份数据库,然后在另一台计算机上恢复它,那么行为将保持不变。

3 个解决方案

#1


3  

First I'd like to point out a little misconception here: although the delete statement is said to take nearly 48% of the entire execution, this does not have to mean it takes 48% of the time needed; in fact, the 51% assigned inside that part of the query plan most definitely should NOT be interpreted as taking 'half of the time' of the entire operation!

首先,我想在这里指出一个误解:尽管删除语句据说占用了整个执行的48%,但这并不意味着它需要花费48%的时间;事实上,在查询计划的那个部分中分配的51%绝对不应该被解释为占用整个操作的“一半时间”!

Anyway, going by your remark that it takes a couple of minutes to do a COUNT(*) of the table 'the first time' I'm inclined to say that you have an IO issue related to said table/view. Personally I don't like materialized views very much so I have no real experience with them and how they behave internally but normally I would suggest that fragmentation is causing its toll on the underlying storage system. The reason it works fast the second time is because it's much faster to access the pages from the cache than it was when fetching them from disk, especially when they are all over the place. (Are there any (max) fields in the view ?)

不管怎样,根据你的评论,计算“第一次”表的数量(*)需要几分钟,我倾向于说你有一个与表/视图相关的IO问题。就我个人而言,我不太喜欢物化视图,因此我对它们以及它们在内部的行为没有真正的经验,但通常我认为,碎片化正在对底层存储系统造成损害。第二次运行速度快的原因是,从缓存访问页面要比从磁盘获取页面快得多,特别是当它们到处都是时。视图中有(max)字段吗?

Anyway, to find out what is taking so long I'd suggest you rather take this code out of the trigger it's currently in, 'fake' an inserted and deleted table and then try running the queries again adding times-stamps and/or using some program like SQL Sentry Plan Explorer to see how long each part REALLY takes (it has a duration column when you run a script from within the program). It might well be that you're looking at the wrong part; experience shows that cost and actual execution times are not always as related as we'd like to think.

无论如何,找出是什么花这么长时间我建议你把这段代码从触发器目前,“假”一个插入和删除表,然后再次尝试运行查询添加times-stamps和/或使用SQL哨兵计划等项目浏览器看到每一部分真正需要多久(它有一个持续时间列从程序中运行一个脚本时)。很可能你看错了部分;经验表明,成本和实际执行时间并不总是像我们想的那样相关。

#2


1  

Observations include:

观察结果包括:

  1. Is this the biggest of these databases that you are working with? If so, size matters to the optimizer. It will make quite a different plan for large datasets versus smaller data sets.
  2. 这是你使用的最大的数据库吗?如果是这样,大小对优化器很重要。它将为大数据集和小数据集制定一个完全不同的计划。
  3. The estimated rows and the actual rows are quite divergent. This is most apparent on the fourth query. "delete c from @alternativeRoutes...." where the _Security_Tuple5 estimates returning 16 rows, but actually used 235,904 rows. For that many rows an Index Scan could be more performant than Index Seeks. Are the statistics on the table up to date or do they need to be updated?
  4. 估计的行和实际的行是完全不同的。这在第四个查询中最为明显。“删除c从@alternativeRoutes ....“_Security_Tuple5估计返回16行,但实际使用235904行。对于如此多的行,索引扫描可能比索引查找更有效。表上的统计数据是最新的还是需要更新?
  5. The "select count(*) from _Security_Tuple4" takes several minutes, the first time. The second time is instant. This is because the data is all now cached in memory (until it ages out) and the second query is fast.
  6. 第一次从_Security_Tuple4中选择count(*)需要几分钟。第二次是瞬间。这是因为数据现在都被缓存在内存中(直到它老化),第二个查询速度很快。
  7. Because the problem moves with the database then the statistics, any missing indexes, et cetera are in the database. I would also suggest checking that the indexes match with other databases using the same schema.
  8. 因为问题随着数据库移动,所以统计信息,所有缺失的索引等等都在数据库中。我还建议检查索引是否与使用相同模式的其他数据库匹配。

This is not a full analysis, but it gives you some things to look at.

这并不是一个完整的分析,但是它给了你一些东西。

#3


0  

Fyodor,

费奥多,

First:

第一:

The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.

问题没有连接到特定的服务器,也没有连接到特定的SQL服务器实例:如果我备份数据库,然后在另一台计算机上恢复它,那么行为将保持不变。

I presume that you: a) run this query in isolated environment, b) the data is not under mutation.

我假设你:a)在隔离的环境中运行这个查询,b)数据没有被突变。

Is this correct?

这是正确的吗?

Second: post here your CREATE INDEX script. Do you have a funny FILLFACTOR? SORT_IN_TEMPDB?

第二:在这里发布您的创建索引脚本。你有有趣的填充因子吗?SORT_IN_TEMPDB吗?

Third: which type is your ParentId, ObjectId? int, smallint, uniqueidentifier, varchar?

第三:你的父母是哪一种?int,smallint uniqueidentifier,varchar ?

#1


3  

First I'd like to point out a little misconception here: although the delete statement is said to take nearly 48% of the entire execution, this does not have to mean it takes 48% of the time needed; in fact, the 51% assigned inside that part of the query plan most definitely should NOT be interpreted as taking 'half of the time' of the entire operation!

首先,我想在这里指出一个误解:尽管删除语句据说占用了整个执行的48%,但这并不意味着它需要花费48%的时间;事实上,在查询计划的那个部分中分配的51%绝对不应该被解释为占用整个操作的“一半时间”!

Anyway, going by your remark that it takes a couple of minutes to do a COUNT(*) of the table 'the first time' I'm inclined to say that you have an IO issue related to said table/view. Personally I don't like materialized views very much so I have no real experience with them and how they behave internally but normally I would suggest that fragmentation is causing its toll on the underlying storage system. The reason it works fast the second time is because it's much faster to access the pages from the cache than it was when fetching them from disk, especially when they are all over the place. (Are there any (max) fields in the view ?)

不管怎样,根据你的评论,计算“第一次”表的数量(*)需要几分钟,我倾向于说你有一个与表/视图相关的IO问题。就我个人而言,我不太喜欢物化视图,因此我对它们以及它们在内部的行为没有真正的经验,但通常我认为,碎片化正在对底层存储系统造成损害。第二次运行速度快的原因是,从缓存访问页面要比从磁盘获取页面快得多,特别是当它们到处都是时。视图中有(max)字段吗?

Anyway, to find out what is taking so long I'd suggest you rather take this code out of the trigger it's currently in, 'fake' an inserted and deleted table and then try running the queries again adding times-stamps and/or using some program like SQL Sentry Plan Explorer to see how long each part REALLY takes (it has a duration column when you run a script from within the program). It might well be that you're looking at the wrong part; experience shows that cost and actual execution times are not always as related as we'd like to think.

无论如何,找出是什么花这么长时间我建议你把这段代码从触发器目前,“假”一个插入和删除表,然后再次尝试运行查询添加times-stamps和/或使用SQL哨兵计划等项目浏览器看到每一部分真正需要多久(它有一个持续时间列从程序中运行一个脚本时)。很可能你看错了部分;经验表明,成本和实际执行时间并不总是像我们想的那样相关。

#2


1  

Observations include:

观察结果包括:

  1. Is this the biggest of these databases that you are working with? If so, size matters to the optimizer. It will make quite a different plan for large datasets versus smaller data sets.
  2. 这是你使用的最大的数据库吗?如果是这样,大小对优化器很重要。它将为大数据集和小数据集制定一个完全不同的计划。
  3. The estimated rows and the actual rows are quite divergent. This is most apparent on the fourth query. "delete c from @alternativeRoutes...." where the _Security_Tuple5 estimates returning 16 rows, but actually used 235,904 rows. For that many rows an Index Scan could be more performant than Index Seeks. Are the statistics on the table up to date or do they need to be updated?
  4. 估计的行和实际的行是完全不同的。这在第四个查询中最为明显。“删除c从@alternativeRoutes ....“_Security_Tuple5估计返回16行,但实际使用235904行。对于如此多的行,索引扫描可能比索引查找更有效。表上的统计数据是最新的还是需要更新?
  5. The "select count(*) from _Security_Tuple4" takes several minutes, the first time. The second time is instant. This is because the data is all now cached in memory (until it ages out) and the second query is fast.
  6. 第一次从_Security_Tuple4中选择count(*)需要几分钟。第二次是瞬间。这是因为数据现在都被缓存在内存中(直到它老化),第二个查询速度很快。
  7. Because the problem moves with the database then the statistics, any missing indexes, et cetera are in the database. I would also suggest checking that the indexes match with other databases using the same schema.
  8. 因为问题随着数据库移动,所以统计信息,所有缺失的索引等等都在数据库中。我还建议检查索引是否与使用相同模式的其他数据库匹配。

This is not a full analysis, but it gives you some things to look at.

这并不是一个完整的分析,但是它给了你一些东西。

#3


0  

Fyodor,

费奥多,

First:

第一:

The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.

问题没有连接到特定的服务器,也没有连接到特定的SQL服务器实例:如果我备份数据库,然后在另一台计算机上恢复它,那么行为将保持不变。

I presume that you: a) run this query in isolated environment, b) the data is not under mutation.

我假设你:a)在隔离的环境中运行这个查询,b)数据没有被突变。

Is this correct?

这是正确的吗?

Second: post here your CREATE INDEX script. Do you have a funny FILLFACTOR? SORT_IN_TEMPDB?

第二:在这里发布您的创建索引脚本。你有有趣的填充因子吗?SORT_IN_TEMPDB吗?

Third: which type is your ParentId, ObjectId? int, smallint, uniqueidentifier, varchar?

第三:你的父母是哪一种?int,smallint uniqueidentifier,varchar ?