SQL Server性能和聚簇索引值

时间:2022-10-25 02:47:03

I have a table myTable with a unique clustered index myId with fill factor 100% Its an integer, starting at zero (but its not an identity column for the table) I need to add a new type of row to the table. It might be nice if I could distinguish these rows by using negative values of myId.

我有一个表myTable,带有一个唯一的聚簇索引myId,填充因子为100%它是一个整数,从零开始(但它不是表的标识列)我需要在表中添加一种新类型的行。如果我可以通过使用myId的负值来区分这些行,那可能会很好。

Would having negative values incur extra page splitting and slow down inserts?

负值会导致额外的页面拆分和减慢插入吗?

Extra Background: This table exists as part of the etl for a data warehouse that gathers data from disparate systems. I now want to accomodate a new type of data. A way for me to do this is to reserve negative ids for this new data, which will thus be automatically clustered. This will also avoid major key changes or extra columns in the schema.

额外背景:此表作为数据仓库的etl的一部分存在,该数据仓库从不同的系统收集数据。我现在想要容纳一种新类型的数据。我这样做的一种方法是为这些新数据保留负数,因此将自动聚类。这还将避免架构中的主要键更改或额外列。

Answer Summary: Fill factors of 100% will noramlly slow down the inserts. But not inserts that happen sequentially, and that includes the sequntial negative inserts.

答案摘要:100%的填充因子将极慢地降低插入速度。但不是按顺序发生的插入,而是包括相等的负插入。

5 个解决方案

#1


Besides the practical administration points you already got and the suspect dubious use of negative ids to represent data model attributes, there is also a valid question here: give a table with int ids from 0 to N, inserting new negative values where would those value go and would they cause additional splits?

除了你已经得到的实际管理点和可疑的使用负id来表示数据模型属性之外,这里还有一个有效的问题:给出一个从0到N的int id的表,插入新的负值,这些值会去哪里他们会引起额外的分裂?

The initial rows will be placed on the clustered index leaf pages, row with id 0 on first page and row with id N on the last page, filling the pages in between. When the first row with value of -1 is inserted, this will sort ahead of row with id 0 and as such will add a new page to the tree (will allocate an extent of 8 pages actually, but that is a different point) and will link the page in front of the leaf level linked list of pages. This will NOT cause a page split of the former first page. On further inserts of values -2, -3 etc they will go to the same new page and they will be inserted in the proper position (-2 ahead of -1, -3 ahead of -2 etc) until the page fills. Further inserts will add a new page ahead of this one, that will accommodate further new values. Inserts of positive values N+1, N+2 will go at the last page and be placed in it until it fills, then they'll cause a new page to be added and will start filling that page.

初始行将放置在聚簇索引叶页上,第一页上ID为0的行和最后一页上ID为N的行,填充其间的页面。当插入值为-1的第一行时,这将在id为0的行之前排序,因此将向树添加一个新页面(实际上将分配8个页面,但这是一个不同的点)和将链接页面在叶级链接页面前面。这不会导致前一页的页面拆分。在进一步插入值-2,-3等时,它们将转到相同的新页面,它们将被插入到正确的位置(-2之前的-2,-3之前的-2等)直到页面填充。进一步插入将在此页面之前添加新页面,以适应更多新值。插入正值N + 1,N + 2将放在最后一页并放入其中直到它填满,然后它们将导致添加新页面并开始填充该页面。

So basically the answer is this: inserts at either end of a clustered index should not cause page splits. Page splits can be caused only by inserts between two existing keys. This actually extends to the non-leaf pages as well, an index at either end of the cluster may not split a non-leaf page either. I do not discuss here the impact of updates of course (they can can cause splits if the increase the length of a variable length column).

所以基本上答案是这样的:在聚簇索引的任何一端插入都不应该导致页面拆分。只能通过两个现有密钥之间的插入来导致页面拆分。这实际上也扩展到非叶子页面,集群任一端的索引也可能不会拆分非叶子页面。我不在这里讨论更新的影响当然(如果增加可变长度列的长度,它们可能会导致拆分)。

Lately has been a lot of talk in the SQL Server blogosphere about the potential performance problems of page splits, but I must warn against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the page split performance hit is visible during inserts, then you'll be probably worse hit by the 'mitigation' measures because you'll create artificial page latch hot spots that are far worse as they'll affect every insert. What is true is that prolonged operation with frequent splits will result in high fragmentation which impacts the data access time. I say that is best mitigated with off-peak periodical index maintenance operation (reorganize). Avoid premature optimizations, always measure first.

最近在SQL Server博客圈中有很多关于页面拆分的潜在性能问题的讨论,但我必须警告不要采取不必要的极端来避免它们。页面拆分是正常的索引操作。如果您发现自己处于插入过程中可以看到页面拆分性能的环境中,那么您可能会受到“缓解”措施的影响更大,因为您将创建更糟糕的人工页面锁定热点影响每个插入。真实的是,频繁拆分的长时间操作会导致高度碎片,从而影响数据访问时间。我说通过非高峰期刊索引维护操作(重组)可以最好地减轻这种情况。避免过早优化,始终先测量。

#2


Not enough to notice for any reasonable system.

没有足够的注意到任何合理的系统。

Page splits happen when a page is full, either at the start or at the end of the range. As long as you regular index maintenance...

页面分割发生在页面已满时,无论是在范围的开头还是结尾。只要你定期维护索引......

Edit, after Fill factor comments:

在填充因子注释后编辑:

After a page split wth 90 or 100 FF, each page will be 50% full. FF = 100 only means an insert will happen sooner (probably 1st insert).

页面拆分90或100 FF后,每页将满50%。 FF = 100仅表示插入将更快发生(可能是第一次插入)。

With a strictly monotonically increasing (or decreasing) key (+ve or -ve), a page split happens at either end of the range.

使用严格单调递增(或递减)的键(+ ve或-ve),页面拆分发生在范围的任一端。

However, from BOL, FILLFACTOR

但是,来自BOL,FILLFACTOR

Fill

Adding Data to the End of the Table

将数据添加到表的末尾

A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.

如果新数据在整个表中均匀分布,则0或100以外的非零填充因子可以有利于提高性能。但是,如果将所有数据添加到表的末尾,则不会填充索引页中的空白区域。例如,如果索引键列是IDENTITY列,则新行的键始终在增加,索引行在逻辑上会添加到索引的末尾。如果现有行将使用延长行大小的数据进行更新,请使用小于100的填充因子。每个页面上的额外字节将有助于最大程度地减少行中额外长度导致的页面拆分。

So does, fillfactor matter for strictly monotonic keys...? Especially if it's low volume writes

那么,fillfactor对于严格单调的密钥很重要......?特别是如果它的低音量写入

#3


No, not at all. Negative values are just as valid as INTegers as positive ones. No problem. Basically, internally, they're all just 4 bytes worth of zeroes and ones :-)

一点都不。负值与INTegers一样有效。没问题。基本上,在内部,它们都只有4个字节的零和一个:-)

Marc

#4


You are asking the wrong question!

你问的是错误的问题!

If you create a clustered index that has a fillfactor of 100%, every time a record is inserted, deleted or even modified, page splits can occur because there is likely no room on the existing index data page to write the change.

如果创建具有100%fillfactor的聚簇索引,则每次插入,删除甚至修改记录时,都可能发生页面拆分,因为现有索引数据页面上可能没有空间来编写更改。

Even with regular index maintenance, a fill factor of 100% is counter productive on a table where you know inserts are going to be performed. A more usual value would be 90%.

即使使用常规索引维护,填充因子100%也会在您知道将要执行插入的表上产生相反的效果。更常见的价值是90%。

#5


I'm concerned that this post may have taken a wrong turn, in that there seems to be an underlying design issue at work here, irrespective of the resultant page splits.

我担心这篇文章可能出现了错误的转折,因为这里似乎存在一个潜在的设计问题,无论结果页面分裂如何。

Why do you need to introduce a negative ID?

为什么需要引入负面ID?

An integer primary key, for example, should uniquely indentify a row, it's sign should be irrelevant. I suspect that there may be a definition issue with the primary key for your table if this is not the case.

例如,整数主键应该唯一地标识一行,它的符号应该是无关紧要的。我怀疑如果不是这种情况,表的主键可能存在定义问题。

If you need to flag/identify the newly inserted records then create a column specifically for this purpose.

如果需要标记/标识新插入的记录,则专门为此目的创建一个列。

This solution would be ideal because you may then be able to ensure that your primary key is sequential (perhaps using an Identity data type, although not essential), thereby avoiding issues with page splits (on insert) altogether.

这个解决方案是理想的,因为您可以确保您的主键是顺序的(可能使用Identity数据类型,尽管不是必需的),从而避免了页面拆分(插入时)的问题。

Also, to confirm if I may, a fill factor of 100% for a clustered index primary key (identity integer for example), will not cause page splits for sequential inserts!

另外,为了确认我是否可以,聚簇索引主键(例如,标识整数)的填充因子为100%,不会导致顺序插入的页面拆分!

#1


Besides the practical administration points you already got and the suspect dubious use of negative ids to represent data model attributes, there is also a valid question here: give a table with int ids from 0 to N, inserting new negative values where would those value go and would they cause additional splits?

除了你已经得到的实际管理点和可疑的使用负id来表示数据模型属性之外,这里还有一个有效的问题:给出一个从0到N的int id的表,插入新的负值,这些值会去哪里他们会引起额外的分裂?

The initial rows will be placed on the clustered index leaf pages, row with id 0 on first page and row with id N on the last page, filling the pages in between. When the first row with value of -1 is inserted, this will sort ahead of row with id 0 and as such will add a new page to the tree (will allocate an extent of 8 pages actually, but that is a different point) and will link the page in front of the leaf level linked list of pages. This will NOT cause a page split of the former first page. On further inserts of values -2, -3 etc they will go to the same new page and they will be inserted in the proper position (-2 ahead of -1, -3 ahead of -2 etc) until the page fills. Further inserts will add a new page ahead of this one, that will accommodate further new values. Inserts of positive values N+1, N+2 will go at the last page and be placed in it until it fills, then they'll cause a new page to be added and will start filling that page.

初始行将放置在聚簇索引叶页上,第一页上ID为0的行和最后一页上ID为N的行,填充其间的页面。当插入值为-1的第一行时,这将在id为0的行之前排序,因此将向树添加一个新页面(实际上将分配8个页面,但这是一个不同的点)和将链接页面在叶级链接页面前面。这不会导致前一页的页面拆分。在进一步插入值-2,-3等时,它们将转到相同的新页面,它们将被插入到正确的位置(-2之前的-2,-3之前的-2等)直到页面填充。进一步插入将在此页面之前添加新页面,以适应更多新值。插入正值N + 1,N + 2将放在最后一页并放入其中直到它填满,然后它们将导致添加新页面并开始填充该页面。

So basically the answer is this: inserts at either end of a clustered index should not cause page splits. Page splits can be caused only by inserts between two existing keys. This actually extends to the non-leaf pages as well, an index at either end of the cluster may not split a non-leaf page either. I do not discuss here the impact of updates of course (they can can cause splits if the increase the length of a variable length column).

所以基本上答案是这样的:在聚簇索引的任何一端插入都不应该导致页面拆分。只能通过两个现有密钥之间的插入来导致页面拆分。这实际上也扩展到非叶子页面,集群任一端的索引也可能不会拆分非叶子页面。我不在这里讨论更新的影响当然(如果增加可变长度列的长度,它们可能会导致拆分)。

Lately has been a lot of talk in the SQL Server blogosphere about the potential performance problems of page splits, but I must warn against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the page split performance hit is visible during inserts, then you'll be probably worse hit by the 'mitigation' measures because you'll create artificial page latch hot spots that are far worse as they'll affect every insert. What is true is that prolonged operation with frequent splits will result in high fragmentation which impacts the data access time. I say that is best mitigated with off-peak periodical index maintenance operation (reorganize). Avoid premature optimizations, always measure first.

最近在SQL Server博客圈中有很多关于页面拆分的潜在性能问题的讨论,但我必须警告不要采取不必要的极端来避免它们。页面拆分是正常的索引操作。如果您发现自己处于插入过程中可以看到页面拆分性能的环境中,那么您可能会受到“缓解”措施的影响更大,因为您将创建更糟糕的人工页面锁定热点影响每个插入。真实的是,频繁拆分的长时间操作会导致高度碎片,从而影响数据访问时间。我说通过非高峰期刊索引维护操作(重组)可以最好地减轻这种情况。避免过早优化,始终先测量。

#2


Not enough to notice for any reasonable system.

没有足够的注意到任何合理的系统。

Page splits happen when a page is full, either at the start or at the end of the range. As long as you regular index maintenance...

页面分割发生在页面已满时,无论是在范围的开头还是结尾。只要你定期维护索引......

Edit, after Fill factor comments:

在填充因子注释后编辑:

After a page split wth 90 or 100 FF, each page will be 50% full. FF = 100 only means an insert will happen sooner (probably 1st insert).

页面拆分90或100 FF后,每页将满50%。 FF = 100仅表示插入将更快发生(可能是第一次插入)。

With a strictly monotonically increasing (or decreasing) key (+ve or -ve), a page split happens at either end of the range.

使用严格单调递增(或递减)的键(+ ve或-ve),页面拆分发生在范围的任一端。

However, from BOL, FILLFACTOR

但是,来自BOL,FILLFACTOR

Fill

Adding Data to the End of the Table

将数据添加到表的末尾

A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows.

如果新数据在整个表中均匀分布,则0或100以外的非零填充因子可以有利于提高性能。但是,如果将所有数据添加到表的末尾,则不会填充索引页中的空白区域。例如,如果索引键列是IDENTITY列,则新行的键始终在增加,索引行在逻辑上会添加到索引的末尾。如果现有行将使用延长行大小的数据进行更新,请使用小于100的填充因子。每个页面上的额外字节将有助于最大程度地减少行中额外长度导致的页面拆分。

So does, fillfactor matter for strictly monotonic keys...? Especially if it's low volume writes

那么,fillfactor对于严格单调的密钥很重要......?特别是如果它的低音量写入

#3


No, not at all. Negative values are just as valid as INTegers as positive ones. No problem. Basically, internally, they're all just 4 bytes worth of zeroes and ones :-)

一点都不。负值与INTegers一样有效。没问题。基本上,在内部,它们都只有4个字节的零和一个:-)

Marc

#4


You are asking the wrong question!

你问的是错误的问题!

If you create a clustered index that has a fillfactor of 100%, every time a record is inserted, deleted or even modified, page splits can occur because there is likely no room on the existing index data page to write the change.

如果创建具有100%fillfactor的聚簇索引,则每次插入,删除甚至修改记录时,都可能发生页面拆分,因为现有索引数据页面上可能没有空间来编写更改。

Even with regular index maintenance, a fill factor of 100% is counter productive on a table where you know inserts are going to be performed. A more usual value would be 90%.

即使使用常规索引维护,填充因子100%也会在您知道将要执行插入的表上产生相反的效果。更常见的价值是90%。

#5


I'm concerned that this post may have taken a wrong turn, in that there seems to be an underlying design issue at work here, irrespective of the resultant page splits.

我担心这篇文章可能出现了错误的转折,因为这里似乎存在一个潜在的设计问题,无论结果页面分裂如何。

Why do you need to introduce a negative ID?

为什么需要引入负面ID?

An integer primary key, for example, should uniquely indentify a row, it's sign should be irrelevant. I suspect that there may be a definition issue with the primary key for your table if this is not the case.

例如,整数主键应该唯一地标识一行,它的符号应该是无关紧要的。我怀疑如果不是这种情况,表的主键可能存在定义问题。

If you need to flag/identify the newly inserted records then create a column specifically for this purpose.

如果需要标记/标识新插入的记录,则专门为此目的创建一个列。

This solution would be ideal because you may then be able to ensure that your primary key is sequential (perhaps using an Identity data type, although not essential), thereby avoiding issues with page splits (on insert) altogether.

这个解决方案是理想的,因为您可以确保您的主键是顺序的(可能使用Identity数据类型,尽管不是必需的),从而避免了页面拆分(插入时)的问题。

Also, to confirm if I may, a fill factor of 100% for a clustered index primary key (identity integer for example), will not cause page splits for sequential inserts!

另外,为了确认我是否可以,聚簇索引主键(例如,标识整数)的填充因子为100%,不会导致顺序插入的页面拆分!