How do you know when to use an index, and which type?

Time: 2021-06-03 04:19:46

I've searched a bit and didn't see any similar question, so here goes.

How do you know when to put an index on a table? How do you decide which columns to include in the index? When should a clustered index be used?

Can an index ever slow down the performance of select statements? How many indexes is too many and how big of a table do you need for it to benefit from an index?

EDIT:

What about column data types? Is it ok to have an index on a varchar or datetime?

6 answers

#1


3  

Well, the first question is easy:

When should a clustered index be used?

Always. Period. Except for a very few, rare edge cases. A clustered index makes a table faster for every operation. Yes! It does. See Kim Tripp's excellent article "The Clustered Index Debate Continues" for background. She also lists her main criteria for a clustered index:

  • narrow
  • static (never changes)
  • unique
  • if at all possible: ever increasing
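You can try the criteria above loosely in SQLite. It is used in these sketches only because it ships with Python (`sqlite3`). SQLite is not SQL Server, so treat this as an analogy. An ordinary SQLite table is stored as a B-tree ordered by its rowid. A column declared `INTEGER PRIMARY KEY` becomes an alias for that rowid, so it plays roughly the role of a narrow, unique, static, ever-increasing clustering key such as INT IDENTITY. The `orders` table and its columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id INTEGER PRIMARY KEY" aliases the rowid, the key SQLite's table B-tree is ordered by.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

# Leave id out of the INSERT: SQLite normally assigns max(id) + 1 (starting at 1),
# much like INT IDENTITY does in SQL Server.
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5), ("carol", 7.25)],
)
ids = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]


def plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


lookup_plan = plan("SELECT customer, total FROM orders WHERE id = ?", (2,))
print(ids)          # [1, 2, 3]
print(lookup_plan)  # the lookup goes straight through the rowid B-tree
```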

INT IDENTITY fulfills these criteria perfectly; GUIDs do not. See "GUIDs as Primary Key" for extensive background.
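Here is a rough way to see why "ever increasing" matters. The sketch inserts 1,000 keys into a sorted Python list, a very loose stand-in for an index's leaf level, and counts how many land at the end. Sequential integers always append. Random `uuid.uuid4()` values, standing in for `NEWID()`, almost never do. Python orders UUIDs by their integer value, which is not how SQL Server sorts `uniqueidentifier`. Only the sequential-versus-random contrast is the point.

```python
import bisect
import uuid


def count_appends(keys):
    """Insert keys one at a time into a sorted list and count how many land at the end.

    The sorted list is a crude stand-in for the leaf level of a B-tree index:
    appending at the end is the cheap case, while inserting in the middle is what
    leads to page splits and fragmentation in a real index.
    """
    index = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            appends += 1
        index.insert(pos, key)
    return appends


n = 1000
identity_keys = list(range(1, n + 1))         # like INT IDENTITY: 1, 2, 3, ...
guid_keys = [uuid.uuid4() for _ in range(n)]  # like NEWID(): random values

print("INT IDENTITY keys appended at the end:", count_appends(identity_keys))  # 1000
print("random GUID keys appended at the end: ", count_appends(guid_keys))
```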

Why narrow? Because the clustering key is added to every row of every non-clustered index on the same table (so that the data row can actually be looked up when needed). You don't want a VARCHAR(200) in your clustering key.
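To put "narrow" into numbers, here is some back-of-the-envelope arithmetic using assumed figures: 10 million rows and 5 non-clustered indexes. The clustering key is repeated once per row in every non-clustered index, so the overhead is rows × indexes × key width. The widths are SQL Server storage sizes, and row and page overhead are ignored.

```python
# Approximate key widths in bytes (SQL Server storage sizes; the VARCHAR entry
# assumes all 200 characters are used and ignores its 2-byte length overhead).
KEY_BYTES = {
    "INT": 4,
    "BIGINT": 8,
    "UNIQUEIDENTIFIER (GUID)": 16,
    "VARCHAR(200), fully used": 200,
}


def clustering_key_overhead(rows, nonclustered_indexes, key_bytes):
    """Bytes spent repeating the clustering key once per row in each non-clustered index."""
    return rows * nonclustered_indexes * key_bytes


rows = 10_000_000  # hypothetical table size
nc_indexes = 5     # hypothetical number of non-clustered indexes
for name, width in KEY_BYTES.items():
    mib = clustering_key_overhead(rows, nc_indexes, width) / 2**20
    print(f"{name:<26}{mib:>10,.0f} MiB")
```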

Why unique? See above: the clustering key is the mechanism SQL Server uses to locate one specific data row, so it has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a hidden 4-byte uniquifier to rows with duplicate key values. Be careful of that!
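The uniquifier idea can be modelled in a few lines. This is a hypothetical illustration: `assign_uniquifiers` is not a SQL Server API. The first row with a given clustering-key value needs nothing extra. Every later duplicate gets a counter, so the combined (key, uniquifier) pair is unique again.

```python
from collections import defaultdict


def assign_uniquifiers(keys):
    """Pair each clustering-key value with a uniquifier, in insertion order.

    Models the idea only: the first row with a given key gets 0 (nothing extra
    stored), and each later duplicate gets 1, 2, 3, ... so that (key, uniquifier)
    is unique again.
    """
    seen = defaultdict(int)
    result = []
    for key in keys:
        result.append((key, seen[key]))
        seen[key] += 1
    return result


rows = assign_uniquifiers(["smith", "jones", "smith", "smith", "brown"])
duplicates = sum(1 for _, u in rows if u > 0)
print(rows)        # [('smith', 0), ('jones', 0), ('smith', 1), ('smith', 2), ('brown', 0)]
print(duplicates)  # 2 rows carry an extra uniquifier (at least 4 bytes each)
```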

Next: non-clustered indices. Basically there's one rule: any foreign key in a child table referencing another table should be indexed; it will speed up JOINs and other operations.
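Here is a sketch of the foreign-key rule, again using SQLite as a stand-in. Like SQL Server, SQLite does not create an index just because a column is declared as a foreign key. The child-side lookup that a JOIN from `orders` (or a delete of a parent row) needs is a full scan until the FK column is indexed. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (
        id       INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders (id),  -- the foreign key column
        product  TEXT,
        qty      INTEGER
    );
""")


def plan(sql, params=()):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# The lookup on the child side that a JOIN from orders (or a parent delete) has to do.
child_lookup = "SELECT product, qty FROM order_lines WHERE order_id = ?"

before = plan(child_lookup, (42,))  # declaring the FK created no index: full scan
conn.execute("CREATE INDEX ix_order_lines_order_id ON order_lines (order_id)")
after = plan(child_lookup, (42,))   # now a search on the new index

print(before)
print(after)
```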

Furthermore, any query with a WHERE clause is a good candidate; pick the ones that are executed most often first. Put indices on columns that show up in WHERE clauses and in ORDER BY clauses.
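For WHERE and ORDER BY together, a compound index can both find and order the rows if it puts the equality column first and the sort column second. The SQLite sketch below uses a hypothetical `orders` table and compares the plan before and after such an index. Without the index, the plan contains a separate sort step (`USE TEMP B-TREE FOR ORDER BY`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, created_at TEXT, total REAL)"
)


def plan(sql, params=()):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT id, total FROM orders WHERE customer = ? ORDER BY created_at"

before = plan(query, ("alice",))
# Equality column first, ORDER BY column second: the index both finds and orders the rows.
conn.execute("CREATE INDEX ix_orders_customer_created ON orders (customer, created_at)")
after = plan(query, ("alice",))

print(before)  # full scan plus a temporary B-tree for the sort
print(after)   # index search, no separate sort step
```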

Next: measure your system, check the DMVs (dynamic management views) for hints about unused or missing indices, and tweak your system over and over again. It's an ongoing process; you'll never be done! See here for info on those two DMVs (missing and unused indices).
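SQL Server's own DMVs only exist inside SQL Server: `sys.dm_db_index_usage_stats` for index usage and `sys.dm_db_missing_index_details` for missing-index suggestions. The sketch below loosely imitates the same idea. It runs a hypothetical captured workload through SQLite's `EXPLAIN QUERY PLAN`. It then reports indexes no query used and queries that fell back to a full scan. The schema and workload are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT, created_at TEXT);
    CREATE INDEX ix_customers_email ON customers (email);
    CREATE INDEX ix_customers_city ON customers (city);
""")

# A hypothetical captured workload: (statement, parameters).
workload = [
    ("SELECT id FROM customers WHERE email = ?", ("a@example.com",)),
    ("SELECT id, email FROM customers WHERE created_at >= ?", ("2021-01-01",)),
]


def plan(sql, params=()):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def user_indexes():
    """Names of explicitly created indexes (automatic ones have no SQL text)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL")
    return {name for (name,) in rows}


used = set()
missing_candidates = []
for sql, params in workload:
    details = plan(sql, params)
    used |= {name for name in user_indexes() if any(name in d for d in details)}
    # A bare table scan (no index mentioned) hints at a missing index.
    if any(d.startswith("SCAN") and "INDEX" not in d for d in details):
        missing_candidates.append(sql)

unused = user_indexes() - used
print("unused indexes:", sorted(unused))
print("queries that scan:", missing_candidates)
```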

Another word of warning: with a truckload of indices, you can make any SELECT query go really, really fast. But at the same time, INSERTs, UPDATEs and DELETEs, which have to update all the indices involved, might suffer. If you only ever SELECT, go nuts! Otherwise, it's a fine and delicate balancing act. You can always tweak a single query beyond belief, but the rest of your system might suffer in doing so. Don't over-index your database! Put a few good indices in place, check and observe how the system behaves, then maybe add another one or two, and again: observe how the total system performance is affected.
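The write-side cost can be made visible without a timing benchmark. The sketch below loads the same 5,000 made-up rows into fresh SQLite tables with zero, one and three secondary indexes. It then reports the database size in pages. Each index is a separate B-tree that every INSERT has to update. Page count is only a rough proxy for that extra work, not a measurement of it.

```python
import sqlite3


def pages_after_insert(indexed_columns, rows=5000):
    """Load the same rows into a fresh table with the given secondary indexes and
    return the database size in pages. Every index is one more B-tree that each
    INSERT (and many UPDATEs/DELETEs) has to maintain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, source TEXT, note TEXT)")
    for col in indexed_columns:  # column names are trusted constants here
        conn.execute(f"CREATE INDEX ix_events_{col} ON events ({col})")
    conn.executemany(
        "INSERT INTO events (kind, source, note) VALUES (?, ?, ?)",
        ((f"kind{i % 7}", f"src{i % 13}", f"note number {i}") for i in range(rows)),
    )
    conn.commit()
    (pages,) = conn.execute("PRAGMA page_count").fetchone()
    conn.close()
    return pages


for cols in ([], ["kind"], ["kind", "source", "note"]):
    print(f"{len(cols)} secondary index(es): {pages_after_insert(cols)} pages")
```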

#2


1  

The rule of thumb is to index the primary key (indexed implicitly, and clustered by default) and each foreign key column.

There is more to it, but you could do worse than using SQL Server's missing-index DMVs.

An index may slow down a SELECT if the optimiser makes a bad choice, and it is possible to have too many. Too many will slow down writes, and it's also possible to end up with overlapping indexes.
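Overlapping indexes are often of the "left prefix" kind. An index on (region) is largely redundant next to one on (region, product), because the wider index can serve the same lookups. Below is a minimal SQLite sketch that flags such pairs. It skips unique indexes, since those also enforce a constraint. The `sales` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, product TEXT, sold_on TEXT);
    CREATE INDEX ix_sales_region ON sales (region);
    CREATE INDEX ix_sales_region_product ON sales (region, product);
    CREATE INDEX ix_sales_product ON sales (product);
""")


def index_columns(table):
    """Map each non-unique index on `table` to its key columns, in order."""
    names = [name for (name,) in conn.execute(
        'SELECT name FROM pragma_index_list(?) WHERE "unique" = 0', (table,)
    ).fetchall()]
    return {
        name: [col for (col,) in conn.execute(
            "SELECT name FROM pragma_index_info(?) ORDER BY seqno", (name,)
        ).fetchall()]
        for name in names
    }


def overlapping(table):
    """Return (narrower, wider) pairs where the narrower index's columns are a
    leading prefix of the wider index's columns."""
    cols = index_columns(table)
    pairs = []
    for short, short_cols in cols.items():
        for wide, wide_cols in cols.items():
            if len(short_cols) < len(wide_cols) and wide_cols[: len(short_cols)] == short_cols:
                pairs.append((short, wide))
    return sorted(pairs)


print(overlapping("sales"))  # [('ix_sales_region', 'ix_sales_region_product')]
```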

#3


1  

To answer the ones I can: I would say that every table, no matter how small, will always benefit from at least one index, as there has to be at least one way in which you are interested in looking up the data; otherwise, why store it?

A general rule is to add an index if you need to find data in the table using a particular field or set of fields. As for how many indexes are too many: generally, the more indexes you have, the slower inserts and updates will be, since they also have to modify the indexes, but it all depends on how you use your data. If you need fast inserts, don't use too many. In read-only reporting data stores, you can have a number of them to make all your lookups faster.

Unfortunately there is no one rule to guide you on the number or type of indexes to use, although the query optimiser of your chosen DB can give hints based on the queries you are executing.

As for clustered indexes, they are an ace you only get to play once (a table can have only one), so choose carefully. It's worth calculating the selectivity of the column you are thinking of putting it on, as it would be wasted on something like a boolean column (a contrived example), whose selectivity is very low.
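One common way to quantify selectivity is distinct values divided by total rows. In SQL that is something like `SELECT COUNT(DISTINCT col) * 1.0 / COUNT(*) FROM tbl`. A tiny Python version over made-up data shows why a boolean column scores so poorly:

```python
def selectivity(values):
    """Distinct values divided by total rows: close to 1.0 is highly selective,
    close to 0 means each value matches a large share of the table."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical 1,000-row table.
is_active = [i % 2 == 0 for i in range(1000)]          # boolean: only 2 distinct values
email = [f"user{i}@example.com" for i in range(1000)]  # unique per row

print(selectivity(is_active))  # 0.002
print(selectivity(email))      # 1.0
```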

#4


0  

This is really a very involved question, though a good starting place would be to index any column that you filter results on. For example, if you often break products into groups by sale price, index the sale_price column of the products table so that query doesn't have to scan the whole table.
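Grouping by price band is a range filter, and a B-tree index on the filtered column serves ranges as well as equality. Below is a SQLite sketch with a hypothetical `products` table. The plan shows a range search on the index rather than a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, sale_price REAL)")
conn.execute("CREATE INDEX ix_products_sale_price ON products (sale_price)")

# One "price band", e.g. everything from 10.00 up to (but not including) 20.00.
band_query = "SELECT id, name FROM products WHERE sale_price >= ? AND sale_price < ?"
details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + band_query, (10.0, 20.0))]
print(details)  # a range search on ix_products_sale_price instead of a full scan
```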

#5


0  

If you are querying based on the value in a column, you probably want to index that column.

For example:

SELECT a,b,c FROM MyTable WHERE x = 1

You would want an index on x.
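The example query can be checked directly. The SQLite sketch below shows the plan three ways: with no index, with an index on x, and with a covering index that also holds a, b and c. SQL Server would more typically write the covering index as `CREATE INDEX ... ON MyTable (x) INCLUDE (a, b, c)`. SQLite has no `INCLUDE`, so the columns become trailing key columns. The table definition is an assumption, since the answer only shows the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, x INTEGER)")

QUERY = "SELECT a, b, c FROM MyTable WHERE x = 1"


def plan():
    """Return the 'detail' strings from EXPLAIN QUERY PLAN for QUERY."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


no_index = plan()  # full table scan

conn.execute("CREATE INDEX ix_mytable_x ON MyTable (x)")
with_index = plan()  # search on x, then fetch a, b, c from the table

conn.execute("DROP INDEX ix_mytable_x")
conn.execute("CREATE INDEX ix_mytable_x_abc ON MyTable (x, a, b, c)")
covering = plan()  # everything the query needs is in the index itself

for label, details in (("no index", no_index), ("index on x", with_index), ("covering", covering)):
    print(f"{label:<11}{details}")
```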

Generally, I add indexes for columns which are frequently queried, and I add compound indexes when I'm querying on more than one column.
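For compound indexes, column order matters. An index on (x, y) serves filters on x, or on x and y, but generally not on y alone. Below is a SQLite sketch with a hypothetical extra column y. Under SQLite's default planner, with no `ANALYZE` statistics, the y-only query falls back to a table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, x INTEGER, y INTEGER)"
)
# Compound index: entries are ordered by x first, then by y within each x.
conn.execute("CREATE INDEX ix_mytable_x_y ON MyTable (x, y)")


def plan(where, params):
    """EXPLAIN QUERY PLAN details for SELECT a, b, c with the given (trusted) WHERE text."""
    sql = f"SELECT a, b, c FROM MyTable WHERE {where}"
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


both = plan("x = ? AND y = ?", (1, 2))  # uses both key columns
x_only = plan("x = ?", (1,))            # leading column alone: still usable
y_only = plan("y = ?", (2,))            # no leading x: falls back to a scan here

for label, details in (("x and y", both), ("x only", x_only), ("y only", y_only)):
    print(f"{label:<8}{details}")
```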

Indexes generally won't hurt the performance of a SELECT, but they may slow down INSERTs (or UPDATEs) if you have too many indexes, or too many indexed columns, per table.

As a rule of thumb, start off by adding an index whenever you find yourself writing WHERE a = 123 (in this case, an index on a).

#6


0  

You should index the columns that you use for filtering and ordering, i.e. the columns in your WHERE and ORDER BY clauses.

Having many indexes won't help SELECT statements that use WHERE and ORDER BY on columns that have not been indexed; those queries will still be slow.

As for table size: from several thousand rows upwards, indexes start to show real benefits.

Having said that, there are automated tools to help with this; SQL Server has the Database Engine Tuning Advisor.
