How to speed up a simple join

Time: 2023-02-02 02:43:39

I am no good at SQL.

I am looking for a way to speed up a simple join like this:

SELECT
    E.expressionID,
    A.attributeName,
    A.attributeValue
FROM 
    attributes A
JOIN
    expressions E
ON 
    E.attributeId = A.attributeId

I run this query tens of thousands of times, and it takes longer and longer as the tables get bigger.

I am thinking of indexes. If I were to speed up selects on the individual tables, I'd probably put a nonclustered index on expressionID for the expressions table and another on (attributeName, attributeValue) for the attributes table, but I don't know how this would apply to the join.

EDIT: I already have a clustered index on (expressionId (PK), attributeId (PK, FK)) on the expressions table, and another clustered index on attributeId (PK) on the attributes table.

I've seen this question but I am asking for something more general and probably far simpler.

Any help appreciated!

6 answers

#1


16  

You definitely want to have indexes on attributeID on both the attributes and expressions tables. If you don't currently have those indexes in place, I think you'll see a big speedup.
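
A minimal T-SQL sketch of this suggestion, assuming SQL Server and the table and column names from the question. The index name is made up for illustration. Per the question's EDIT, attributes already has a clustered primary key on attributeId, so only expressions should need a new index:

```sql
-- Hypothetical index name; table and column names come from the question.
-- attributes already has a clustered PK on attributeId, so nothing is added there.
CREATE NONCLUSTERED INDEX IX_expressions_attributeId
    ON expressions (attributeId);
```

In SQL Server, a nonclustered index on a table that has a clustered index also carries the clustering key columns, so this index should be able to supply expressionID without going back to the base table.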

#2


6  

In fact, because so few columns are being returned, I would consider a covering index for this query,

i.e. an index that includes all the fields in the query.
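
One way a covering index could look in T-SQL, as a sketch only. It assumes SQL Server 2005 or later (for INCLUDE) and uses a made-up index name:

```sql
-- Hypothetical index name. The key is the join column; INCLUDE stores the
-- other two columns at the leaf level only, outside the index key.
CREATE NONCLUSTERED INDEX IX_attributes_covering
    ON attributes (attributeId)
    INCLUDE (attributeName, attributeValue);
```

Because attributes is already clustered on attributeId, the clustered index covers the query too. A narrower index like this one is likely to pay off only if the table has many other columns.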

#3


3  

Some things you need to care about are indexes, the query plan and statistics.

Put indexes on attributeId. Or, make sure indexes exist where attributeId is the first column in the key (SQL Server can still use an index if it's not the 1st column, but it then has to scan the index rather than seek into it, which is not as fast).
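
To check which indexes exist and where attributeId sits in each key, a catalog query along these lines may help. It is a sketch that assumes SQL Server 2005 or later, where the sys.indexes, sys.index_columns and sys.columns catalog views are available:

```sql
-- Lists every index on expressions with its columns in key order.
-- key_ordinal = 1 marks the leading key column; included (non-key)
-- columns have key_ordinal = 0.
SELECT
    i.name      AS index_name,
    i.type_desc AS index_type,
    ic.key_ordinal,
    ic.is_included_column,
    c.name      AS column_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('expressions')
ORDER BY i.name, ic.key_ordinal;
```

Swap 'expressions' for 'attributes' to inspect the other table.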

Highlight the query in Query Analyzer and press Ctrl+L to see the plan. You can see how the tables are joined together. Almost always, using indexes is better than not (there are fringe cases where, if a table is small enough, indexes can slow you down, but for now, just be aware that 99% of the time indexes are good).
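
If the graphical plan is not available, the estimated plan can also be requested as text. A sketch using the question's query:

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch. GO is the batch
-- separator understood by client tools such as Query Analyzer, SSMS and sqlcmd.
SET SHOWPLAN_TEXT ON;
GO

-- While SHOWPLAN_TEXT is ON, the query is not executed; only its plan is returned.
SELECT
    E.expressionID,
    A.attributeName,
    A.attributeValue
FROM
    attributes A
JOIN
    expressions E
ON
    E.attributeId = A.attributeId;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

In the output, look at whether each table is read with an Index Seek or an Index Scan, and which join operator is used.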

Pay attention to the order in which tables are joined. SQL Server maintains statistics on table sizes and will determine which one is better to join first. Stale statistics can lead to a poor choice, so look into the SQL Server procedures for updating statistics; it's been too long, so I don't have that info handy.
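
For reference, statistics can be refreshed with commands like the following. This is a sketch, and whether a manual refresh is needed depends on how the database is configured:

```sql
-- Refresh statistics on the two tables involved in the join.
UPDATE STATISTICS expressions;
UPDATE STATISTICS attributes;

-- Or refresh statistics for every table in the current database.
EXEC sp_updatestats;
```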

That should get you started. Really, an entire chapter can be written on how a database can optimize even such a simple query.

#4


2  

I bet your problem is the huge number of rows that are being inserted into that temp table. Is there any way you can add a WHERE clause before you SELECT every row in the database?
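
As an illustration of restricting the rows, here is the question's query with a filter added. The variable @expressionID, its int type and the value 42 are assumptions made for the example, not taken from the question:

```sql
-- Hypothetical filter: the variable, its type and the value 42 are placeholders.
DECLARE @expressionID int;
SET @expressionID = 42;

SELECT
    E.expressionID,
    A.attributeName,
    A.attributeValue
FROM
    attributes A
JOIN
    expressions E
ON
    E.attributeId = A.attributeId
WHERE
    E.expressionID = @expressionID;
```

Since the clustered primary key on expressions starts with expressionId, a filter like this should let SQL Server seek directly to the matching rows instead of reading the whole table.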

#5


1  

Another thing to do is add some indexes like this:

attributes.{attributeId, attributeName, attributeValue}
expressions.{attributeId, expressionID}

This is hacky! But useful if it's a last resort.

What this does is allow a query plan that can be "entirely answered" by indexes. Usually, an index actually causes a double I/O in your above query: one to probe the index, another to fetch the actual row referred to by the index entry (to pull attributeName, etc.).

This is especially helpful if "attributes" or "expressions" is a wide table. That is, a table whose rows are expensive to fetch.
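
One way to see whether the extra row fetches go away is to compare logical reads before and after adding the indexes. A sketch using the question's query:

```sql
-- Reports, per table, how many pages the query read (shown in the Messages
-- output of the client tool).
SET STATISTICS IO ON;

SELECT
    E.expressionID,
    A.attributeName,
    A.attributeValue
FROM
    attributes A
JOIN
    expressions E
ON
    E.attributeId = A.attributeId;

SET STATISTICS IO OFF;
```

Run it once before creating the indexes and once after. A drop in the logical reads reported for each table suggests the query is now being answered from the narrower indexes.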

Finally, the best way to speed your query is to add a WHERE clause!

#6


1  

If I'm understanding your schema correctly, you're stating that your tables kinda look like this:

Expressions: PK - ExpressionID, AttributeID
Attributes:  PK - AttributeID

Assuming that each PK is a clustered index, that still means that an Index Scan is required on the Expressions table, because AttributeID is only the second column of its key and the index cannot be sought on it. You might want to consider creating an index on the Expressions table such as (AttributeID, ExpressionID). This would help to stop the index scanning that currently occurs.
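
In T-SQL the suggested index might be written as follows. This is a sketch with a made-up index name, and the UNIQUE keyword relies on the assumption above that (ExpressionID, AttributeID) is the primary key:

```sql
-- Hypothetical index name. AttributeID leads the key so the join can seek on it.
-- UNIQUE is valid only because the same column pair is assumed to be the PK.
CREATE UNIQUE NONCLUSTERED INDEX IX_Expressions_AttributeID_ExpressionID
    ON Expressions (AttributeID, ExpressionID);
```

With AttributeID as the leading column, the rows are also stored in AttributeID order, which may let the optimizer pick a merge join against the Attributes clustered index.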
