How to properly index a many-to-many association table?

In a typical many-many arrangement like this...

Movies       Actors       Movies_Actors
------       ------       -------------
movie_ID     actor_ID     FK_movie_ID
title        name         FK_actor_ID

... how should the association table ('Movies_Actors') be indexed for optimal read speed?

I usually see this done only with the composite primary key in the association table, like so:

CREATE TABLE Movies_Actors (
  FK_movie_ID INTEGER,
  FK_actor_ID INTEGER,
  PRIMARY KEY (FK_movie_ID, FK_actor_ID)
)

However, this seems like the index will only be useful when searching for both movie_ID and actor_ID (although I'm not certain on whether a composite index also works for the individual columns).

Since both "what actors are in Movie X" and "what movies has actor Y been in" will be the common queries for this table, it seems like there should be an individual index on each column to quickly locate actors and movies on their own. Does a composite index effectively do this? If not, having a composite index seems pointless on this table. And if a composite index is pointless, what to do about a primary key? The candidate key is clearly the composite of the two columns, but if the resulting composite index is pointless (it mustn't be?) it seems like a waste.

Also, this link adds some confusion and indicates that it might even be useful to actually specify two composite indices... one of them as (FK_movie_ID, FK_actor_ID), and the other in reverse as (FK_actor_ID, FK_movie_ID), with the choice of which is the primary key (and thus usually clustered) and which is 'just' a unique composite index being based on which direction is queried more.

What is the real story? Does a composite index automatically and effectively index each column for searching on one or the other? Should the optimal (in read speed, not size) association table have a composite index in each direction and one on each column? What are the behind-the-scenes mechanics?

EDIT: I found this related question that for some reason I didn't locate before posting... How to properly index a linking table for many-to-many connection in MySQL?

2 Answers

#1

(although I'm not certain on whether a composite index also works for the individual columns).

Yes, it can. But only the prefix: http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

Also, this link adds some confusion and indicates that it might even be useful to actually specify two composite indices... one of them as (FK_movie_ID, FK_actor_ID), and the other in reverse as (FK_actor_ID, FK_movie_ID),

That's actually the thing to do.

Take one as the clustered index and the other as a nonclustered index. The nonclustered index will include the clustering key anyway, so there is no need to include that column again (thx to JNK).

CREATE CLUSTERED INDEX a ON Movies_Actors (fk_movie_id, fk_actor_id);
CREATE NONCLUSTERED INDEX b ON Movies_Actors (fk_actor_id);
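
To see the two-index layout at work without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module. It is a stand-in, not SQL Server: a SQLite WITHOUT ROWID table stores rows in primary-key order, which is only loosely analogous to a clustered index. The index name idx_actor, the helper plan(), and the sample rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# WITHOUT ROWID keeps rows ordered by the primary key (assumed here as a
# rough SQLite analogue of a clustered index on (movie, actor)).
conn.execute("""
    CREATE TABLE Movies_Actors (
        FK_movie_ID INTEGER NOT NULL,
        FK_actor_ID INTEGER NOT NULL,
        PRIMARY KEY (FK_movie_ID, FK_actor_ID)
    ) WITHOUT ROWID
""")
# Secondary index for the reverse direction (hypothetical name).
conn.execute("CREATE INDEX idx_actor ON Movies_Actors (FK_actor_ID)")

# Made-up sample data: (movie, actor) pairs.
conn.executemany(
    "INSERT INTO Movies_Actors VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 12)],
)

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

actors_in_movie = "SELECT FK_actor_ID FROM Movies_Actors WHERE FK_movie_ID = 1"
movies_of_actor = "SELECT FK_movie_ID FROM Movies_Actors WHERE FK_actor_ID = 10"

# Each direction should be reported as an index SEARCH rather than a SCAN.
print(plan(actors_in_movie))
print(plan(movies_of_actor))

# The queries themselves.
print(sorted(r[0] for r in conn.execute(actors_in_movie)))  # [10, 11]
print(sorted(r[0] for r in conn.execute(movies_of_actor)))  # [1, 2]
```

The exact wording of the plan text differs between SQLite versions, so treat it as a hint about which index is chosen, not as a benchmark.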

What is the real story?

http://Use-The-Index-Luke.com/ :)

Does a composite index automatically effectively index each column for searching on one or the other?

No. Only the prefix of the index. If you have an index (a,b,c), the query a=? and b=? can use the index. However c=? can't, nor can b=? and c=?.
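
The prefix rule can be checked with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (not SQL Server); the table t, the index idx_abc, and the helper plan() are hypothetical names for the demo. The query selects a column outside the index so that the planner cannot simply read everything from the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d TEXT)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")  # composite index (a, b, c)

def plan(where):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a WHERE clause."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT d FROM t WHERE {where}"
    ).fetchall()
    return [row[3] for row in rows]

# Leading-column predicates can seek in the index; predicates that skip
# column a fall back to scanning the table.
for where in ("a = 1 AND b = 2", "a = 1", "c = 3", "b = 2 AND c = 3"):
    print(f"{where:18} -> {plan(where)}")
```

On a fresh database with no statistics gathered, the first two predicates should be reported as a SEARCH using idx_abc and the last two as a SCAN. Other engines, or SQLite after ANALYZE, may apply extra optimizations, so this shows the general rule rather than a guarantee for every system.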

Should the optimal (in read speed, not size) association table have a composite index in each direction and one on each column?

If you need to join in both directions, yes ("composite index in each direction") and no ("one on each column").

What are the behind-the-scene mechanics?

Well, same link again.

Speaking of SQL Server, you might eventually also consider an indexed view. That's a kind of pre-joining. The two indexes above might also be fast enough.

#2

In SQL Server, a composite index can serve a single-column search only on its first column. That means you should have an additional single-column index on FK_actor_ID if there will be searches on that column without FK_movie_ID in the same query.
