复合键VS主键+不唯一索引

时间:2022-09-16 08:38:20

Here is what I have :

这是我有的:

table content : cat_id product_id data1 data2 etc.
the categories are not unique obviously. the product ids are unique.

表内容:cat_id product_id data1 data2等,这些类别显然不是唯一的。产品ID是独一无二的。

2 queries : 1 -- SELECT * WHERE cat_id = :cat - must be as quick as possible 2 -- SELECT * WHERE product_id = :prodId In second select, I can add : AND cat_id = :cat

2个查询:1 - SELECT * WHERE cat_id =:cat - 必须尽可能快2 - SELECT * WHERE product_id =:prodId在第二个选择中,我可以添加:AND cat_id =:cat

What is the more efficient ?

效率更高的是什么?

  • 1 - index (not unique) on cat_id (good for select 1)
  • 1 - cat_id上的索引(非唯一)(适用于选择1)

  • 2 - primary key on product_id (unique -> excellent for select 2)
  • 2 - product_id上的主键(唯一 - >选择2的优秀)

  • 3 - index (not unique) on cat_id + PK on product_id (good for 1 & 2 separately)
  • 3 - product_id上cat_id + PK上的索引(非唯一)(分别适用于1和2)

  • 4 - unique constraint with composite [cat_id+product_id] (good for 1 & 2 together)
  • 4 - 使用复合[cat_id + product_id]的唯一约束(适用于1和2)

  • 5 - same as 4, but defining the composite as PK
  • 5 - 与4相同,但将复合定义为PK

  • 6 - composite (4 or 5) + single index/PK
  • 6 - 复合(4或5)+单指数/ PK

For information, I'll have around 20 products in each category and a lot of categories (say 3000) - And (as it is unique in table) ONE product belongs to only ONE category - In fact, that is not really cats and products, that is for the simplicity of explaination;)

有关信息,我将在每个类别中有大约20种产品和许多类别(比如3000) - 并且(因为它在表中是独一无二的)一个产品只属于一个类别 - 事实上,这不是真正的猫和产品,这是为了简单的解释;)

thanks!

2 个解决方案

#1


5  

A database without Primary key is only half dressed and according to you product_id is an ideal candidate for a primary key, so let choose that. The primary key will be used in

没有主键的数据库只有一半打扮,根据你的说法,product_id是主键的理想候选者,所以我们选择它。主键将用于

SELECT * WHERE product_id = :prodId

It does not matter if and cat_id = :cat_id becomes a part of the query or not unless you have thousands of cat_ids associated with each product_id.

无论cat和cat_id =:cat_id是否成为查询的一部分都没关系,除非您有数千个与每个product_id关联的cat_id。

Then choose an index on cat_id. This will be used on

然后在cat_id上选择一个索引。这将用于

SELECT * WHERE cat_id = :cat

This will be very quick if the cardinality of the data is good. That means there is a wide distribution of cat_ids in the table. The index of cat_id will not be used in the first query. Thus you have two different indexes and both queries can be expected to be really quick.

如果数据的基数良好,这将非常快。这意味着表格中有大量的cat_ids分布。 cat_id的索引不会在第一个查询中使用。因此,您有两个不同的索引,并且可以预期两个查询都非常快。

Since [cat_id+product_id] != [product_id+cat_id] when it comes to indexing, if you have only one composite index one or the other will be slow.

由于[cat_id + product_id]!= [product_id + cat_id]在索引时,如果你只有一个复合索引,那么另一个将很慢。

For example, suppose we had a composite index on (cat_id, product_id) now the following query cannot make use of this index.

例如,假设我们在(cat_id,product_id)上有一个复合索引,现在以下查询无法使用此索引。

 SELECT * FROM tablename WHERE product_id = :prodId

But both these queries can use the (cat_id, product_id) index

但这两个查询都可以使用(cat_id,product_id)索引

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :catId

So in summary. Choose 1 and 2. But if the number of cat_ids are small or there are lots of cat_ids associated with each product_id choose 4 but make sure that the primary key is also in place.

所以总结一下。选择1和2.但是如果cat_id的数量很小或者每个product_id都有很多cat_id,请选择4,但要确保主键也就位。

#2


1  

If these are your only two queries:

如果这是您唯一的两个查询:

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :cat_id

and you have some other way to assure that product_id is UNIQUE, then you need only:

并且您还有其他方法可以确保product_id是唯一的,那么您只需要:

PRIMARY KEY(cat_id, product_id)

It is optimal for both SELECTs.

它对于两个SELECT都是最佳的。

It is better than INDEX(cat_id) because (1) secondary keys have to finish their work with a PK lookup, and (2) all the cat rows are adjacent, thereby more efficient.

它比INDEX(cat_id)更好,因为(1)辅助键必须用PK查找完成它们的工作,(2)所有cat行都是相邻的,因此更有效。

If product_id is actually an AUTO_INCREMENT, then add

如果product_id实际上是AUTO_INCREMENT,则添加

INDEX(product_id)

No, you don't need to say UNIQUE (unless you are prone to deliberately trying to insert duplicate product_ids). The only thing that AI requires is that the id be first in some index so that it can do the equivalent of SELECT max(id) whenever mysqld is restarted.

不,您不需要说UNIQUE(除非您倾向于故意尝试插入重复的product_ids)。 AI唯一需要的是id在某个索引中是第一个,这样每当mysqld重新启动时它就可以完成SELECT max(id)的等效操作。

My recommendations apply regardless of the size of the table.

无论表格大小如何,我的建议都适用。

The order of clauses in WHERE does not matter.

WHERE中的子句顺序无关紧要。

JOINs do not require anything in particular. It is slightly more efficient to JOIN on a PRIMARY KEY than on a secondary key, which is a lot more efficient (but still possible) than on a non-indexed column(s).

JOIN不需要任何特别的东西。在PRIMARY KEY上加入比在辅助键上更有效,这比非索引列更有效(但仍然可能)。

#1


5  

A database without Primary key is only half dressed and according to you product_id is an ideal candidate for a primary key, so let choose that. The primary key will be used in

没有主键的数据库只有一半打扮,根据你的说法,product_id是主键的理想候选者,所以我们选择它。主键将用于

SELECT * WHERE product_id = :prodId

It does not matter if and cat_id = :cat_id becomes a part of the query or not unless you have thousands of cat_ids associated with each product_id.

无论cat和cat_id =:cat_id是否成为查询的一部分都没关系,除非您有数千个与每个product_id关联的cat_id。

Then choose an index on cat_id. This will be used on

然后在cat_id上选择一个索引。这将用于

SELECT * WHERE cat_id = :cat

This will be very quick if the cardinality of the data is good. That means there is a wide distribution of cat_ids in the table. The index of cat_id will not be used in the first query. Thus you have two different indexes and both queries can be expected to be really quick.

如果数据的基数良好,这将非常快。这意味着表格中有大量的cat_ids分布。 cat_id的索引不会在第一个查询中使用。因此,您有两个不同的索引,并且可以预期两个查询都非常快。

Since [cat_id+product_id] != [product_id+cat_id] when it comes to indexing, if you have only one composite index one or the other will be slow.

由于[cat_id + product_id]!= [product_id + cat_id]在索引时,如果你只有一个复合索引,那么另一个将很慢。

For example, suppose we had a composite index on (cat_id, product_id) now the following query cannot make use of this index.

例如,假设我们在(cat_id,product_id)上有一个复合索引,现在以下查询无法使用此索引。

 SELECT * FROM tablename WHERE product_id = :prodId

But both these queries can use the (cat_id, product_id) index

但这两个查询都可以使用(cat_id,product_id)索引

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :catId

So in summary. Choose 1 and 2. But if the number of cat_ids are small or there are lots of cat_ids associated with each product_id choose 4 but make sure that the primary key is also in place.

所以总结一下。选择1和2.但是如果cat_id的数量很小或者每个product_id都有很多cat_id,请选择4,但要确保主键也就位。

#2


1  

If these are your only two queries:

如果这是您唯一的两个查询:

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :cat_id

and you have some other way to assure that product_id is UNIQUE, then you need only:

并且您还有其他方法可以确保product_id是唯一的,那么您只需要:

PRIMARY KEY(cat_id, product_id)

It is optimal for both SELECTs.

它对于两个SELECT都是最佳的。

It is better than INDEX(cat_id) because (1) secondary keys have to finish their work with a PK lookup, and (2) all the cat rows are adjacent, thereby more efficient.

它比INDEX(cat_id)更好,因为(1)辅助键必须用PK查找完成它们的工作,(2)所有cat行都是相邻的,因此更有效。

If product_id is actually an AUTO_INCREMENT, then add

如果product_id实际上是AUTO_INCREMENT,则添加

INDEX(product_id)

No, you don't need to say UNIQUE (unless you are prone to deliberately trying to insert duplicate product_ids). The only thing that AI requires is that the id be first in some index so that it can do the equivalent of SELECT max(id) whenever mysqld is restarted.

不,您不需要说UNIQUE(除非您倾向于故意尝试插入重复的product_ids)。 AI唯一需要的是id在某个索引中是第一个,这样每当mysqld重新启动时它就可以完成SELECT max(id)的等效操作。

My recommendations apply regardless of the size of the table.

无论表格大小如何,我的建议都适用。

The order of clauses in WHERE does not matter.

WHERE中的子句顺序无关紧要。

JOINs do not require anything in particular. It is slightly more efficient to JOIN on a PRIMARY KEY than on a secondary key, which is a lot more efficient (but still possible) than on a non-indexed column(s).

JOIN不需要任何特别的东西。在PRIMARY KEY上加入比在辅助键上更有效,这比非索引列更有效(但仍然可能)。