MySQL "SELECT DISTINCT" efficiency on a very large table

Time: 2021-03-13 23:47:27

I have a very large table (millions of records) whose primary key consists of approximately 8 fields. For simplicity's sake, let's say the table looks like this:

    key_1 | key_2 | key_3 | ... | key_8 | value

Given a value for key_1, I need to fetch all possible values for key_2, key_3, ..., key_8, along these lines:

    SELECT DISTINCT key_2 FROM table1 WHERE key_1 = 123;
    SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;
    -- ... (the same query for key_4 through key_7)
    SELECT DISTINCT key_8 FROM table1 WHERE key_1 = 123;

My problem is that these queries are significantly slower than my performance needs, and the data in this table is fairly constant and rarely updated (once every few days). Also, table1 could be a slow subquery. Short of creating an additional table in the database and manually updating it every time the data is updated, is there another solution that can give me fast results? It needs to work across multiple MySQL sessions.

2 solutions

#1


13  

Can't give a definitive answer with the information we have, but let's start with these:

Do you have an index on key_1?

Without it, each query will already be slow just locating the rows where key_1 = 123.
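
If key_1 is the leftmost column of the composite primary key, as the question's layout suggests, the PRIMARY index already serves this lookup and nothing else is needed. A quick way to check is to ask information_schema which indexes start with key_1. This sketch assumes the table is named table1 and lives in the current database; the index name idx_key_1 is hypothetical.

```sql
-- List every index on table1 whose first (leftmost) column is key_1.
SELECT DISTINCT INDEX_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'table1'
  AND COLUMN_NAME  = 'key_1'
  AND SEQ_IN_INDEX = 1;

-- Only if the query above returns no rows: add a plain index on key_1.
-- (idx_key_1 is a hypothetical name.)
CREATE INDEX idx_key_1 ON table1 (key_1);
```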

Do you have an index on (key_1, key_2)?

Because SELECT DISTINCT key_2 FROM table1 WHERE key_1 = 123 is really fast if MySQL can get all the necessary data from the index alone (a covering index). There is no need to access the table rows at all.
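
A sketch of such a covering index, plus an EXPLAIN to confirm it is used. The index name is hypothetical, and if the primary key already starts with (key_1, key_2) this index is redundant, because the primary key itself can serve the query.

```sql
-- Both the filter column (key_1) and the selected column (key_2) are in the
-- index, so the query can be answered without reading table rows.
CREATE INDEX idx_key_1_key_2 ON table1 (key_1, key_2);

-- In the output, the key column should name an index that starts with key_1,
-- and the Extra column should mention "Using index" (index-only access).
EXPLAIN SELECT DISTINCT key_2 FROM table1 WHERE key_1 = 123;
```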

Are the rows/indexes fixed-size?

Traversing a table with fixed-size rows can be faster, because the position of the x-th record can always be found just by calculating its offset. Tables with variable-size rows are slower.
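
To see which storage engine and row format table1 currently uses, information_schema.TABLES reports both (SHOW TABLE STATUS shows the same thing in its Row_format column). As far as I know, the fixed-length ("static") format mainly matters for MyISAM, which uses it when the table has no VARCHAR, VARBINARY, BLOB or TEXT columns; check that against the manual for your MySQL version rather than taking it as a guarantee.

```sql
-- Engine and row format for table1 in the current database.
SELECT TABLE_NAME, ENGINE, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'table1';
```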

Have you tried adding an autoincrement surrogate primary key?

Indexes work much better when all they have to store is the indexed column plus a small primary key. In InnoDB, every secondary index entry also contains the full primary key, so a wide composite primary key makes every secondary index larger and slower.
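
A sketch of what a surrogate key could look like. The real schema was not posted, so the INT column types and the names table1_new, uq_all_keys and idx_key_1_key_3 are assumptions. The UNIQUE key keeps the old composite key's guarantee (and can itself cover the key_2 query), while each other secondary index carries only its own columns plus the 8-byte id.

```sql
-- Hypothetical replacement for table1 with an auto-increment surrogate key.
-- Column types are assumed; copy the real ones from SHOW CREATE TABLE.
CREATE TABLE table1_new (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  key_1   INT NOT NULL,
  key_2   INT NOT NULL,
  key_3   INT NOT NULL,
  key_4   INT NOT NULL,
  key_5   INT NOT NULL,
  key_6   INT NOT NULL,
  key_7   INT NOT NULL,
  key_8   INT NOT NULL,
  `value` INT NULL,
  PRIMARY KEY (id),
  -- Keeps the uniqueness guarantee of the original 8-column key.
  UNIQUE KEY uq_all_keys (key_1, key_2, key_3, key_4, key_5, key_6, key_7, key_8),
  -- Example secondary index; each entry holds key_1, key_3 and the id.
  KEY idx_key_1_key_3 (key_1, key_3)
) ENGINE=InnoDB;

-- Copy the existing rows across.
INSERT INTO table1_new (key_1, key_2, key_3, key_4, key_5, key_6, key_7, key_8, `value`)
SELECT key_1, key_2, key_3, key_4, key_5, key_6, key_7, key_8, `value`
FROM table1;
```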

Did you consider a read-only table?

You can pack a MyISAM table (with the myisampack utility) for fast access, but it becomes read-only. It's a hack, but it has its uses.

One step further: have you considered a data warehouse?

If the tables don't change often, it might be best to duplicate the information for fast access.
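
One way to duplicate the information is a small summary table holding each distinct (key_1, column, value) combination, rebuilt whenever table1 is bulk-updated. This is a sketch under assumptions: the names key_values_summary, key_values_summary_new and key_values_summary_old are hypothetical, the key columns are assumed to be INT, and the rebuild is run by hand or by whatever job loads table1. Building into a new table and swapping it in with a single RENAME TABLE (which renames atomically) means other sessions never see a half-built summary.

```sql
-- One-time setup: the summary table.
CREATE TABLE key_values_summary (
  key_1     INT        NOT NULL,
  key_name  VARCHAR(8) NOT NULL,  -- 'key_2' .. 'key_8'
  key_value INT        NOT NULL,
  PRIMARY KEY (key_1, key_name, key_value)
) ENGINE=InnoDB;

-- Rebuild after each update of table1.
DROP TABLE IF EXISTS key_values_summary_new;
CREATE TABLE key_values_summary_new LIKE key_values_summary;

INSERT INTO key_values_summary_new (key_1, key_name, key_value)
          SELECT DISTINCT key_1, 'key_2', key_2 FROM table1
UNION ALL SELECT DISTINCT key_1, 'key_3', key_3 FROM table1
UNION ALL SELECT DISTINCT key_1, 'key_4', key_4 FROM table1
UNION ALL SELECT DISTINCT key_1, 'key_5', key_5 FROM table1
UNION ALL SELECT DISTINCT key_1, 'key_6', key_6 FROM table1
UNION ALL SELECT DISTINCT key_1, 'key_7', key_7 FROM table1
UNION ALL SELECT DISTINCT key_1, 'key_8', key_8 FROM table1;

-- Swap both names in one statement, then discard the old copy.
RENAME TABLE key_values_summary     TO key_values_summary_old,
             key_values_summary_new TO key_values_summary;
DROP TABLE key_values_summary_old;

-- Lookup: every distinct key_2 .. key_8 value for one key_1.
SELECT key_name, key_value
FROM key_values_summary
WHERE key_1 = 123
ORDER BY key_name, key_value;
```

The lookup then reads one contiguous range of the summary table's primary key instead of running seven DISTINCT queries against millions of rows.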

Can you post the output of SHOW CREATE TABLE? Seeing the columns and indexes would help. Can you also post the output of EXPLAIN for the SELECT? Seeing which indexes are used would help.
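
For reference, the two diagnostics being asked for, using the question's table and one of its queries. In the mysql command-line client, ending a statement with \G instead of ; prints the result vertically, which is easier to read for SHOW CREATE TABLE.

```sql
-- Full definition of table1, including column types and all indexes.
SHOW CREATE TABLE table1;

-- Execution plan: the key column shows which index is chosen, and Extra
-- shows notes such as "Using index" or "Using temporary".
EXPLAIN SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;
```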

#2


2  

    SELECT DISTINCT key_2 FROM table1 WHERE key_1 = 123;

This can use your primary key index (key_1, key_2, ...). MySQL only needs to read the part of the index where key_1 = 123, so it performs an index scan, which is faster than a full table scan or building a temporary table.

    SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;

This cannot use the primary key efficiently, because (key_1, key_3) does not form a prefix of the primary key: the rows with key_1 = 123 can still be located, but every one of them has to be read to collect the distinct key_3 values. You need to create a compound index on key_1 and key_3, in that order. Then MySQL can use that index to perform an index scan as well.
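
A sketch of that index for key_3, followed by the same kind of EXPLAIN check; the index name is hypothetical. After the index exists, Extra would be expected to mention "Using index", though the exact output depends on the MySQL version and the optimizer's choices.

```sql
-- key_1 first (equality filter), then key_3 (the DISTINCT column).
CREATE INDEX idx_key_1_key_3 ON table1 (key_1, key_3);

-- Confirm that the new index is the one chosen.
EXPLAIN SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;
```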

    SELECT DISTINCT key_8 FROM table1 WHERE key_1 = 123;

This needs an index on key_1 and key_8, in that order, for the same reason as above.
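
Following the same pattern, the indexes for key_4 through key_8 (key_3 being handled as described above) can be added in one ALTER TABLE statement instead of five separate ones, which is generally cheaper on a large table. The index names are hypothetical, and each extra index costs disk space and slows down the (rare) updates.

```sql
-- One compound index per DISTINCT column, each led by key_1.
ALTER TABLE table1
  ADD INDEX idx_key_1_key_4 (key_1, key_4),
  ADD INDEX idx_key_1_key_5 (key_1, key_5),
  ADD INDEX idx_key_1_key_6 (key_1, key_6),
  ADD INDEX idx_key_1_key_7 (key_1, key_7),
  ADD INDEX idx_key_1_key_8 (key_1, key_8);
```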
