Using indexes on inner-joined tables in MySQL

Date: 2022-09-25 15:44:09

I have a table Foo with 200 million records and a table Bar with 1000 records; they are related many-to-one. There are indexes on the columns Foo.someTime and Bar.someField. In Bar, 900 records have someField = 1 and 100 have someField = 2.

(1) This query executes immediately:

mysql> select * from Foo f inner join Bar b on f.table_id = b.table_id where f.someTime     between '2008-08-14' and '2018-08-14' and b.someField = 1 limit 20;
...
20 rows in set (0.00 sec)

(2) This one takes forever (the only change is b.someField = 2):

mysql> select * from Foo f inner join Bar b on f.table_id = b.table_id where f.someTime     between '2008-08-14' and '2018-08-14' and b.someField = 2 limit 20;

(3) But if I drop the where clause on someTime, then it also executes immediately:

mysql> select * from Foo f inner join Bar b on f.table_id = b.table_id where b.someField = 2 limit 20;
...
20 rows in set (0.00 sec)

(4) I can also speed it up by forcing index usage:

mysql> select * from Foo f inner join Bar b force index(someField) on f.table_id = b.table_id where f.someTime     between '2008-08-14' and '2018-08-14' and b.someField = 2 limit 20;
...
20 rows in set (0.00 sec)

Here is the EXPLAIN output for query (2) (which takes forever):

+----+-------------+-------+--------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
| id | select_type | table | type   | possible_keys                 | key       | key_len | ref                      | rows     | Extra       |
+----+-------------+-------+--------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
|  1 | SIMPLE      | g     | range  | bar_id,bar_id_2,someTime      | someTime  | 4       | NULL                     | 95022220 | Using where |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY,someField,bar_id      | PRIMARY   | 4       | db.f.bar_id              |        1 | Using where |
+----+-------------+-------+--------+-------------------------------+-----------+---------+--------------------------+----------+-------------+

Here is the EXPLAIN output for query (4) (which uses force index):

+----+-------------+-------+------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
| id | select_type | table | type | possible_keys                 | key       | key_len | ref                      | rows     | Extra       |
+----+-------------+-------+------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
|  1 | SIMPLE      | t     | ref  | someField                     | someField | 1       |   const                  |       92 |             |
|  1 | SIMPLE      | g     | ref  | bar_id,bar_id_2,someTime      | bar_id    | 4       | db.f.foo_id              | 10558024 | Using where |
+----+-------------+-------+------+-------------------------------+-----------+---------+--------------------------+----------+-------------+

So the question is: how do I teach MySQL to use the right index? The query is generated by an ORM and is not limited to only these two fields. It would also be nice to avoid changing the query much (though I'm not sure that an inner join fits here).

UPDATE:

mysql> create index index_name on Foo (bar_id, someTime);

After that, query (2) executes in 0.00 sec.

1 solution

#1


If you create a compound index on Foo(table_id, someTime), it should help a lot. This is because the server will be able to narrow down the result set by table_id first, and then by someTime.
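
As a rough sketch of that suggestion, assuming the column names used in the question's queries (the index name `idx_foo_table_time` is made up for illustration), you could create the index and then re-check the plan for query (2) without any hint:

```sql
-- Hypothetical index name; the columns follow the queries in the question.
-- The join (equality) column goes first, the range column second.
CREATE INDEX idx_foo_table_time ON Foo (table_id, someTime);

-- Ask the optimizer how it would now run query (2), with no index hint.
EXPLAIN
SELECT *
FROM Foo f
INNER JOIN Bar b ON f.table_id = b.table_id
WHERE f.someTime BETWEEN '2008-08-14' AND '2018-08-14'
  AND b.someField = 2
LIMIT 20;
```

If the optimizer picks the new index up, the plan should read Bar first through someField and then look up Foo through the compound index, instead of range-scanning Foo on someTime alone. Whether it actually does so depends on your data and statistics, so check the EXPLAIN output rather than assuming it.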

Note that when using LIMIT, the server does not guarantee which rows will be fetched if many rows satisfy your WHERE constraint. Technically, every execution can give you a slightly different result. To avoid this ambiguity, always use ORDER BY when you use LIMIT. However, that also means you should be more careful in creating appropriate indexes.
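
For illustration only, here is query (2) with an explicit ordering. The column `f.id` is an assumption (a hypothetical unique column on Foo); it is not mentioned in the question and serves only as a tie-breaker:

```sql
-- f.id is a hypothetical unique column on Foo, used only as a tie-breaker
-- so that rows with equal someTime always come back in the same order.
SELECT *
FROM Foo f
INNER JOIN Bar b ON f.table_id = b.table_id
WHERE f.someTime BETWEEN '2008-08-14' AND '2018-08-14'
  AND b.someField = 2
ORDER BY f.someTime, f.id
LIMIT 20;
```

Adding ORDER BY can change which plan the optimizer prefers, and if no index delivers rows in that order the server has to sort them (shown as "Using filesort" in the Extra column of EXPLAIN). It is worth running EXPLAIN again after adding the ordering.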
