具有大表的MySQL: 2查询更快,1(未使用索引)

时间:2022-09-17 21:38:48

I have two tables:

我有两个表:

CREATE TABLE `linf` (
  `ID` bigint(20) NOT NULL AUTO_INCREMENT,
  `glorious` bit(1) DEFAULT NULL,
  `limad` varchar(127) COLLATE utf8_bin DEFAULT NULL,
  `linfDetails_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `FK242415D3B0D13C` (`linfDetails_id`),
  CONSTRAINT `FK242415D3B0D13C` FOREIGN KEY (`linfDetails_id`) REFERENCES `linfdetails` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=135111 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

(130K rows)

(130 k行)

and

CREATE TABLE `messageentry` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `mboxOffset` bigint(20) DEFAULT NULL,
  `mboxOffsetEnd` bigint(20) DEFAULT NULL,
  `from_id` bigint(20) DEFAULT NULL,
  `linf_ID` bigint(20) DEFAULT NULL,
  `mailSourceFile_id` bigint(20) DEFAULT NULL,
  `messageDetails_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FKBBB258CB60B94D38` (`mailSourceFile_id`),
  KEY `FKBBB258CB11F9E114` (`from_id`),
  KEY `FKBBB258CBF7C835B8` (`messageDetails_id`),
  KEY `FKBBB258CBB10E8518` (`linf_ID`),
  CONSTRAINT `FKBBB258CBB10E8518` FOREIGN KEY (`linf_ID`) REFERENCES `linf` (`ID`),
  CONSTRAINT `FKBBB258CB11F9E114` FOREIGN KEY (`from_id`) REFERENCES `emailandname` (`id`),
  CONSTRAINT `FKBBB258CB60B94D38` FOREIGN KEY (`mailSourceFile_id`) REFERENCES `mailsourcefile` (`id`),
  CONSTRAINT `FKBBB258CBF7C835B8` FOREIGN KEY (`messageDetails_id`) REFERENCES `messagedetails` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5888892 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

(5M rows)

(5 m行)

I need to find linf by linf.limad and then find all messages that correspond to this linf.

我需要找到林夫。然后找到所有与这个linf对应的消息。

If I select it in two queries:

如果我在两个查询中选择它:

select sql_no_cache l.id from linf l where l.limad='test@';
select sql_no_cache me.* from messageentry me where me.linf_id = 118668;

then it takes 0.06 seconds.

然后是0。06秒。

If I use

如果我使用

select sql_no_cache me.* from messageentry me where me.linf_id in(
select l.id from linf l where l.limad='test@') ;

it takes 10 secs to execute. And this one:

执行需要10秒。这一个:

select sql_no_cache me.* from messageentry me, linf l where me.linf_id=l.id
and l.limad='test@';

takes 4 seconds. (Times are stable)

需要4秒。是稳定的(倍)

This request retuns 0 results because there is no messages for this linf. In fact, I've stripped this from big request

这个请求retuns0结果,因为没有消息为这个linf。事实上,我已经从一个大的请求中删除了这个。

select messageent1_.*
from
    MailSourceFile mailsource0_,        
    MessageEntry messageent1_ ,
    MessageDetails messagedet2_,    
    Linf linf3_
where
    messageent1_.messageDetails_id = messagedet2_.id
        and messageent1_.linf_ID = linf3_.ID
        and linf3_.limad = 'test@'
and mailsource0_.id = messageent1_.mailSourceFile_id

which works ~1 minute. Isn't that too much? Explain says that messageEntries index is not used:

工作~ 1分钟。那不是太多吗?Explain表示不使用messageEntries索引:

mysql> explain select sql_no_cache me.* from messageentry me, linf l where me.linf_id=l.id and l.limad='test@';
+----+-------------+-------+--------+--------------------+---------+---------+------------------+---------+-------------+
| id | select_type | table | type   | possible_keys      | key     | key_len | ref              | rows    | Extra       |
+----+-------------+-------+--------+--------------------+---------+---------+------------------+---------+-------------+
|  1 | SIMPLE      | me    | ALL    | FKBBB258CBB10E8518 | NULL    | NULL    | NULL             | 5836332 |             |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY            | PRIMARY | 8       | skryb.me.linf_ID |       1 | Using where |
+----+-------------+-------+--------+--------------------+---------+---------+------------------+---------+-------------+

Any ideas why? I've gained mysql ~1.6 G of memory and this should fit all tables.

任何想法为什么?我获得了mysql ~1.6 G的内存,这个应该适合所有表。

Thanx.

谢谢。

4 个解决方案

#1


3  

Let's look at the query:

让我们看一下查询:

select sql_no_cache me.*
from messageentry me, linf l
where me.linf_id=l.id
and l.limad='test@';

What does it do? According to the execution plan from the EXPLAIN for each row in me table it checks if there is a corresponding record in linf. Since you do not have any index on limad field, MySQL 5M times fetches the value of the limad field from disk (not from memory) to check if it is equal to '@test'. You say that the query returns 0 rows, but for another limad value that would give more rows it would need to go on disk for all me.* fields.

它做什么?根据me表中每一行的EXPLAIN的执行计划,它检查linf中是否有相应的记录。由于limad字段上没有任何索引,所以MySQL 5M乘以从磁盘(不是从内存)获取limad字段的值,以检查它是否等于“@test”。你说这个查询返回0行,但是对于另一个limad值,它会给出更多的行,所以我需要把它放到磁盘上。*字段。

Ok, limad field is varchar(127) COLLATE utf8_bin, that is an index on it might be expesive (I would add it anyway). 130k rows is less than 5M, so it would be great to start with linf, and all we need from messageentry for the start is id, mailSourceFile_id, messageDetails_id. Why only those fields? Since we are going to make two more joins and we take no data from the joined tables, the tables seem to narrow the final result set, that is they are required for the skeleton of the query. Let's start with them only:

limad字段是varchar(127) COLLATE utf8_bin,这是一个索引,它可能是expesive(我还是要添加它)。130k行小于5M,所以最好从linf开始,我们首先需要从messageentry中获得id、mailSourceFile_id、messageDetails_id。为什么只有这些字段?由于我们将再进行两个连接,并且不从已连接的表中获取数据,因此表似乎缩小了最终结果集,即查询的骨架需要这些表。让我们从它们开始:

SELECT me.id, me.mailSourceFile_id, me.messageDetails_id
FROM (
  SELECT ID as linf_ID
  FROM linf
  WHERE limad='test@'
) as linf
JOIN messageentry me USING (linf_ID);

The query selects required linf_ID as and then for each found id looks apprepriate rows in messageentry. Since you have an index on linf_iD, the query should result faster than 4 secs.

查询选择所需的linf_ID,然后每个找到的id在messageentry中看起来都是值得关注的行。由于您在linf_iD上有一个索引,查询的结果应该比4秒快。

But those me.mailSourceFile_id, me.messageDetails_id can't be taken from memory, since MySQL would need to do a complex index merge, hence, MySQL would anyway go on disk for each row with matching linf_ID. If you would have an index that contains all those three fields at once, the query would be even faster in case there is a significant amount of rows which are filtered by subsequent joins.

但这些我。mailSourceFile_id,我。不能从内存中获取messageDetails_id,因为MySQL需要进行复杂的索引合并,因此,MySQL将在磁盘上为每一行使用匹配的linf_ID。如果您将有一个同时包含所有这三个字段的索引,那么如果有大量的行被后续连接过滤,查询将会更快。

If you update your KEY FKBBB258CBB10E8518 (linf_ID) to FKBBB258CBB10E8518 (linf_ID, mailSourceFile_id, messageDetails_id), you would have such an index.

如果您将您的密钥FKBBB258CBB10E8518 (linf_ID)更新为FKBBB258CBB10E8518 (linf_ID、mailSourceFile_id、messageDetails_id),您将获得这样的索引。

The resulting query would look something like:

得到的查询将如下所示:

SELECT me.*
FROM (
  SELECT ID as linf_ID
  FROM linf
  WHERE limad='test@'
) as linf
JOIN messageentry me USING (linf_ID)
JOIN MailSourceFile ms ON ms.id = me.mailSourceFile_id
JOIN MessageDetails md ON md.id = me.messageDetails_id;

Actually, as soon as you update the index FKBBB258CBB10E8518 (linf_ID) as suggested above, your original query most probably will have the same execution plan and timing as the last query.

实际上,只要您按照上面的建议更新索引FKBBB258CBB10E8518 (linf_ID),您的原始查询很可能与上一个查询具有相同的执行计划和时间安排。

#2


0  

MySQL does a very poor job with subqueries in in clauses, explaining the poor performance you see there. I suspect the join performance has to do with the ordering of the joins. It is probably reading the messages table in its entirety.

MySQL在子句中执行的子查询很差,解释了您在那里看到的糟糕性能。我怀疑连接性能与连接的顺序有关。它可能正在完整地读取消息表。

Try changing the in version to an exists:

尝试将in版本改为a exist:

select sql_no_cache me.*
from messageentry me
where exists (select 1 from linf l where l.limad='test@' and l.id = me.inf_id limit 1) ;

By the way, you should get used to doing joins in on clauses rather than in the where clause.

顺便说一下,你应该习惯在从句中使用join,而不是在where子句中。

#3


0  

What happens if you explicitly define the join criteria, like so?

如果显式地定义联接条件,会发生什么情况?

select sql_no_cache me.* 
from messageentry me JOIN linf l ON  me.linf_id=l.id
WHERE l.limad='test@';

You may get something screwy with your version if the optimizer chooses to do a cross join or something else weird.

如果优化器选择做交叉连接或其他怪异的事情,您可能会得到一些古怪的东西。

Barring that, you might consider doing a force index:

除此之外,你可以考虑做一个力指数:

select sql_no_cache me.* 
from messageentry me FORCE INDEX (FKBBB258CBB10E8518)
JOIN linf l ON  me.linf_id=l.id         
WHERE l.limad='test@';

This will at least show you if the index is actually going to help you or not.

这至少能告诉你指数是否真的对你有帮助。

#4


0  

  • Try using INT instead BIGINT if possible, also choose INT for primary key if possible. The secondary index like "linf_ID" store it's related primary key in disk. using BIGINT means more page faults and disk reads. http://planet.mysql.com/entry/?id=13825

    如果可能的话,尝试使用INT代替BIGINT,如果可能的话,也选择INT作为主键。二级索引像“linf_ID”存储它的相关主键在磁盘。使用BIGINT意味着更多的页面错误和磁盘读取。http://planet.mysql.com/entry/?id=13825

  • To reduce the index size of varchar, try index part of limad.
    In book "High Performance Mysql 3Edition" give us a way to choose the length of varchar index. choose the length which let the following two sql's result similar

    为了减小varchar的索引大小,可以尝试limad的索引部分。在“高性能Mysql 3Edition”一书中,我们可以选择varchar索引的长度。选择让以下两个sql结果类似的长度。

    SELECT COUNT(DISTINCT city)/COUNT(*) FROM sakila.city_demo;

    选择COUNT(不同的城市)/COUNT(*) FROM sakila.city_demo;

    SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT() AS sel3, COUNT(DISTINCT LEFT(city, 4))/COUNT() AS sel4, COUNT(DISTINCT LEFT(city, 5))/COUNT() AS sel5, COUNT(DISTINCT LEFT(city, 6))/COUNT() AS sel6, COUNT(DISTINCT LEFT(city, 7))/COUNT(*) AS sel7 FROM sakila.city_demo;

    选择COUNT(清晰的左(城市,3))/COUNT()作为sel3, COUNT(清晰的左(城市,4))/COUNT()作为sel4, COUNT(清晰的左(城市,5))/COUNT(城市,6)/COUNT(城市,4)作为sel6, COUNT(城市,7)/COUNT(城市,7)

  • let MySQL analyze and optimize your data in disk http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

    让MySQL在磁盘http://dev.mysql.com/doc/refman/5.1/en/optimization -table.html http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html中分析和优化数据

  • For your 1minute run "big request" SQL in questions, to optimize this SQL, you need using multiple column index. http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

    对于在问题中运行1分钟的“大请求”SQL,要优化这个SQL,需要使用多个列索引。http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

    CREATE UNIQUE INDEX idx_name ON MessageEntry(linf_ID, messageDetails_id, mailSourceFile_id)

    在MessageEntry上创建唯一的索引idx_name (linf_ID、messageDetails_id、mailSourceFile_id)

#1


3  

Let's look at the query:

让我们看一下查询:

select sql_no_cache me.*
from messageentry me, linf l
where me.linf_id=l.id
and l.limad='test@';

What does it do? According to the execution plan from the EXPLAIN for each row in me table it checks if there is a corresponding record in linf. Since you do not have any index on limad field, MySQL 5M times fetches the value of the limad field from disk (not from memory) to check if it is equal to '@test'. You say that the query returns 0 rows, but for another limad value that would give more rows it would need to go on disk for all me.* fields.

它做什么?根据me表中每一行的EXPLAIN的执行计划,它检查linf中是否有相应的记录。由于limad字段上没有任何索引,所以MySQL 5M乘以从磁盘(不是从内存)获取limad字段的值,以检查它是否等于“@test”。你说这个查询返回0行,但是对于另一个limad值,它会给出更多的行,所以我需要把它放到磁盘上。*字段。

Ok, limad field is varchar(127) COLLATE utf8_bin, that is an index on it might be expesive (I would add it anyway). 130k rows is less than 5M, so it would be great to start with linf, and all we need from messageentry for the start is id, mailSourceFile_id, messageDetails_id. Why only those fields? Since we are going to make two more joins and we take no data from the joined tables, the tables seem to narrow the final result set, that is they are required for the skeleton of the query. Let's start with them only:

limad字段是varchar(127) COLLATE utf8_bin,这是一个索引,它可能是expesive(我还是要添加它)。130k行小于5M,所以最好从linf开始,我们首先需要从messageentry中获得id、mailSourceFile_id、messageDetails_id。为什么只有这些字段?由于我们将再进行两个连接,并且不从已连接的表中获取数据,因此表似乎缩小了最终结果集,即查询的骨架需要这些表。让我们从它们开始:

SELECT me.id, me.mailSourceFile_id, me.messageDetails_id
FROM (
  SELECT ID as linf_ID
  FROM linf
  WHERE limad='test@'
) as linf
JOIN messageentry me USING (linf_ID);

The query selects required linf_ID as and then for each found id looks apprepriate rows in messageentry. Since you have an index on linf_iD, the query should result faster than 4 secs.

查询选择所需的linf_ID,然后每个找到的id在messageentry中看起来都是值得关注的行。由于您在linf_iD上有一个索引,查询的结果应该比4秒快。

But those me.mailSourceFile_id, me.messageDetails_id can't be taken from memory, since MySQL would need to do a complex index merge, hence, MySQL would anyway go on disk for each row with matching linf_ID. If you would have an index that contains all those three fields at once, the query would be even faster in case there is a significant amount of rows which are filtered by subsequent joins.

但这些我。mailSourceFile_id,我。不能从内存中获取messageDetails_id,因为MySQL需要进行复杂的索引合并,因此,MySQL将在磁盘上为每一行使用匹配的linf_ID。如果您将有一个同时包含所有这三个字段的索引,那么如果有大量的行被后续连接过滤,查询将会更快。

If you update your KEY FKBBB258CBB10E8518 (linf_ID) to FKBBB258CBB10E8518 (linf_ID, mailSourceFile_id, messageDetails_id), you would have such an index.

如果您将您的密钥FKBBB258CBB10E8518 (linf_ID)更新为FKBBB258CBB10E8518 (linf_ID、mailSourceFile_id、messageDetails_id),您将获得这样的索引。

The resulting query would look something like:

得到的查询将如下所示:

SELECT me.*
FROM (
  SELECT ID as linf_ID
  FROM linf
  WHERE limad='test@'
) as linf
JOIN messageentry me USING (linf_ID)
JOIN MailSourceFile ms ON ms.id = me.mailSourceFile_id
JOIN MessageDetails md ON md.id = me.messageDetails_id;

Actually, as soon as you update the index FKBBB258CBB10E8518 (linf_ID) as suggested above, your original query most probably will have the same execution plan and timing as the last query.

实际上,只要您按照上面的建议更新索引FKBBB258CBB10E8518 (linf_ID),您的原始查询很可能与上一个查询具有相同的执行计划和时间安排。

#2


0  

MySQL does a very poor job with subqueries in in clauses, explaining the poor performance you see there. I suspect the join performance has to do with the ordering of the joins. It is probably reading the messages table in its entirety.

MySQL在子句中执行的子查询很差,解释了您在那里看到的糟糕性能。我怀疑连接性能与连接的顺序有关。它可能正在完整地读取消息表。

Try changing the in version to an exists:

尝试将in版本改为a exist:

select sql_no_cache me.*
from messageentry me
where exists (select 1 from linf l where l.limad='test@' and l.id = me.inf_id limit 1) ;

By the way, you should get used to doing joins in on clauses rather than in the where clause.

顺便说一下,你应该习惯在从句中使用join,而不是在where子句中。

#3


0  

What happens if you explicitly define the join criteria, like so?

如果显式地定义联接条件,会发生什么情况?

select sql_no_cache me.* 
from messageentry me JOIN linf l ON  me.linf_id=l.id
WHERE l.limad='test@';

You may get something screwy with your version if the optimizer chooses to do a cross join or something else weird.

如果优化器选择做交叉连接或其他怪异的事情,您可能会得到一些古怪的东西。

Barring that, you might consider doing a force index:

除此之外,你可以考虑做一个力指数:

select sql_no_cache me.* 
from messageentry me FORCE INDEX (FKBBB258CBB10E8518)
JOIN linf l ON  me.linf_id=l.id         
WHERE l.limad='test@';

This will at least show you if the index is actually going to help you or not.

这至少能告诉你指数是否真的对你有帮助。

#4


0  

  • Try using INT instead BIGINT if possible, also choose INT for primary key if possible. The secondary index like "linf_ID" store it's related primary key in disk. using BIGINT means more page faults and disk reads. http://planet.mysql.com/entry/?id=13825

    如果可能的话,尝试使用INT代替BIGINT,如果可能的话,也选择INT作为主键。二级索引像“linf_ID”存储它的相关主键在磁盘。使用BIGINT意味着更多的页面错误和磁盘读取。http://planet.mysql.com/entry/?id=13825

  • To reduce the index size of varchar, try index part of limad.
    In book "High Performance Mysql 3Edition" give us a way to choose the length of varchar index. choose the length which let the following two sql's result similar

    为了减小varchar的索引大小,可以尝试limad的索引部分。在“高性能Mysql 3Edition”一书中,我们可以选择varchar索引的长度。选择让以下两个sql结果类似的长度。

    SELECT COUNT(DISTINCT city)/COUNT(*) FROM sakila.city_demo;

    选择COUNT(不同的城市)/COUNT(*) FROM sakila.city_demo;

    SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT() AS sel3, COUNT(DISTINCT LEFT(city, 4))/COUNT() AS sel4, COUNT(DISTINCT LEFT(city, 5))/COUNT() AS sel5, COUNT(DISTINCT LEFT(city, 6))/COUNT() AS sel6, COUNT(DISTINCT LEFT(city, 7))/COUNT(*) AS sel7 FROM sakila.city_demo;

    选择COUNT(清晰的左(城市,3))/COUNT()作为sel3, COUNT(清晰的左(城市,4))/COUNT()作为sel4, COUNT(清晰的左(城市,5))/COUNT(城市,6)/COUNT(城市,4)作为sel6, COUNT(城市,7)/COUNT(城市,7)

  • let MySQL analyze and optimize your data in disk http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

    让MySQL在磁盘http://dev.mysql.com/doc/refman/5.1/en/optimization -table.html http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html中分析和优化数据

  • For your 1minute run "big request" SQL in questions, to optimize this SQL, you need using multiple column index. http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

    对于在问题中运行1分钟的“大请求”SQL,要优化这个SQL,需要使用多个列索引。http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

    CREATE UNIQUE INDEX idx_name ON MessageEntry(linf_ID, messageDetails_id, mailSourceFile_id)

    在MessageEntry上创建唯一的索引idx_name (linf_ID、messageDetails_id、mailSourceFile_id)