MySQL EXPLAIN shows type "ALL" even though an index is used

Time: 2021-12-20 20:55:06

I ran a query in Mysql like below:

EXPLAIN
SELECT *
FROM(
        SELECT *  # Select Number 2
        FROM post
        WHERE   parentid = 13
        ORDER BY time, id
        LIMIT 1, 10
    ) post13_childs
JOIN post post13_childs_childs
ON post13_childs_childs.parentid = post13_childs.id

and the result was:

id |select_type  |table               |type |possible_keys  |key      |key_len  |ref              |rows    |Extra
1  |PRIMARY      |<derived2>          |ALL  | NULL          | NULL    |NULL     |NULL             |10      |
1  |PRIMARY      |post13_childs_childs|ref  |parentid       |parentid |9        |post13_childs.id |10      |Using where
2  |DERIVED      |post                |ALL  |parentid       |parentid |9        |                 |153153  |Using where; Using filesort

This suggests it used the parentid index but still scanned all rows, given the type ALL and the 153153 rows. Why couldn't the index prevent a full scan?

However, if I run the derived query (select #2) on its own, like below:

Explain
SELECT * FROM post  
WHERE parentid=13
ORDER BY time , id
LIMIT 1,10

the result is what I would expect:

id |select_type  |table  |type |possible_keys  |key      |key_len  |ref  |rows    |Extra
1  |SIMPLE       |post   |ref  |parentid       |parentid |9        |const|41      |Using where; Using filesort

Edit:

The table post has these indexes:

  1. id (PRIMARY)
  2. parentid
  3. time, id (timeid)

Total row count --> 141280
Children of 13 (parentid = 13) --> 41
Children of 11523 --> 10119

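The rows values in EXPLAIN are estimates taken from index statistics, not exact counts, which is likely why the 153153 shown earlier is higher than the real total of 141280. A minimal sketch for comparing the estimates with the actual counts and refreshing the statistics; the two parentid values are the ones from the question, and nothing here changes any data:

```sql
-- Actual counts, for comparison with the EXPLAIN "rows" estimates
SELECT COUNT(*) FROM post;

SELECT parentid, COUNT(*) AS children
FROM post
WHERE parentid IN (13, 11523)
GROUP BY parentid;

-- Recompute the index statistics the optimizer uses for its estimates;
-- running EXPLAIN again afterwards may show different "rows" values
ANALYZE TABLE post;
```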

When I add an index on (parentid, time, id), the problem with the first query is solved: the EXPLAIN output for parentid 13 shows 40 rows, type ref. But for 11523 it shows 19538 rows, type ref! This means all child rows of 11523 are examined even though I limited the result to the first 10 rows.

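The rows column of plain EXPLAIN estimates how many rows match the chosen access path; it typically does not reflect the early stop that LIMIT allows, so 19538 does not by itself prove that every child row is read. Assuming the server is MySQL 8.0.18 or later (the question does not say), EXPLAIN ANALYZE actually runs the query and reports real row counts per step, which is a more direct check:

```sql
-- Assumes MySQL 8.0.18+ and that the (parentid, time, id) index already exists.
-- Unlike plain EXPLAIN, this executes the query and prints "actual ... rows=..."
-- for each step, so you can see how many index entries were really read.
EXPLAIN ANALYZE
SELECT *
FROM post
WHERE parentid = 11523
ORDER BY time, id
LIMIT 1, 10;
```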

2 solutions

#1

Your subquery:

    SELECT *  # Select Number 2
    FROM post
    WHERE   parentid = 13
    ORDER BY time, id
    LIMIT 1, 10;

This mentions three columns explicitly (parentid, time and id), plus all the rest of the columns through SELECT *. You have three indexes. Here is how each of them could be used:

  • id (PRIMARY) -- This index is useless here. Although id is mentioned in the ORDER BY clause, it is only the second sort key, after time.
  • parentid -- This index can be used to satisfy the WHERE clause. However, after the matching rows are fetched, they still have to be sorted explicitly.
  • time, id (timeid) -- This index can be used for the sort, with a big BUT: MySQL can scan the index to get everything in the right order, but it then has to check, row by row, whether the condition on parentid is met.
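To see how the optimizer weighs these three options, you could ask EXPLAIN for each access path explicitly with index hints. The index names PRIMARY, parentid and timeid are taken from the question's index list; the comments describe what each plan would typically look like, not guaranteed output, and which plan is actually fastest depends on the data distribution discussed next.

```sql
-- Single-column parentid index: a ref lookup on parentid = 13,
-- typically followed by "Using filesort" for ORDER BY time, id
EXPLAIN SELECT * FROM post FORCE INDEX (parentid)
WHERE parentid = 13 ORDER BY time, id LIMIT 1, 10;

-- (time, id) index: walks rows already in sort order,
-- checking parentid = 13 on each row it visits
EXPLAIN SELECT * FROM post FORCE INDEX (timeid)
WHERE parentid = 13 ORDER BY time, id LIMIT 1, 10;

-- Neither secondary index: typically a full scan (type ALL) plus a filesort
EXPLAIN SELECT * FROM post IGNORE INDEX (parentid, timeid)
WHERE parentid = 13 ORDER BY time, id LIMIT 1, 10;
```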

To illustrate why optimization is hard: if you have a small amount of data (say the table fits on one or two pages), then a full table scan followed by a sort is probably fine. If most of the parentid values are 13, then the second option (the parentid index) could be the worst case. If the table does not fit into memory, then the third option (the timeid index) would be incredibly slow (something called page thrashing).

The correct index for this subquery is one that satisfies the WHERE clause and allows ordering: (parentid, time, id). This is not a covering index (unless these are all the columns in the table), but because of the LIMIT clause it should reduce the number of actual row lookups to roughly the 11 rows that LIMIT 1, 10 needs (one skipped, ten returned).

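A minimal sketch of adding that index and re-checking the subquery. The index name parentid_time_id is a hypothetical choice; any name works. Because the WHERE clause is an equality on parentid, entries within that index are already ordered by (time, id), so the plan should no longer need a filesort if the optimizer picks this index:

```sql
-- Composite index: equality lookup on parentid, then rows in (time, id) order
ALTER TABLE post ADD INDEX parentid_time_id (parentid, time, id);

-- Re-check the subquery; "Using filesort" should disappear from Extra
EXPLAIN
SELECT *
FROM post
WHERE parentid = 13
ORDER BY time, id
LIMIT 1, 10;
```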

Note that for the complete query you also want an index on parentid, to support the join. Happily, an index on (parentid, time, id) counts as such an index, because parentid is its leftmost column. So you can drop the single-column parentid index. The (time, id) index is probably not necessary either, unless you need it for other queries.

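A sketch of the cleanup, to run only once the composite (parentid, time, id) index exists; otherwise the join on parentid would lose its index. The timeid drop is left commented out because only you know whether other queries rely on it:

```sql
-- The composite index's leftmost column (parentid) already serves
-- lookups on parentid, so the single-column index is redundant.
ALTER TABLE post DROP INDEX parentid;

-- Optional, only if no other query needs ORDER BY time, id without a parentid filter:
-- ALTER TABLE post DROP INDEX timeid;
```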

Your query also keeps only those children that have children themselves, so it is quite possible that no rows will be returned. Did you really intend an inner join rather than a LEFT OUTER JOIN?

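If the intent is to list the selected children of 13 whether or not they have children of their own, the question's query would become a LEFT JOIN. This is a sketch that keeps the original aliases; only the join type changes:

```sql
-- Every selected child of 13 is kept; for children without children of their own,
-- the post13_childs_childs columns come back as NULL.
SELECT *
FROM (
        SELECT *
        FROM post
        WHERE parentid = 13
        ORDER BY time, id
        LIMIT 1, 10
    ) post13_childs
LEFT JOIN post post13_childs_childs
    ON post13_childs_childs.parentid = post13_childs.id;
```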

As a final comment: I assume that this query is a simplification of your real query. It pulls all columns from two tables, and those two tables are the same table, so you will get duplicate column names. You should list the columns explicitly and give them aliases.

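A sketch of what explicit columns with aliases could look like. Only id, parentid and time are known from the question, so any other columns would be added the same way; the short table aliases c and gc and the column aliases are hypothetical names chosen for readability:

```sql
-- c  = the selected children of 13, gc = their children.
-- Column aliases keep the two copies of post apart in the result set.
SELECT
    c.id         AS child_id,
    c.time       AS child_time,
    gc.id        AS grandchild_id,
    gc.parentid  AS grandchild_parentid,
    gc.time      AS grandchild_time
FROM (
        SELECT id, time
        FROM post
        WHERE parentid = 13
        ORDER BY time, id
        LIMIT 1, 10
    ) c
JOIN post gc
    ON gc.parentid = c.id;
```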

#2

Doing an ORDER BY that is not helped by any index can regularly kill performance. For the inner query, I would have a composite index on (parentid, time, id) so that both the WHERE and ORDER BY clauses can use it. Since parentid is also the basis of the join afterwards, the same index should serve the join too and make it quite fast.

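With SELECT *, the (parentid, time, id) index cannot be covering, because the remaining columns still have to be read from the table rows. A variant neither answer mentions, sometimes called a deferred join, is sketched here as an option rather than a recommendation. The inner query selects only indexed columns, so EXPLAIN can report "Using index" for it, and the full rows are fetched only for the ten selected ids. The aliases page and p are hypothetical:

```sql
-- Inner query touches only parentid, time and id, all of which are in the
-- (parentid, time, id) index, so it can be answered from the index alone.
SELECT p.*
FROM (
        SELECT id
        FROM post
        WHERE parentid = 13
        ORDER BY time, id
        LIMIT 1, 10
    ) page
JOIN post p ON p.id = page.id   -- primary-key lookup for each selected id
ORDER BY p.time, p.id;           -- re-apply the order; the join does not preserve it
```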
