I'm not sure whether I have the correct indexes, or whether the speed of this MySQL query can be improved?

Time: 2021-09-20 06:30:16

My query has a join, and it looks like it's using two indexes, which makes it more complicated. I'm not sure if I can improve on this, but I thought I'd ask.

The query produces a list of records whose keywords are similar to those of the record being queried.

Here's my query.

SELECT match_keywords.padid,
       COUNT(match_keywords.word) AS matching_words
FROM   keywords current_program_keywords
       INNER JOIN keywords match_keywords
         ON match_keywords.word = current_program_keywords.word
WHERE  match_keywords.word IS NOT NULL
       AND current_program_keywords.padid = 25695
GROUP  BY match_keywords.padid
ORDER  BY matching_words DESC
LIMIT  0, 11  

The EXPLAIN output: [image not preserved]

Word is varchar(40).

3 solutions

#1


9  

You can start by removing the IS NOT NULL test, which is redundant: COUNT on the field already ignores NULLs. It also looks like you would want to exclude 25695 from match_keywords; otherwise 25695 (or whichever padid you query) would surely show up as the "best" match within your 11-row limit.

SELECT     match_keywords.padid,
           COUNT(match_keywords.word) AS matching_words
FROM       keywords current_program_keywords
INNER JOIN keywords match_keywords
        ON match_keywords.word = current_program_keywords.word
WHERE      current_program_keywords.padid = 25695
GROUP BY   match_keywords.padid
ORDER BY   matching_words DESC
LIMIT      0, 11
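
The query above still lets padid 25695 match itself. A minimal sketch of the exclusion described earlier, assuming the same keywords table and aliases as in the question; the extra predicate is the only change:

```sql
SELECT     match_keywords.padid,
           COUNT(match_keywords.word) AS matching_words
FROM       keywords current_program_keywords
INNER JOIN keywords match_keywords
        ON match_keywords.word = current_program_keywords.word
WHERE      current_program_keywords.padid = 25695
       AND match_keywords.padid <> 25695  -- skip the record being queried
GROUP BY   match_keywords.padid
ORDER BY   matching_words DESC
LIMIT      0, 11
```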

Next, consider how you would do it as a person.

  • You would start with a padid (25695) and retrieve all the words for that padid
  • From that list of words, go back into the table and, for each matching word, get its padids (assuming no duplicates on padid + word)
  • group the padids together and count them
  • order the counts and return the highest 11
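
The steps above can be sketched as separate statements. This is only an illustration of the reasoning, assuming the keywords table from the question; the literal words in step 2 are placeholders, not real data.

```sql
-- Step 1: retrieve all the words for the starting padid.
SELECT word
FROM   keywords
WHERE  padid = 25695;

-- Step 2: for the words found in step 1, look up every padid that uses them.
-- 'word_a' and 'word_b' are placeholders for the step 1 results.
SELECT padid, word
FROM   keywords
WHERE  word IN ('word_a', 'word_b');

-- Steps 3 and 4: group the padids, count them, and keep the top 11.
-- Steps 1 and 2 are folded into the self-join here.
SELECT   m.padid,
         COUNT(*) AS matching_words
FROM     keywords AS c
JOIN     keywords AS m ON m.word = c.word
WHERE    c.padid = 25695
GROUP BY m.padid
ORDER BY matching_words DESC
LIMIT    11;
```

Step 1 filters on padid and reads word, while step 2 filters on word and reads padid, which is why each step touches exactly two columns.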

With your list of 3 separate single-column indexes, the first two steps (each involving only 2 columns) will always have to jump from the index back to the data to get the other column. Covering indexes may help here. Create two composite indexes to test:

create index ix_keyword_pw on keywords(padid, word);
create index ix_keyword_wp on keywords(word, padid);

With these composite indexes in place, you can remove the single-column indexes on padid and word, since each of those columns is the leading column of one of the two composite indexes.
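
One way to check the effect before dropping anything, as a sketch: the single-column index names below (idx_padid, idx_word) are hypothetical, so read the real names from SHOW INDEX first.

```sql
-- List the indexes that currently exist on the table.
SHOW INDEX FROM keywords;

-- Re-run EXPLAIN; "Using index" in the Extra column means that table was
-- read from the index alone, without a lookup into the row data.
EXPLAIN
SELECT     match_keywords.padid,
           COUNT(match_keywords.word) AS matching_words
FROM       keywords current_program_keywords
INNER JOIN keywords match_keywords
        ON match_keywords.word = current_program_keywords.word
WHERE      current_program_keywords.padid = 25695
GROUP BY   match_keywords.padid
ORDER BY   matching_words DESC
LIMIT      0, 11;

-- Hypothetical names: substitute the Key_name values reported above.
DROP INDEX idx_padid ON keywords;
DROP INDEX idx_word  ON keywords;
```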

Note: You always have to balance SELECT performance against

  • size of indexes (the more you create, the more there is to store)
  • insert/update performance (the more indexes, the longer it takes to commit, since it has to update the data and then update all the indexes)
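
To put a number on the first point, MySQL exposes data and index sizes in information_schema. A small sketch, assuming the table lives in the currently selected database; the figures are estimates rather than exact byte counts:

```sql
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,   -- row data
       ROUND(index_length / 1024 / 1024, 1) AS index_mb   -- all secondary indexes
FROM   information_schema.tables
WHERE  table_schema = DATABASE()
  AND  table_name   = 'keywords';
```

Comparing index_mb before and after creating the composite indexes shows what they cost in storage.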

#2


5  

Try the following. Make sure there is an index on padid and one on word. Then, STRAIGHT_JOIN and the order of the WHERE qualifiers should make the query filter on the padid of the CURRENT keyword first and then join to the others. Exclude a join to itself. Also, since the inner join to the matching keywords tests for equality, checking the current keyword for NULL is enough: it can never join to a NULL value, so there is no need to test the MATCH keywords alias for NULL on every comparison.

SELECT STRAIGHT_JOIN
      match_keywords.padid,
      COUNT(*) AS matching_words 
   FROM
      keywords current_program_keywords
         INNER JOIN keywords match_keywords          
            ON match_keywords.word = current_program_keywords.word 
            and match_keywords.padid <> 25695
   WHERE  
          current_program_keywords.padid = 25695
      AND current_program_keywords.word IS NOT NULL
   GROUP BY 
      match_keywords.padid 
   ORDER BY 
      matching_words DESC 
   LIMIT
      0, 11 
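
The NULL reasoning above rests on how MySQL compares NULLs: an equality test involving NULL yields NULL rather than true, so such rows never satisfy the join condition. A quick check that needs no table:

```sql
SELECT NULL = NULL   AS equals_result,     -- NULL: never satisfies a join condition
       NULL <=> NULL AS null_safe_result;  -- 1: only the NULL-safe operator matches
```

Since the join uses the plain = operator, every joined row has a non-NULL word, which is why COUNT(*) is safe in the query above.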

#3


1  

You should index the following fields (check which table each alias corresponds to):

match_keywords.padid

current_program_keywords.padid

match_keywords.word

current_program_keywords.word

Hope it helps speed things up.
