How to speed up a MySQL query with multiple joins

Date: 2021-09-20 06:30:10

Here is my issue, I am selecting and doing multiple joins to get the correct items...it pulls in a fair amount of rows, above 100,000. This query takes more than 5mins when the date range is set to 1 year.

I don't know if it's possible but I am afraid that the user might extend the date range to like ten years and crash it.

Anyone know how I can speed this up? Here is the query.

SELECT DISTINCT t1.first_name, t1.last_name, t1.email 
FROM table1 AS t1 
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid 
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref 
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id 
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id 
WHERE t1.subscribe =1 
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND t5.store =2

I am not the greatest with MySQL, so any help would be appreciated!

Thanks in advance!

UPDATE

Here is the EXPLAIN you asked for:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  t5  ref     PRIMARY,C_store_type,C_id,C_store_type_2    C_store_type_2  1   const   101     Using temporary
1   SIMPLE  t4  ref     PRIMARY,P_cat   P_cat   5   alphacom.t5.C_id    326     Using where
1   SIMPLE  t3  ref     I_pid,I_oref    I_pid   4   alphacom.t4.P_id    31   
1   SIMPLE  t2  eq_ref  O_ref,O_cid     O_ref   28  alphacom.t3.I_oref  1    
1   SIMPLE  t1  eq_ref  PRIMARY     PRIMARY     4   alphacom.t2.O_cid   1   Using where

Also, I added an index to table5 and table4 because they don't really change; however, the other tables get around 500-1000 new entries a month... I heard you should add an index to a table that gets that many new entries... is this true?

7 Answers

#1


11  

I'd try the following:

First, ensure there are indexes on the following tables and columns (each set of columns in parentheses should be a separate index):

table1 : (subscribe, CDate)
         (CU_id)
table2 : (O_cid)
         (O_ref)
table3 : (I_oref)
         (I_pid)
table4 : (P_id)
         (P_cat)
table5 : (C_id, store)

Second, if adding the above indexes didn't improve things as much as you'd like, try rewriting the query as

SELECT DISTINCT t1.first_name, t1.last_name, t1.email FROM
  (SELECT CU_id, first_name, last_name, email
     FROM table1
     WHERE subscribe = 1 AND
           CDate >= $startDate AND
           CDate <= $endDate) AS t1
  INNER JOIN table2 AS t2
    ON t1.CU_id = t2.O_cid   
  INNER JOIN table3 AS t3
    ON t2.O_ref = t3.I_oref   
  INNER JOIN table4 AS t4
    ON t3.I_pid = t4.P_id   
  INNER JOIN (SELECT C_id FROM table5 WHERE store = 2) AS t5
    ON t4.P_cat = t5.C_id

I'm hoping here that the first sub-select would cut down significantly on the number of rows to be considered for joining, hopefully making the subsequent joins do less work. Ditto the reasoning behind the second sub-select on table5.

In any case, mess with it. I mean, ultimately it's just a SELECT - you can't really hurt anything with it. Examine the plans that are generated by each different permutation and try to figure out what's good or bad about each.

Share and enjoy.

#2


8  

Make sure your date columns and all the columns you are joining on are indexed.

Using inequality (range) comparisons on your dates means more rows have to be examined, which is inherently slower than an equality comparison.

Also, using DISTINCT adds an extra comparison to the logic that your optimizer is running behind the scenes. Eliminate that if possible.

#3


3  

Well, first, make a subquery to decimate table1 down to just the records you actually want to go to all the trouble of joining...

SELECT DISTINCT t1.first_name, t1.last_name, t1.email  
FROM (  
SELECT first_name, last_name, email, CU_id FROM table1 WHERE  
table1.subscribe = 1  
AND table1.Cdate >= $startDate  
AND table1.Cdate <= $endDate  
) AS t1  
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid  
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref  
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id  
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id  
WHERE t5.store = 2

Then start looking at modifying the directionality of the joins.

Additionally, if t5.store is only very rarely 2, then flip this idea around: construct the t5 subquery, then join it back and back and back.

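A sketch of that flipped version, filtering table5 first and joining outward from it (same columns as the original query; whether this wins depends on the actual row counts):

```sql
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (SELECT C_id FROM table5 WHERE store = 2) AS t5
INNER JOIN table4 AS t4 ON t4.P_cat = t5.C_id
INNER JOIN table3 AS t3 ON t3.I_pid = t4.P_id
INNER JOIN table2 AS t2 ON t2.O_ref = t3.I_oref
INNER JOIN table1 AS t1 ON t1.CU_id = t2.O_cid
WHERE t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate
```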
#4


2  

At present, your query is returning all matching rows on table2-table5, just to establish whether t5.store = 2. If any of table2-table5 have a significantly higher row count than table1, this may be greatly increasing the number of rows processed - consequently, the following query may perform significantly better:

SELECT DISTINCT t1.first_name, t1.last_name, t1.email 
FROM table1 AS t1 
WHERE t1.subscribe =1 
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND EXISTS
(SELECT NULL FROM table2 AS t2
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref 
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id 
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id AND t5.store =2
WHERE t1.CU_id = t2.O_cid);

#5


1  

Try adding indexes on the fields that you join. It may or may not improve the performance.

Moreover, it also depends on the engine you are using. If you are using InnoDB, check your configuration parameters. I faced a similar problem, as InnoDB's default configuration won't scale as well as MyISAM's default configuration.

#6


1  

As everyone says, make sure you have indexes.

You can also check if your server is set up properly so it can hold more of, or maybe the entire, dataset in memory.

Without an EXPLAIN, there's not much to work with. Also keep in mind that MySQL will look at your JOIN and iterate through all possible join orders before executing the query, which can take time. Once you have the optimal JOIN order from the EXPLAIN, you could try forcing this order in your query, eliminating this step from the optimizer.

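One MySQL-specific way to pin the join order is STRAIGHT_JOIN, which makes the optimizer join the tables in exactly the order they are written. A sketch applied to the original query (only worth doing once EXPLAIN has shown you a good order):

```sql
-- STRAIGHT_JOIN forces left-to-right join order as written
SELECT STRAIGHT_JOIN DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate
  AND t5.store = 2
```

The trade-off is that a forced order that is good today may become bad as the data distribution changes, so re-check it periodically.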
#7


-1  

It sounds like you should think about delivering subsets (paging) or limit the results some other way unless there is a reason that the users need every row possible all at once. Typically 100K rows is more than the average person can digest.

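A minimal paging sketch using LIMIT/OFFSET on the original query (the page size of 100 and the ORDER BY columns are illustrative choices, not from the question):

```sql
-- Page N: OFFSET (N - 1) * 100; shown here for page 3
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate
  AND t5.store = 2
ORDER BY t1.last_name, t1.first_name   -- a stable order is required for paging
LIMIT 100 OFFSET 200;
```

Note that large OFFSET values still make MySQL walk past all the skipped rows, so for deep paging a keyset approach (WHERE last_name > last seen value) scales better.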