MySQL: Grouped / Ordered / Left Join query is very slow

Time: 2022-09-19 19:06:13

I have a problem with a query which takes far too long (over two seconds just for this simple query).

On first look it appears to be an indexing issue: all joined fields are indexed, but I cannot find what else I may need to index to speed this up. As soon as I add the fields I need to the query, it gets even slower.

SELECT `jobs`.`job_id` AS `job_id` FROM tabledef_Jobs AS jobs
 LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
 LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
 LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0, 50

Table row counts (~): tabledef_Jobs (108k), tabledef_JobCatLink (109k), tabledef_Companies (100), tabledef_Applications (50k)

Here you can see the DESCRIBE (EXPLAIN) output. 'Using temporary' appears to be what is slowing down the query:

[screenshot: EXPLAIN/DESCRIBE output]
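
For reference, a plan like the one in the screenshot can be reproduced by prefixing the query with EXPLAIN (a sketch of the command, not the original screenshot):

EXPLAIN SELECT `jobs`.`job_id` AS `job_id` FROM tabledef_Jobs AS jobs
 LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
 LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
 LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0, 50;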

table index screenshots:

[screenshots: table index definitions]
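
The index listings in those screenshots can presumably be reproduced with something like the following (a sketch, using the table names above):

SHOW INDEX FROM tabledef_Jobs;
SHOW INDEX FROM tabledef_JobCatLink;
SHOW INDEX FROM tabledef_Applications;
SHOW INDEX FROM tabledef_Companies;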

Any help would be greatly appreciated

EDIT WITH ANSWER

Final improved query, with thanks to @Steve (accepted answer). Ultimately, the query time was reduced from ~22s to ~0.3s:

SELECT `jobs`.`job_id` AS `job_id` FROM
(
 SELECT * FROM tabledef_Jobs AS jobs ORDER BY `jobs`.`date_posted` ASC LIMIT 0, 50
) AS jobs
 LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
 LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
 LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0, 50

1 Answer

#1

Right, I’ll have a stab at this.

It would appear that the Query Optimiser cannot use an index to fulfil the query upon the tabledef_Jobs table.

You've got an offset limit, and this in combination with your ORDER BY means the optimiser cannot limit the amount of data before joining. It therefore has to group by job_id (which is a PK, so that part is fast), but then order that data (temporary table and a filesort) before limiting and throwing away the vast majority of it, and finally joining everything else to it.

I would suggest adding a composite index to jobs on (job_id, date_posted).
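
A minimal sketch of that DDL (the index names here are made up; adjust to taste):

-- composite index as suggested above
ALTER TABLE tabledef_Jobs ADD INDEX idx_job_id_date_posted (job_id, date_posted);
-- or, for the ORDER BY date_posted ... LIMIT pattern used below,
-- an index leading on date_posted alone (also suggested further down)
ALTER TABLE tabledef_Jobs ADD INDEX idx_date_posted (date_posted);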

So firstly optimise the base query:

SELECT * FROM tabledef_Jobs
GROUP BY job_id
ORDER BY date_posted
LIMIT 0, 50

Then you can combine the joins and the final structure together to make a more efficient query.

I cannot let it go by without suggesting you rethink your limit offset. This is fine for small initial offsets, but when it starts to get large it can be a major cause of performance issues. Let's say, for example's sake, you're using this for pagination: what happens if they want page 3,000? You will use

LIMIT 3000, 50

This will then collect 3050 rows / manipulate the data and then throw away the first 3000.
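
If that becomes a problem, one common alternative (not part of the original suggestion) is keyset or "seek" pagination, which filters on the last value seen instead of counting an offset. A sketch, assuming date_posted is indexed and ties are broken by job_id; the literal values are placeholders:

SELECT job_id, date_posted
FROM tabledef_Jobs
WHERE (date_posted, job_id) > ('2022-01-01 00:00:00', 12345)  -- last row of the previous page
ORDER BY date_posted ASC, job_id ASC
LIMIT 50;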

[edit 1 - In response to comments below]

I will expand with some more information that might point you in the right direction. Unfortunately there isn't a simple fix that will resolve it; you must understand why this is happening to be able to address it. Simply removing the LIMIT or ORDER BY may not work, and after all you don't want to remove them: they are part of your query, which means they must be there for a purpose.

Optimise the simple base query first; that is usually a lot easier than working with multi-joined datasets.

Despite all the bashing it receives, there is nothing wrong with filesort. Sometimes it is the only way to execute the query. Agreed, it can be the cause of many performance issues (especially on larger data sets), but that is not usually the fault of filesort itself but of the underlying query / indexing strategy.

Within MySQL you cannot mix indexes or mix sort directions within the same index; performing such a task will result in a filesort.
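
A hypothetical illustration, assuming an index on (date_posted, job_id): ordering the two columns in different directions cannot be satisfied by that single ascending index, so the plan falls back to a filesort (descending indexes only arrived later, in MySQL 8.0):

EXPLAIN SELECT job_id
FROM tabledef_Jobs
ORDER BY date_posted ASC, job_id DESC;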

How about, as I suggested, creating an index on date_posted and then using:

SELECT jobs.job_id, jobs.date_posted, jobcats.*, apps.*, company.* FROM
(
    -- the derived table must expose every jobs column referenced outside it,
    -- so date_posted and company_id are selected alongside job_id
    SELECT DISTINCT job_id, date_posted, company_id FROM tabledef_Jobs
    ORDER BY date_posted
    LIMIT 0, 50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
