How to optimise a slow query with multiple joins

Date: 2022-07-03 06:26:12

My situation:

  • the query searches around 90,000 vehicles
  • the query takes a long time on every run
  • I already have indexes on all the fields being JOINed.

How can I optimise it?

Here is the query:

SELECT vehicles.make_id,
       vehicles.fuel_id,
       vehicles.body_id,
       vehicles.transmission_id,
       vehicles.colour_id,
       vehicles.mileage,
       vehicles.vehicle_year,
       vehicles.engine_size,
       vehicles.trade_or_private,
       vehicles.doors,
       vehicles.model_id,
       Round(3959 * Acos(Cos(Radians(51.465436)) *
                         Cos(Radians(vehicles.gps_lat)) *
                         Cos(Radians(vehicles.gps_lon) - Radians(-0.296482)) +
                         Sin(Radians(51.465436)) *
                         Sin(Radians(vehicles.gps_lat)))) AS distance
FROM   vehicles
       INNER JOIN vehicles_makes
         ON vehicles.make_id = vehicles_makes.id
       LEFT JOIN vehicles_models
         ON vehicles.model_id = vehicles_models.id
       LEFT JOIN vehicles_fuel
         ON vehicles.fuel_id = vehicles_fuel.id
       LEFT JOIN vehicles_transmissions
         ON vehicles.transmission_id = vehicles_transmissions.id
       LEFT JOIN vehicles_axles
         ON vehicles.axle_id = vehicles_axles.id
       LEFT JOIN vehicles_sub_years
         ON vehicles.sub_year_id = vehicles_sub_years.id
       INNER JOIN members
         ON vehicles.member_id = members.id
       LEFT JOIN vehicles_categories
         ON vehicles.category_id = vehicles_categories.id
WHERE  vehicles.status = 1
       AND vehicles.date_from < 1330349235
       AND vehicles.date_to > 1330349235
       AND vehicles.type_id = 1
       AND ( vehicles.price >= 0
             AND vehicles.price <= 1000000 )  

Here is the vehicles table schema:

CREATE TABLE IF NOT EXISTS `vehicles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `number_plate` varchar(100) NOT NULL,
  `type_id` int(11) NOT NULL,
  `make_id` int(11) NOT NULL,
  `model_id` int(11) NOT NULL,
  `model_sub_type` varchar(250) NOT NULL,
  `engine_size` decimal(12,1) NOT NULL,
  `vehicle_year` int(11) NOT NULL,
  `sub_year_id` int(11) NOT NULL,
  `mileage` int(11) NOT NULL,
  `fuel_id` int(11) NOT NULL,
  `transmission_id` int(11) NOT NULL,
  `price` decimal(12,2) NOT NULL,
  `trade_or_private` tinyint(4) NOT NULL,
  `postcode` varchar(25) NOT NULL,
  `gps_lat` varchar(50) NOT NULL,
  `gps_lon` varchar(50) NOT NULL,
  `img1` varchar(100) NOT NULL,
  `img2` varchar(100) NOT NULL,
  `img3` varchar(100) NOT NULL,
  `img4` varchar(100) NOT NULL,
  `img5` varchar(100) NOT NULL,
  `img6` varchar(100) NOT NULL,
  `img7` varchar(100) NOT NULL,
  `img8` varchar(100) NOT NULL,
  `img9` varchar(100) NOT NULL,
  `img10` varchar(100) NOT NULL,
  `is_featured` tinyint(4) NOT NULL,
  `body_id` int(11) NOT NULL,
  `colour_id` int(11) NOT NULL,
  `doors` tinyint(4) NOT NULL,
  `axle_id` int(11) NOT NULL,
  `category_id` int(11) NOT NULL,
  `contents` text NOT NULL,
  `date_created` int(11) NOT NULL,
  `date_edited` int(11) NOT NULL,
  `date_from` int(11) NOT NULL,
  `date_to` int(11) NOT NULL,
  `member_id` int(11) NOT NULL,
  `inactive_id` int(11) NOT NULL,
  `status` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `type_id` (`type_id`),
  KEY `make_id` (`make_id`),
  KEY `model_id` (`model_id`),
  KEY `fuel_id` (`fuel_id`),
  KEY `transmission_id` (`transmission_id`),
  KEY `body_id` (`body_id`),
  KEY `colour_id` (`colour_id`),
  KEY `axle_id` (`axle_id`),
  KEY `category_id` (`category_id`),
  KEY `vehicle_year` (`vehicle_year`),
  KEY `mileage` (`mileage`),
  KEY `status` (`status`),
  KEY `date_from` (`date_from`),
  KEY `date_to` (`date_to`),
  KEY `trade_or_private` (`trade_or_private`),
  KEY `doors` (`doors`),
  KEY `price` (`price`),
  KEY `engine_size` (`engine_size`),
  KEY `sub_year_id` (`sub_year_id`),
  KEY `member_id` (`member_id`),
  KEY `date_created` (`date_created`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=136237 ;

The EXPLAIN:

id  select_type  table                   type    possible_keys                                          key      key_len  ref                           rows   Extra
1   SIMPLE       vehicles                ref     type_id,make_id,status,date_from,date_to,price,mem...  type_id  4        const                         85695  Using where
1   SIMPLE       members                 index   PRIMARY                                                PRIMARY  4        NULL                          3      Using where; Using index; Using join buffer
1   SIMPLE       vehicles_makes          eq_ref  PRIMARY                                                PRIMARY  4        tvs.vehicles.make_id          1      Using index
1   SIMPLE       vehicles_models         eq_ref  PRIMARY                                                PRIMARY  4        tvs.vehicles.model_id         1      Using index
1   SIMPLE       vehicles_fuel           eq_ref  PRIMARY                                                PRIMARY  4        tvs.vehicles.fuel_id          1      Using index
1   SIMPLE       vehicles_transmissions  eq_ref  PRIMARY                                                PRIMARY  4        tvs.vehicles.transmission_id  1      Using index
1   SIMPLE       vehicles_axles          eq_ref  PRIMARY                                                PRIMARY  4        tvs.vehicles.axle_id          1      Using index
1   SIMPLE       vehicles_sub_years      eq_ref  PRIMARY                                                PRIMARY  4        tvs.vehicles.sub_year_id      1      Using index
1   SIMPLE       vehicles_categories     eq_ref  PRIMARY                                                PRIMARY  4        tvs.vehicles.category_id      1      Using index

5 answers

#1


11  

Improving the WHERE clause

Your EXPLAIN shows that MySQL is using only one index (type_id) to select the rows that match the WHERE clause, even though the clause contains multiple criteria.

To use an index for all of the criteria in the WHERE clause, and to reduce the size of the result set as quickly as possible, add a multi-column index on the following columns of the vehicles table:

(status, date_from, date_to, type_id, price)

The columns should be ordered from highest cardinality to lowest.

For example, vehicles.date_from is likely to have more distinct values than status, so put the date_from column before status, like this:

(date_from, date_to, price, type_id, status)

This should reduce the number of rows examined in the first step of the query execution, which should show up as a lower row count on the first line of the EXPLAIN result.

The EXPLAIN result should also show MySQL using the multi-column index for the WHERE clause. If, by chance, it doesn't, hint or force the multi-column index.
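
As a rough sketch of the steps above: the index names below are made up for illustration, and the SELECT list is shortened to keep the example small. MySQL's range optimizer does not use key parts that follow the first range comparison to narrow the scan, so a second, hypothetical ordering with the equality columns (status, type_id) first is included for comparison. Which one wins on this data is something only EXPLAIN and timing can tell.

```sql
-- Index names are hypothetical; the first column list is the one suggested above.
ALTER TABLE vehicles
  ADD INDEX idx_search_by_cardinality (date_from, date_to, price, type_id, status),
  ADD INDEX idx_search_equality_first (status, type_id, date_from, date_to, price);

-- Let the optimizer choose, then check `key` and `rows` on the first line.
EXPLAIN
SELECT vehicles.make_id, vehicles.model_id, vehicles.mileage
FROM   vehicles
WHERE  vehicles.status = 1
       AND vehicles.date_from < 1330349235
       AND vehicles.date_to > 1330349235
       AND vehicles.type_id = 1
       AND vehicles.price >= 0
       AND vehicles.price <= 1000000;

-- If the optimizer still picks type_id, force a candidate index and compare.
EXPLAIN
SELECT vehicles.make_id, vehicles.model_id, vehicles.mileage
FROM   vehicles FORCE INDEX (idx_search_by_cardinality)
WHERE  vehicles.status = 1
       AND vehicles.date_from < 1330349235
       AND vehicles.date_to > 1330349235
       AND vehicles.type_id = 1
       AND vehicles.price >= 0
       AND vehicles.price <= 1000000;
```

Repeat the last statement with idx_search_equality_first, keep whichever index gives the lower row estimate and the better real timing, and drop the other with ALTER TABLE vehicles DROP INDEX.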

Removing the unnecessary JOINs

You don't appear to be using any fields from any of the joined tables, so remove the joins. This eliminates all of the additional work in the query and gets you down to one simple execution plan (one line in the EXPLAIN result).

Each JOINed table causes an additional lookup per row of the result set. So, if the WHERE clause selects 5,000 rows from vehicles, the 8 joins to vehicles add up to 5,000 * 8 = 40,000 lookups. That's a lot to ask of your database server.
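
For illustration, this is the query from the question with the eight joins stripped out. One assumption to check first: the two INNER JOINs (vehicles_makes and members) also drop vehicles whose make_id or member_id has no matching row, so the result is identical only if no such orphaned vehicles exist.

```sql
-- Same SELECT list and WHERE clause as the original query, without the joins.
SELECT vehicles.make_id,
       vehicles.fuel_id,
       vehicles.body_id,
       vehicles.transmission_id,
       vehicles.colour_id,
       vehicles.mileage,
       vehicles.vehicle_year,
       vehicles.engine_size,
       vehicles.trade_or_private,
       vehicles.doors,
       vehicles.model_id,
       ROUND(3959 * ACOS(COS(RADIANS(51.465436)) *
                         COS(RADIANS(vehicles.gps_lat)) *
                         COS(RADIANS(vehicles.gps_lon) - RADIANS(-0.296482)) +
                         SIN(RADIANS(51.465436)) *
                         SIN(RADIANS(vehicles.gps_lat)))) AS distance
FROM   vehicles
WHERE  vehicles.status = 1
       AND vehicles.date_from < 1330349235
       AND vehicles.date_to > 1330349235
       AND vehicles.type_id = 1
       AND vehicles.price >= 0
       AND vehicles.price <= 1000000;
```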

#2


4  

Instead of running an expensive, precise distance calculation for every row, use a bounding box and calculate the exact distance only for the rows inside the box.

The simplest possible example is to calculate the min/max longitude and latitude that interest you and add them to the WHERE clause. This way the distance is calculated only for a subset of the rows.

WHERE
    vehicles.gps_lat > min_lat AND vehicles.gps_lat < max_lat AND
    vehicles.gps_lon > min_lon AND vehicles.gps_lon < max_lon
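
One way the box limits might be derived, as a sketch only: the user variables and the 50-mile radius are assumptions, while the origin and the 3959-mile earth radius are taken from the question. Note also that gps_lat and gps_lon are varchar columns in the posted schema, and MySQL cannot use an index to compare a string column against a number, so the box is likely to pay off fully only after those columns are converted to a numeric type and indexed.

```sql
SET @lat = 51.465436;       -- search origin from the question
SET @lon = -0.296482;
SET @radius = 50;           -- hypothetical search radius, in miles
SET @earth_radius = 3959;   -- miles, the constant used in the original query

-- Half-height and half-width of the box, in degrees.
SET @lat_delta = DEGREES(@radius / @earth_radius);
SET @lon_delta = DEGREES(ASIN(SIN(@radius / @earth_radius) / COS(RADIANS(@lat))));

SELECT vehicles.id,
       ROUND(@earth_radius * ACOS(
           COS(RADIANS(@lat)) * COS(RADIANS(vehicles.gps_lat)) *
           COS(RADIANS(vehicles.gps_lon) - RADIANS(@lon)) +
           SIN(RADIANS(@lat)) * SIN(RADIANS(vehicles.gps_lat)))) AS distance
FROM   vehicles
WHERE  vehicles.gps_lat BETWEEN @lat - @lat_delta AND @lat + @lat_delta
       AND vehicles.gps_lon BETWEEN @lon - @lon_delta AND @lon + @lon_delta
HAVING distance <= @radius;  -- trim the corners of the box back to a circle
```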

For more complex solutions see:

#3


3  

Is your SQL faster without this?

Round(3959 * Acos(Cos(Radians(51.465436)) *
  Cos(Radians(vehicles.gps_lat)) *
  Cos(Radians(vehicles.gps_lon) - 
  Radians(-0.296482)) + 
  Sin(Radians(51.465436)) * 
  Sin(Radians(vehicles.gps_lat)))) AS distance

Performing this math for every row is very expensive.

Maybe you should consider a materialized view that pre-calculates your distance, so that you can select from that view. Depending on how dynamic your data is, you may not have to refresh it too often.
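
MySQL has no native materialized views, so in practice this would have to be emulated with an ordinary table that is refreshed on a schedule. The sketch below assumes a hypothetical vehicle_distances table, a single fixed search origin (the one from the question), and gps_lat/gps_lon values that are valid numeric strings. It only pays off if searches come from a small, known set of origins.

```sql
-- Hypothetical cache table holding the distance from one fixed origin.
CREATE TABLE IF NOT EXISTS vehicle_distances (
  vehicle_id INT NOT NULL,
  distance   INT NULL,
  PRIMARY KEY (vehicle_id),
  KEY distance (distance)
) ENGINE=MyISAM;

-- Refresh step: recompute every distance. LEAST(1, ...) guards against
-- rounding pushing the ACOS argument slightly above 1.
REPLACE INTO vehicle_distances (vehicle_id, distance)
SELECT vehicles.id,
       ROUND(3959 * ACOS(LEAST(1,
           COS(RADIANS(51.465436)) * COS(RADIANS(vehicles.gps_lat)) *
           COS(RADIANS(vehicles.gps_lon) - RADIANS(-0.296482)) +
           SIN(RADIANS(51.465436)) * SIN(RADIANS(vehicles.gps_lat)))))
FROM   vehicles;

-- The search then reads the stored value instead of recomputing it.
SELECT vehicles.make_id, vehicles.model_id, vehicle_distances.distance
FROM   vehicles
       INNER JOIN vehicle_distances
         ON vehicle_distances.vehicle_id = vehicles.id
WHERE  vehicles.status = 1
       AND vehicles.type_id = 1;
```

Rows for vehicles that have since been deleted stay in the cache until it is rebuilt, so the refresh schedule has to match how often the data changes.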

#4


1  

To be a little more specific than @Randy about indexes: I believe his intention was a COMPOUND index that takes advantage of your querying criteria... One index that is built on a MINIMUM of ...

( status, type_id, date_from )

It could be extended to include date_to and price too, though I don't know how much an index at that level of granularity would actually help:

( status, type_id, date_from, date_to, price )

EDIT per Comments

You shouldn't need all those individual indexes. Keep the Primary Key by itself, yes. For the others, build compound indexes around your common querying criteria and remove the rest; with so many candidates, the engine might get confused about which is best suited to the query. If you know you are always looking for a certain status, type and date (assuming vehicle searches), make that one index. If the query also looks for prices within those criteria, the index will already have narrowed things down to the few records that qualify, and the query will fly through the price check as just an extra criterion.

If you offer queries such as Automatic-only vs Manual transmission regardless of year/make, then yes, that could be an index of its own. However, if you would TYPICALLY have some other "common" criteria, tack those on as secondary columns that MAY be utilized in the query. Ex: if you look for Manual Transmissions that are 2-door vs 4-door, have your index on (transmission_id, category_id).

Again, you want whatever helps narrow down the field of candidate rows based on some "minimum" condition. If you tack an extra column that is "commonly" applied onto the index, that should only help performance.
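
The suggestions in this answer might translate into something like the following. The new index names are hypothetical. The dropped indexes are the single-column `status` and `transmission_id` keys from the posted schema, which become redundant because a compound index can also serve lookups on its leftmost column.

```sql
-- Compound indexes built around the common search criteria (names are made up).
ALTER TABLE vehicles
  ADD INDEX idx_status_type_dates (status, type_id, date_from, date_to, price),
  ADD INDEX idx_transmission_category (transmission_id, category_id);

-- Single-column indexes now covered by the leftmost column of a compound index.
ALTER TABLE vehicles
  DROP INDEX status,
  DROP INDEX transmission_id;
```

Whether the remaining single-column indexes can go as well depends on which other searches the application runs, so it is safer to drop them one at a time and re-check EXPLAIN for the affected queries.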

#5


1  

To clarify this as an answer: if you do not already have these indexes, you should consider adding them.

Do you also have indexes on these:

vehicles.status
vehicles.date_from
vehicles.date_to
vehicles.type_id
vehicles.price
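
A quick way to check, sketched with MySQL's SHOW INDEX statement filtered to the five columns listed above:

```sql
-- Lists every index on vehicles that includes one of the WHERE-clause columns.
SHOW INDEX FROM vehicles
WHERE Column_name IN ('status', 'date_from', 'date_to', 'type_id', 'price');
```

In the schema posted in the question each of these columns already has its own single-column index, so the output would mainly confirm that none of them is part of a multi-column index yet.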
