How to optimize a query that performs an ORDER BY on a derived column in MySQL?

Date: 2023-01-31 00:13:32

I am having trouble optimizing a relatively simple query involving a GROUP BY, ORDER BY, and LIMIT. The table has just over 300,000 records. Here's the schema (I added some extra indexes to experiment with):


CREATE TABLE `scrape_search_results` (
  `id` int(11) NOT NULL auto_increment,
  `creative_id` int(11) NOT NULL,
  `url_id` int(11) NOT NULL,
  `access_date` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `creative_url_index` (`creative_id`,`url_id`),
  KEY `access_date_index` (`access_date`),
  KEY `access_date_creative_id_index` (`access_date`,`creative_id`),
  KEY `creative_id_access_date_index` (`creative_id`,`access_date`),
  KEY `test_index` USING HASH (`creative_id`)
) ENGINE=MyISAM AUTO_INCREMENT=4252725 DEFAULT CHARSET=latin1

In the table, a single creative_id may appear multiple (hundreds of) times. The query I am trying to answer is relatively simple: give me the 20 creative_ids with the most recent access_date, ordered by that date descending. Here's my SQL:


SELECT `ScrapeSearchResult`.`creative_id`, 
        MAX(`ScrapeSearchResult`.`access_date`) AS `latest_access_date` 
FROM `scrape_search_results` AS `ScrapeSearchResult` 
WHERE 1 = 1 
GROUP BY `ScrapeSearchResult`.`creative_id` 
ORDER BY `latest_access_date` DESC 
LIMIT 20;

Here are the results of executing this query; note that the 20th-largest access_date is 2010-08-23 11:03:25:


+-------------+---------------------+
| creative_id | latest_access_date  |
+-------------+---------------------+
|         550 | 2010-08-23 11:07:49 | 
|        4568 | 2010-08-23 11:07:49 | 
|         552 | 2010-08-23 11:07:49 | 
|        2109 | 2010-08-23 11:07:49 | 
|        5221 | 2010-08-23 11:07:49 | 
|        1544 | 2010-08-23 11:07:49 | 
|        1697 | 2010-08-23 11:07:49 | 
|         554 | 2010-08-23 11:07:12 | 
|         932 | 2010-08-23 11:05:48 | 
|       11029 | 2010-08-23 11:05:37 | 
|       11854 | 2010-08-23 11:05:27 | 
|       11856 | 2010-08-23 11:05:05 | 
|         702 | 2010-08-23 11:03:56 | 
|        4319 | 2010-08-23 11:03:56 | 
|        7159 | 2010-08-23 11:03:56 | 
|       10610 | 2010-08-23 11:03:46 | 
|        5540 | 2010-08-23 11:03:46 | 
|           1 | 2010-08-23 11:03:46 | 
|       11942 | 2010-08-23 11:03:35 | 
|        7900 | 2010-08-23 11:03:25 | 
+-------------+---------------------+

If I was going to write this algorithm by hand, I would build a b-tree ordered on (access_date, creative_id). I'd start at the MAX(access_date) and keep walking the tree until I found 20 unique creative_ids, which I would then return in the order I found them in.


Using that algorithm, I would need to consider just 94 rows (there are 94 rows for which access_date >= 2010-08-23 11:03:25, which is our 20th largest access_date as shown above).

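The hand-written walk described above can be sketched in a few lines of Python. This is only an illustration of the intended access pattern (the rows and the `top_n_latest` helper are made up for this example); it stands in for a reverse scan of an (access_date, creative_id) index that stops as soon as 20 distinct creative_ids have been seen:

```python
def top_n_latest(rows, n=20):
    """rows: (access_date, creative_id) pairs pre-sorted descending by
    access_date, like a reverse walk of an (access_date, creative_id) index.
    Returns the first n distinct creative_ids with their latest access_date."""
    seen = set()
    result = []
    for access_date, creative_id in rows:
        if creative_id not in seen:
            seen.add(creative_id)
            result.append((creative_id, access_date))
            if len(result) == n:
                break  # only rows down to the n-th distinct id are examined
    return result

# Hypothetical sample data, newest first:
rows = [("2010-08-23 11:07:49", 550),
        ("2010-08-23 11:07:49", 4568),
        ("2010-08-23 11:07:12", 550),   # same creative_id again: skipped
        ("2010-08-23 11:05:48", 932)]
print(top_n_latest(rows, n=3))
# [(550, '2010-08-23 11:07:49'), (4568, '2010-08-23 11:07:49'), (932, '2010-08-23 11:05:48')]
```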

However, MySQL decides to use creative_url_index when answering this query, which I don't understand. It considers over 10,000 rows when doing this.


ANALYZE TABLE scrape_search_results;
SELECT ...;
+----+-------------+--------------------+-------+---------------+--------------------+---------+------+-------+---------------------------------+
| id | select_type | table              | type  | possible_keys | key                | key_len | ref  | rows  | Extra                           |
+----+-------------+--------------------+-------+---------------+--------------------+---------+------+-------+---------------------------------+
|  1 | SIMPLE      | ScrapeSearchResult | index | NULL          | creative_url_index | 8       | NULL | 10687 | Using temporary; Using filesort | 
+----+-------------+--------------------+-------+---------------+--------------------+---------+------+-------+---------------------------------+

Is my trouble that I am performing an ORDER BY on the derived-column MAX(access_date)? If so, how can I optimize my query to perform more in-line with my expectations?


1 solution

#1 (score: 4)

I haven't done this sort of thing in MySQL for a while (I long since switched to PostgreSQL), but typically I would handle this with nested selects to trick the query planner into producing a good plan.


SELECT * FROM 
(SELECT `ScrapeSearchResult`.`creative_id`, 
        MAX(`ScrapeSearchResult`.`access_date`) AS `latest_access_date` 
FROM `scrape_search_results` AS `ScrapeSearchResult` 
WHERE 1 = 1 
GROUP BY `ScrapeSearchResult`.`creative_id` 
) AS `inner_q`  -- note: INNER is a reserved word in MySQL, so the alias needs another name
ORDER BY `latest_access_date` DESC 
LIMIT 20;

The success of this depends entirely on the inner SELECT producing a reasonably small number of rows, though.


I just looked up the docs for MySQL 5.6 and it looks like this should work ... even in MySQL ;)

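As a quick sanity check that the nested form returns the same rows as the original query, here is a minimal reproduction in Python, with SQLite standing in for MySQL and a few made-up rows (the data and the `inner_q` alias are assumptions for this sketch; the planner behavior being discussed is MySQL-specific and is not demonstrated by SQLite):

```python
import sqlite3

# In-memory stand-in for the MySQL table, with invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE scrape_search_results (
    id INTEGER PRIMARY KEY,
    creative_id INTEGER NOT NULL,
    url_id INTEGER NOT NULL,
    access_date TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO scrape_search_results (creative_id, url_id, access_date) VALUES (?, ?, ?)",
    [(550, 1, "2010-08-23 11:00:00"),
     (550, 2, "2010-08-23 11:07:49"),   # later access for the same creative_id
     (932, 3, "2010-08-23 11:05:48"),
     (7900, 4, "2010-08-23 11:03:25")])

# Original single-level query.
original = conn.execute("""
    SELECT creative_id, MAX(access_date) AS latest_access_date
    FROM scrape_search_results
    GROUP BY creative_id
    ORDER BY latest_access_date DESC
    LIMIT 20""").fetchall()

# Nested form: aggregate first, then sort and limit the derived table.
nested = conn.execute("""
    SELECT * FROM (
        SELECT creative_id, MAX(access_date) AS latest_access_date
        FROM scrape_search_results
        GROUP BY creative_id
    ) AS inner_q
    ORDER BY latest_access_date DESC
    LIMIT 20""").fetchall()

assert original == nested
print(original)
# [(550, '2010-08-23 11:07:49'), (932, '2010-08-23 11:05:48'), (7900, '2010-08-23 11:03:25')]
```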
