Using an index with GROUP BY, ORDER BY and GROUP_CONCAT

Date: 2020-11-25 22:48:21

SOLVED SEE BELOW


I am trying to use both GROUP BY and ORDER BY in a query that retrieves data sorted by difficulty. I have to use the GROUP BY because of the GROUP_CONCAT, since some tables, such as 'lookup_peripheral', link multiple values to the same key (content_id). I understand why MySQL cannot use an index for this task, since the GROUP BY and ORDER BY clauses do not share the same field. However, I am looking for alternative solutions that won't take a day to retrieve the results.


If I omit either the GROUP BY or the ORDER BY clause, the database uses an index, but the results then either lack the peripherals or are not sorted by difficulty.


I am using the 'lookup_difficulty' table in the FROM clause so that its index can be used to order the results. The lookup_xxxxx tables store each allowed value, and the other tables such as lookup_peripheral link the submission to the value via content_id. Everything is referenced by the submission's content_id. The content table holds essential info such as member id, name, etc.


I apologize if my post is not clear enough.


mysql> describe peripheral;
+------------------+----------+------+-----+---------+-------+
| Field            | Type     | Null | Key | Default | Extra |
+------------------+----------+------+-----+---------+-------+
| peripheral_id    | int(2)   | NO   | PRI | NULL    |       |
| peripheral       | char(30) | NO   |     | NULL    |       |
| peripheral_total | int(5)   | NO   |     | NULL    |       |
+------------------+----------+------+-----+---------+-------+

mysql> select * from peripheral;
+---------------+-----------------+------------------+
| peripheral_id | peripheral      | peripheral_total |
+---------------+-----------------+------------------+
|             1 | periph 1        |                0 |
|             2 | periph 2        |                1 |
|             3 | periph 3        |                3 |
+---------------+-----------------+------------------+


mysql> describe lookup_peripheral;
+---------------+---------+------+------+---------+-------+
| Field         | Type    | Null | Key  | Default | Extra |
+---------------+---------+------+------+---------+-------+
| content_id    | int(10) | NO   | MUL  | NULL    |       |
| peripheral_id | int(2)  | NO   |      | NULL    |       |
+---------------+---------+------+------+---------+-------+  


mysql> select * from lookup_peripheral;
+------------+---------------+
| content_id | peripheral_id |
+------------+---------------+
|         74 |             2 |
|         74 |             5 |
|         75 |             2 |
|         75 |             5 |
|         76 |             3 |
|         76 |             4 |
+------------+---------------+

The following query does not use an index on lookup_difficulty; instead, EXPLAIN shows a filesort and a temporary table.


SELECT group_concat(DISTINCT peripheral.peripheral) as peripheral, content.member, .....
FROM (lookup_difficulty)
LEFT OUTER JOIN lookup_peripheral ON lookup_difficulty.content_id = lookup_peripheral.content_id
LEFT OUTER JOIN peripheral ON peripheral.peripheral_id = lookup_peripheral.peripheral_id
.....
LEFT OUTER JOIN programmer ON programmer.programmer_id = lookup_programmer.programmer_id
LEFT OUTER JOIN lookup_programming_language ON lookup_difficulty.content_id = lookup_programming_language.content_id

GROUP BY lookup_difficulty.content_id
ORDER BY lookup_difficulty.difficulty_id
LIMIT 30    

The ultimate goal is to retrieve results sorted by difficulty with the correct peripherals attached. I think I need a sub-query to achieve this.



EDIT: ANSWER BELOW:


Figured it out. I did what I suspected I had to do, which was to add a sub-query. Since MySQL can only use one index per table in a query, I was unable to satisfy the GROUP BY and ORDER BY together with my particular setup. Instead, I added another query that uses a different index on a different table to group the peripherals together. Here is what I added in the SELECT statement above:


(SELECT group_concat(DISTINCT peripheral.peripheral) as peripheral
FROM lookup_peripheral
LEFT OUTER JOIN peripheral ON peripheral.peripheral_id = lookup_peripheral.peripheral_id
WHERE lookup_difficulty.content_id = lookup_peripheral.content_id
GROUP BY lookup_peripheral.content_id
LIMIT 1) as peripheral
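The shape of this correlated subquery can be reproduced on a toy schema. Below is a minimal, hypothetical sketch using SQLite (whose group_concat is a close analogue of MySQL's GROUP_CONCAT); the table contents are invented for illustration and are not the original pictuts data:

```python
import sqlite3

# Toy in-memory versions of lookup_difficulty / lookup_peripheral / peripheral.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_difficulty (content_id INTEGER, difficulty_id INTEGER);
CREATE TABLE lookup_peripheral (content_id INTEGER, peripheral_id INTEGER);
CREATE TABLE peripheral (peripheral_id INTEGER PRIMARY KEY, peripheral TEXT);
INSERT INTO lookup_difficulty VALUES (74, 2), (75, 1), (76, 3);
INSERT INTO lookup_peripheral VALUES (74, 2), (74, 3), (75, 2), (76, 3);
INSERT INTO peripheral VALUES (1, 'periph 1'), (2, 'periph 2'), (3, 'periph 3');
""")

# The correlated subquery builds the comma-separated peripheral list per row,
# leaving the outer query free to ORDER BY difficulty on its own.
rows = conn.execute("""
SELECT lookup_difficulty.content_id,
       (SELECT group_concat(DISTINCT peripheral.peripheral)
        FROM lookup_peripheral
        LEFT OUTER JOIN peripheral
          ON peripheral.peripheral_id = lookup_peripheral.peripheral_id
        WHERE lookup_difficulty.content_id = lookup_peripheral.content_id
        GROUP BY lookup_peripheral.content_id
        LIMIT 1) AS peripheral
FROM lookup_difficulty
ORDER BY lookup_difficulty.difficulty_id
""").fetchall()
print(rows)
```

Each outer row comes back already sorted by difficulty, with its concatenated peripheral list attached (NULL for entries with no peripherals, thanks to the outer join).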

I used a LEFT OUTER JOIN since some entries do not have any peripherals. Total query time is now 0.02s on a 400MHz processor with 128MB of 100MHz RAM, for a database with about 40k rows in most of the tables.


EXPLAIN now shows "Using index" for the lookup_difficulty table. I made this change to achieve that:


ALTER TABLE `pictuts`.`lookup_difficulty` DROP PRIMARY KEY ,
ADD PRIMARY KEY ( `difficulty_id` , `content_id` ) 
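With (difficulty_id, content_id) as the primary key, scanning the table in key order already satisfies the ORDER BY, so no separate sort is needed. A hedged sketch of the same idea in SQLite, where a WITHOUT ROWID table is the closest analogue to a clustered primary key (the values here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID is SQLite's analogue of a clustered primary key:
# rows are physically kept in (difficulty_id, content_id) order.
conn.execute("""
CREATE TABLE lookup_difficulty (
    difficulty_id INTEGER,
    content_id    INTEGER,
    PRIMARY KEY (difficulty_id, content_id)
) WITHOUT ROWID
""")
conn.executemany("INSERT INTO lookup_difficulty VALUES (?, ?)",
                 [(3, 76), (1, 75), (2, 74)])

# The query plan should show a plain scan with no temp B-tree sort step,
# because primary-key order already matches the ORDER BY.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT content_id FROM lookup_difficulty ORDER BY difficulty_id"
).fetchall()
print(plan)

ordered = [r[0] for r in conn.execute(
    "SELECT content_id FROM lookup_difficulty ORDER BY difficulty_id")]
print(ordered)
```

The same reasoning applies to the MySQL ALTER TABLE above: putting difficulty_id first in the key is what lets the index drive the ordering.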

Edit 2: I noticed that with the large offsets produced by pagination, the page loads considerably slower. You may have experienced this on other sites as well. Fortunately, there is a way to avoid this, as pointed out by Peter Zaitsev. Here is my updated snippet, which achieves the same timing at an offset of 30K as at 0:


FROM (
SELECT lookup_difficulty.content_id, lookup_difficulty.difficulty_id
FROM lookup_difficulty
LIMIT '.$offset.', '.$per_page.'
) ld
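This is the "deferred join" pagination trick: page through the narrow indexed table first, then join the wide tables only for the handful of rows actually shown. A minimal sketch with SQLite and bound parameters (the toy tables and a hypothetical content.member column stand in for the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_difficulty (content_id INTEGER PRIMARY KEY, difficulty_id INTEGER);
CREATE TABLE content (content_id INTEGER PRIMARY KEY, member TEXT);
""")
conn.executemany("INSERT INTO lookup_difficulty VALUES (?, ?)",
                 [(i, i % 5) for i in range(1, 101)])
conn.executemany("INSERT INTO content VALUES (?, ?)",
                 [(i, f"member{i}") for i in range(1, 101)])

per_page, offset = 3, 30

# Deferred join: the inner query pages through the small indexed table only,
# so the wide joins run for just per_page rows and a big OFFSET stays cheap.
# Parameters are bound rather than string-spliced as in the PHP original.
rows = conn.execute("""
SELECT ld.content_id, ld.difficulty_id, content.member
FROM (
    SELECT content_id, difficulty_id
    FROM lookup_difficulty
    ORDER BY difficulty_id
    LIMIT ? OFFSET ?
) ld
JOIN content ON content.content_id = ld.content_id
""", (per_page, offset)).fetchall()
print(rows)
```

Note the derived table carries both the join key and the sort column outward, which is exactly why every subsequent JOIN has to be rewritten against ld.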

Now just add ld.whatever to every JOIN made, and there you have it! My query looks like a total mess now, but at least it is optimized. I don't think anyone will make it this far in reading this...


1 Solution

#1


Put in Justin's answer, so this question gets off the unanswered list:


Figured it out. I did what I suspected I had to do, which was to add a sub-query. Since MySQL can only use one index per table in a query, I was unable to satisfy the GROUP BY and ORDER BY together with my particular setup. Instead, I added another query that uses a different index on a different table to group the peripherals together. Here is what I added in the SELECT statement above:


(SELECT group_concat(DISTINCT p.peripheral) as peripheral
FROM lookup_peripheral lp
LEFT JOIN peripheral p ON p.peripheral_id = lp.peripheral_id
WHERE ld.content_id = lp.content_id
GROUP BY lp.content_id
LIMIT 1) as peripheral

I used a LEFT OUTER JOIN since some entries do not have any peripherals. Total query time is now 0.02s on a 400MHz processor with 128MB of 100MHz RAM, for a database with about 40k rows in most of the tables.


EXPLAIN now shows "Using index" for the lookup_difficulty table. I made this change to achieve that:


ALTER TABLE pictuts.lookup_difficulty DROP PRIMARY KEY ,
ADD PRIMARY KEY ( difficulty_id , content_id ) 

Edit 2: I noticed that with the large offsets produced by pagination, the page loads considerably slower. You may have experienced this on other sites as well. Fortunately, there is a way to avoid this, as pointed out by Peter Zaitsev. Here is my updated snippet, which achieves the same timing at an offset of 30K as at 0:


FROM (
SELECT ld.content_id, ld.difficulty_id
FROM lookup_difficulty ld
LIMIT '.$per_page.' OFFSET '.$offset.' 
) ld

Now just add ld.whatever to every JOIN made, and there you have it! My query looks like a total mess now, but at least it is optimized. I don't think anyone will make it this far in reading this...
