What is the fastest way to speed up a GROUP BY query in MySQL?

Time: 2022-01-15 05:41:56

I have a table of articles, a table of authors, and a table that maps articles to authors.

I'm running the following query to find the authors with the most articles:

SELECT a.*, count(*) c
FROM articleAuthors aa
LEFT JOIN authors a ON aa.author_id=a.id
GROUP BY (author_name)
ORDER BY c DESC LIMIT 50

However, this query takes a whole minute to complete. The table that maps articles to authors (articleAuthors in the query) has about 1,000,000 records.

How can I speed up this GROUP BY query?

1 answer

#1


3  

Assuming the articleAuthors table has more than 50 distinct authors, I would pre-query just that table, grouped by author, and limit it to the 50 records you want. Make sure articleAuthors has an index on (author_id), and that your authors table has an index on (id) (usually its primary key). Then change your query to:

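select
      a.*,
      JustAuthorIDs.cntPerAuthor
   from
      -- count per author_id on the mapping table alone, keep the top 50
      ( select
              aa.author_id,
              count(*) cntPerAuthor
           from
              articleAuthors aa
           group by
              aa.author_id
           order by
              cntPerAuthor DESC
           limit 50 ) JustAuthorIDs
      JOIN authors a
         on JustAuthorIDs.author_id = a.id
   -- the derived table's order is not guaranteed to survive the join
   order by
      JustAuthorIDs.cntPerAuthor DESC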

The prequery does all the grouping and counting on the articleAuthors table alone, sorts by count descending, and keeps only the 50 largest counts. The join to the authors table then only has to look up those 50 ids to get the name and any other columns.
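
A sketch of the indexes described above, in MySQL syntax. The index name is hypothetical, and authors.id is assumed to already be the primary key (and therefore indexed), so only the articleAuthors index is created here. Running EXPLAIN on the prequery by itself is a quick way to check whether MySQL actually uses the new index.

```sql
-- Hypothetical index name; skip this if articleAuthors already has an
-- index whose first column is author_id.
CREATE INDEX idx_articleAuthors_author_id
    ON articleAuthors (author_id);

-- Assumption: authors.id is the PRIMARY KEY, so it is already indexed.

-- Inspect the plan for the prequery on its own.
EXPLAIN
SELECT aa.author_id,
       COUNT(*) AS cntPerAuthor
  FROM articleAuthors aa
 GROUP BY aa.author_id
 ORDER BY cntPerAuthor DESC
 LIMIT 50;
```

With InnoDB you would typically hope to see the new index in the key column and "Using index" in the Extra column, meaning the counts are computed from the index alone; a sort step for ORDER BY cntPerAuthor is still expected, because that value only exists after grouping.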

I group by author_id instead of the name because two different authors could share a name (say, two people called "bill board"); their ids will still be distinct.
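
To check whether name collisions actually occur in your data, a query like the following lists names shared by more than one row in authors. It assumes authors has an author_name column, as the GROUP BY (author_name) in the question suggests, and that id is the primary key, so each row is a distinct author.

```sql
-- Names that belong to more than one author row (more than one id).
-- Grouping article counts by author_name would merge these authors together.
SELECT author_name,
       COUNT(*) AS authorsWithThisName
  FROM authors
 GROUP BY author_name
HAVING COUNT(*) > 1
 ORDER BY authorsWithThisName DESC;
```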

Even so, this query still has to read through all million rows of articleAuthors every time it runs. For something like this, it would probably be better to add a single "AuthoredItems" column to the authors table. Then, using triggers on the articleAuthors table, whenever a row is added or deleted, update the running count for that one author in the authors table. Build an index on the "AuthoredItems" column, and the query becomes as simple as:

select a.*
   from authors a
   order by a.AuthoredItems DESC
   limit 50
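
A minimal sketch of the counter-column approach in MySQL. The trigger and index names are hypothetical and the INT column type is an assumption. Like the description above, it only handles INSERT and DELETE on articleAuthors; an UPDATE that changes author_id would need a third trigger. It is meant to be run while writes to articleAuthors are paused, so no rows slip in between the backfill and the trigger creation.

```sql
-- 1. The denormalized counter column (name from the answer, type assumed).
ALTER TABLE authors
  ADD COLUMN AuthoredItems INT NOT NULL DEFAULT 0;

-- 2. One-time backfill from the existing mapping rows.
--    Authors with no articles keep the default of 0.
UPDATE authors a
  JOIN ( SELECT author_id, COUNT(*) AS cnt
           FROM articleAuthors
          GROUP BY author_id ) c
    ON c.author_id = a.id
   SET a.AuthoredItems = c.cnt;

-- 3. Keep the count current. Single-statement trigger bodies need no
--    BEGIN ... END block or DELIMITER change. Trigger names are hypothetical.
CREATE TRIGGER articleAuthors_after_insert
  AFTER INSERT ON articleAuthors
  FOR EACH ROW
    UPDATE authors
       SET AuthoredItems = AuthoredItems + 1
     WHERE id = NEW.author_id;

CREATE TRIGGER articleAuthors_after_delete
  AFTER DELETE ON articleAuthors
  FOR EACH ROW
    UPDATE authors
       SET AuthoredItems = AuthoredItems - 1
     WHERE id = OLD.author_id;

-- 4. Index the counter so ORDER BY AuthoredItems DESC LIMIT 50 can read
--    just the top of the index. Index name is hypothetical.
CREATE INDEX idx_authors_AuthoredItems
    ON authors (AuthoredItems);
```

One caveat: MySQL does not fire triggers for rows changed by foreign-key cascade actions, so if articleAuthors rows are removed through ON DELETE CASCADE, the counts can drift. Re-running the backfill in step 2 periodically is one way to correct that.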
