How can I simplify / improve the performance of this MySQL query?

Time: 2022-06-19 06:23:55

I am very new to MySQL and thanks to the great support from you more experienced guys here I am managing to struggle by, while learning a lot in the process.

I have a query that does exactly what I want. However, it looks extremely messy to me and I am certain there must be a way to simplify it.

How can this query be improved and optimized for performance?

Many thanks

$sQuery = "
    SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."
    FROM $sTable b
    LEFT JOIN (
        SELECT COUNT(*) AS projects_count, a.songs_id
        FROM $sTable2 a
        GROUP BY a.songs_id
    ) bb ON bb.songs_id = b.songsID
    LEFT JOIN (
        SELECT AVG(rating) AS rating, COUNT(rating) AS ratings_count, c.songid
        FROM $sTable3 c
        GROUP BY c.songid
    ) bbb ON bbb.songid = b.songsID
    LEFT JOIN (
        SELECT c.songid, c.userid,
            CASE WHEN EXISTS (
                SELECT songid
                FROM $sTable3
                WHERE songid = c.songid
            ) THEN 'User Voted'
            ELSE 'Not Voted'
            END AS voted
        FROM $sTable3 c
        WHERE c.userid = $userid
        GROUP BY c.songid
    ) bbbb ON bbbb.songid = b.songsID

EDIT: Here is a description of what the query is doing:-

I have three tables:

  • $sTable = a table of songs (songid, mp3link, artwork, useruploadid etc.)

  • $sTable2 = a table of projects with songs linked to them (projectid, songid, project name etc.)

  • $sTable3 = a table of song ratings (songid, userid, rating)

All of this data is output to a JSON array and displayed in a table in my application to provide a list of songs, combined with the projects and ratings data.

The query itself does the following in this order:-

  1. Collects all rows from $sTable

  2. Joins to $sTable2 on songsID and counts the number of rows (projects) in this table which have the same songsID

  3. Joins to $sTable3 on songsID and works out the average of the column 'rating' across the rows in this table which have the same songsID

  4. At this point it also counts the total number of rows in $sTable3 which have the same songsID to provide a total number of votes.

  5. Finally it checks, for each row, whether $userid (a variable containing the ID of the logged-in user) matches the 'userid' stored in $sTable3, in order to tell whether that user has already voted on a given songsID. If it matches it returns "User Voted"; if not, it returns "Not Voted". It outputs this as a separate column in my JSON array, which I then check client-side in my app to add a class.
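The five steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table names songs, projects and ratings are hypothetical replacements for $sTable, $sTable2 and $sTable3, and COALESCE turns missing counts into 0 (the query in the question would return NULL there instead).

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL. Table names and the
# sample rows are hypothetical; only the column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs    (songsID INTEGER PRIMARY KEY, song_name TEXT);
    CREATE TABLE projects (projectid INTEGER, songs_id INTEGER);
    CREATE TABLE ratings  (songid INTEGER, userid INTEGER, rating INTEGER);

    INSERT INTO songs    VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO projects VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO ratings  VALUES (1, 7, 4), (1, 8, 2), (2, 8, 5);
""")

SONG_LIST_SQL = """
    SELECT s.songsID,
           COALESCE(p.projects_count, 0) AS projects_count,
           r.rating,
           COALESCE(r.ratings_count, 0)  AS ratings_count,
           CASE WHEN r.user_votes > 0 THEN 'User Voted'
                ELSE 'Not Voted' END     AS voted
    FROM songs s
    -- step 2: projects per song
    LEFT JOIN (SELECT songs_id, COUNT(*) AS projects_count
               FROM projects
               GROUP BY songs_id) p ON p.songs_id = s.songsID
    -- steps 3 to 5: average, vote count and "did this user vote" in one pass
    LEFT JOIN (SELECT songid,
                      AVG(rating)   AS rating,
                      COUNT(rating) AS ratings_count,
                      SUM(CASE WHEN userid = ? THEN 1 ELSE 0 END) AS user_votes
               FROM ratings
               GROUP BY songid) r ON r.songid = s.songsID
    ORDER BY s.songsID
"""

def song_list(user_id):
    """Return one row per song, including songs without projects or ratings."""
    return conn.execute(SONG_LIST_SQL, (user_id,)).fetchall()

rows = song_list(7)
for row in rows:
    print(row)
```

The user ID is passed as a bound parameter rather than interpolated into the string, which is a design choice of this sketch and not something the original script does.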

If there is any more detail anyone needs, please just let me know. Thanks all.

EDIT:

Thanks to Aurimis' excellent first attempt I am closing in on a much simpler solution.

This is the code I have tried based on that suggestion.

SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."

    FROM 
      (SELECT 
        $sTable.songsID, COUNT(rating) AS ratings_count, 
        AVG(rating) AS ratings
      FROM $sTable 
        LEFT JOIN $sTable2 ON $sTable.songsID = $sTable2.songs_id
        LEFT JOIN $sTable3 ON $sTable.songsID = $sTable3.songid
      GROUP BY $sTable.songsID) AS A
    LEFT JOIN $sTable3 AS B ON A.songsID = B.songid AND B.userid = $userid

There are several problems however. I had to remove the first line of your answer as it caused a 500 internal server error:

IF(B.userid = NULL, "Not voted", "User Voted") AS voted 

Obviously now the 'voted check' functionality is lost.
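Independently of the server error, the removed line has a logic problem: in SQL, comparing anything to NULL with = yields NULL rather than true, so the test has to be written with IS NULL. The small sketch below shows the difference, using Python's sqlite3 module as an assumed stand-in for MySQL and CASE in place of MySQL's IF(). A separate, unconfirmed guess about the 500 error is that the double quotes in that line end the surrounding double-quoted PHP string early; single-quoted SQL literals would avoid that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "= NULL" evaluates to NULL (None in Python), "IS NULL" evaluates to 1.
eq_null, is_null = conn.execute(
    "SELECT (5 = NULL), (NULL IS NULL)"
).fetchone()

# With "= NULL" the condition is never true, so the ELSE branch always wins.
label_eq, label_is = conn.execute("""
    SELECT CASE WHEN NULL = NULL  THEN 'Not Voted' ELSE 'User Voted' END,
           CASE WHEN NULL IS NULL THEN 'Not Voted' ELSE 'User Voted' END
""").fetchone()

print(eq_null, is_null, label_eq, label_is)
```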

Also, and more importantly, it is not returning all the columns defined in my array, only the songsID. My JSON returns Unknown column 'song_name' in 'field list'. If I remove it from my $aColumns array it will of course move on to the next one.

I am defining my columns at the beginning of my script as this array is used for filtering and putting together the output for the JSON encode. This is the definition of $aColumns:-

$aColumns = array( 'songsID', 'song_name', 'artist_band_name', 'author', 'song_artwork', 'song_file', 'genre', 'song_description', 'uploaded_time', 'emotion', 'tempo', 'user', 'happiness', 'instruments', 'similar_artists', 'play_count', 'projects_count',  'rating', 'ratings_count', 'voted');

In order to quickly test the rest of the query I modified the first line within the subquery to select $sTable.* rather than $sTable.songsID (remember $sTable is the songs table)

Then the query worked, though with terrible performance, and it returned only 24 songs out of the 5000-song test dataset. I therefore changed your first 'JOIN' to a 'LEFT JOIN' so that all 5000 songs were returned. To clarify: the query needs to return ALL of the rows in the songs table, with various extra bits of data from the projects and ratings tables for each song.

So... we are getting there, and I am certain that this is a much better approach; it just needs some modification. Thanks for your help so far, Aurimis.

2 solutions

#1


3  

SELECT SQL_CALC_FOUND_ROWS
    songsID, song_name, artist_band_name, author, song_artwork, song_file,
    genre, song_description, uploaded_time, emotion, tempo,
    `user`, happiness, instruments, similar_artists, play_count,
    projects_count,
    rating, ratings_count,
    IF(user_ratings_count, 'User Voted', 'Not Voted') as voted
FROM (
    SELECT
        sp.songsID, projects_count,
        AVG(rating) as rating,
        COUNT(rating) AS ratings_count,
        COUNT(IF(userid=$userid, 1, NULL)) as user_ratings_count
    FROM (
        SELECT songsID, COUNT(p.songs_id) as projects_count
        FROM $sTable s
        LEFT JOIN $sTable2 p ON s.songsID = p.songs_id
        GROUP BY songsID) as sp
    LEFT JOIN $sTable3 r ON sp.songsID = r.songid
    GROUP BY sp.songsID) as spr
JOIN $sTable s USING (songsID);

You will need the following indexes:

  • (songs_id) on $sTable2

  • the composite (songid, rating, userid) on $sTable3
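As a rough sketch of what creating those two indexes might look like, the snippet below runs the CREATE INDEX statements against an in-memory SQLite database standing in for MySQL. The table names projects and ratings and both index names are hypothetical; the CREATE INDEX syntax shown is also accepted by MySQL, but the query-plan output is SQLite-specific and will not match MySQL's EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (projectid INTEGER, songs_id INTEGER);
    CREATE TABLE ratings  (songid INTEGER, userid INTEGER, rating INTEGER);

    -- index names are made up for this sketch
    CREATE INDEX idx_projects_song ON projects (songs_id);
    CREATE INDEX idx_ratings_song_rating_user
        ON ratings (songid, rating, userid);
""")

def index_columns(index_name):
    """List the indexed columns in order (PRAGMA rows are seqno, cid, name)."""
    return [row[2] for row in conn.execute(f"PRAGMA index_info({index_name})")]

print(index_columns("idx_projects_song"))
print(index_columns("idx_ratings_song_rating_user"))

# Ask SQLite how it would run the per-song aggregation; the wording of the
# plan depends on the SQLite version, so it is printed rather than assumed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT songid, AVG(rating), COUNT(rating) FROM ratings GROUP BY songid"
).fetchall()
for step in plan:
    print(step)
```

The composite index starts with songid because that is the join and grouping column; rating and userid follow so the aggregation can, in principle, be answered from the index alone.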

The ideas behind the query:

  • the subqueries carry only INT columns, so their results fit easily in memory

  • left joins are grouped separately to reduce the cartesian product

  • user votes are counted in the same subquery as other ratings to avoid an expensive correlated subquery

  • all other information is retrieved in the final join
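The point about grouping the left joins separately can be seen on a single song with three projects and two ratings. The sketch below is an illustration only, with SQLite (via Python's sqlite3) standing in for MySQL and hypothetical table names songs, projects and ratings: joining both tables before one GROUP BY produces 3 x 2 = 6 rows for the song and inflates both counts, while grouping in two steps keeps them correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs    (songsID INTEGER PRIMARY KEY);
    CREATE TABLE projects (projectid INTEGER, songs_id INTEGER);
    CREATE TABLE ratings  (songid INTEGER, userid INTEGER, rating INTEGER);

    INSERT INTO songs    VALUES (1);
    INSERT INTO projects VALUES (10, 1), (11, 1), (12, 1);  -- 3 projects
    INSERT INTO ratings  VALUES (1, 7, 4), (1, 8, 2);       -- 2 ratings
""")

# Both tables joined before a single GROUP BY: every project row is paired
# with every rating row, so both counts come out as 6.
joined_then_grouped = conn.execute("""
    SELECT s.songsID, COUNT(p.songs_id), COUNT(r.rating)
    FROM songs s
    LEFT JOIN projects p ON p.songs_id = s.songsID
    LEFT JOIN ratings  r ON r.songid   = s.songsID
    GROUP BY s.songsID
""").fetchone()

# Projects are collapsed to one row per song first, then ratings are joined
# and grouped, mirroring the nested subqueries in the answer.
grouped_separately = conn.execute("""
    SELECT sp.songsID, sp.projects_count, COUNT(r.rating)
    FROM (SELECT s.songsID, COUNT(p.songs_id) AS projects_count
          FROM songs s
          LEFT JOIN projects p ON p.songs_id = s.songsID
          GROUP BY s.songsID) sp
    LEFT JOIN ratings r ON r.songid = sp.songsID
    GROUP BY sp.songsID
""").fetchone()

print(joined_then_grouped)
print(grouped_separately)
```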

#2


1  

Let me try based on your description, not the query. I'll just use Songs to indicate Table1, Projects to indicate Table2 and Ratings to indicate Table3 - for clarity.

SELECT 
  /* [column list again] */,
  IF(B.userid IS NULL, 'Not voted', 'Voted') as voted 
FROM 
  (SELECT 
    Songs.SongID, count(rating) as total_votes, 
    avg(rating) as average_rating /*[,.. other columns as you need them] */
  FROM Songs 
    JOIN Projects ON Songs.SongID = Projects.SongID
    LEFT JOIN Ratings ON Songs.SongID = Ratings.SongID
  GROUP BY Songs.SongID) as A
LEFT JOIN Ratings as B ON A.SongID = B.SongID AND B.userid = ? /* your user id */

As you see, you can get all the information on songs in one relatively simple query (just using GROUP BY and the count() / avg() functions). Finding out whether a song was rated by a particular user needs one more step: LEFT JOIN the ratings table again, restricted to that user. If the joined userid is empty (NULL), you know he has not voted.
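The LEFT JOIN check described above depends on the user filter sitting in the ON clause. A minimal sketch, assuming SQLite (via Python's sqlite3) in place of MySQL, hypothetical tables songs and ratings, and at most one rating per user per song: with the filter in ON every song is kept, while moving it to WHERE silently drops the songs the user has not rated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs   (songsID INTEGER PRIMARY KEY);
    CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);

    INSERT INTO songs   VALUES (1), (2), (3);
    INSERT INTO ratings VALUES (1, 7, 4), (2, 8, 5);
""")

def voted_flags(user_id):
    """User filter in ON: unmatched songs stay, with b.userid set to NULL."""
    return conn.execute("""
        SELECT s.songsID,
               CASE WHEN b.userid IS NULL THEN 'Not Voted'
                    ELSE 'User Voted' END
        FROM songs s
        LEFT JOIN ratings b ON b.songid = s.songsID AND b.userid = ?
        ORDER BY s.songsID
    """, (user_id,)).fetchall()

def voted_flags_where(user_id):
    """Same filter in WHERE: rows where b.userid is NULL are filtered out."""
    return conn.execute("""
        SELECT s.songsID,
               CASE WHEN b.userid IS NULL THEN 'Not Voted'
                    ELSE 'User Voted' END
        FROM songs s
        LEFT JOIN ratings b ON b.songid = s.songsID
        WHERE b.userid = ?
        ORDER BY s.songsID
    """, (user_id,)).fetchall()

print(voted_flags(7))
print(voted_flags_where(7))
```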

Now, I did not go through your query in depth, as it really looks complicated. Could be that I missed something - if that's the case, please update the description and I can try again :)
