MySQL Left Join, Group By, Order By, Limit = Terrible Performance

Time: 2022-06-13 04:03:59

I am currently developing an application that lets users search a database of documents using various parameters and returns a set of paged results. I am building it in PHP/MySQL, which is not my usual development platform, but it's been grand so far.

The problem I am having is that in order to return a full set of results I have to use LEFT JOIN on every table, which completely destroys my performance. The person who developed the database has said that the query I am using will return the correct results, so that's what I have to use. The query is below; I am by no means an SQL guru and could use some help on this.

I have been thinking that it might be better to split the query into sub-queries? Below is my current query:

    SELECT d.title, d.deposition_id, d.folio_start, d.folio_end, pl.place_id, p.surname, p.forename, p.person_type_id, pt.person_type_desc, p.age, d.manuscript_number, dt.day, dt.month, dt.year, plc.county_id, c.county_desc
    FROM deposition d
    LEFT JOIN person AS p ON p.deposition_id = d.deposition_id
    LEFT JOIN person_type AS pt ON p.person_type_id = pt.person_type_id
    LEFT JOIN place_link AS pl ON pl.deposition_id = d.deposition_id
    LEFT JOIN date AS dt ON dt.deposition_id = d.deposition_id
    LEFT JOIN place AS plc ON pl.place_id = plc.place_id
    LEFT JOIN county AS c ON plc.county_id = c.county_id
    WHERE 1 AND d.manuscript_number = '840'
    GROUP BY d.deposition_id
    ORDER BY d.folio_start ASC
    LIMIT 0, 20

Any help or guidance would be greatly appreciated!

Deposition Table:

CREATE TABLE IF NOT EXISTS `deposition` (
  `deposition_id` varchar(11) NOT NULL default '',
  `manuscript_number` int(10) NOT NULL default '0',
  `folio_start` varchar(4) NOT NULL default '0',
  `folio_end` varchar(4) default '0',
  `page` int(4) default NULL,
  `deposition_type_id` int(10) NOT NULL default '0',
  `comments` varchar(255) default '',
  `title` varchar(255) default NULL,
  PRIMARY KEY  (`deposition_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Date Table

CREATE TABLE IF NOT EXISTS `date` (
  `deposition_id` varchar(11) NOT NULL default '',
  `day` int(2) default NULL,
  `month` int(2) default NULL,
  `year` int(4) default NULL,
  PRIMARY KEY  (`deposition_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Person_Type

CREATE TABLE IF NOT EXISTS `person_type` (
  `person_type_id` int(10) NOT NULL auto_increment,
  `person_type_desc` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`person_type_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=59 ;

3 Solutions

#1


1  

The poor performance is almost certainly from lack of indexes. Your deposition table has no indexes apart from its primary key, and that probably means the other tables you're referencing don't have any either. You can start by adding an index to your deposition table. From the MySQL shell or phpMyAdmin, issue the following query.

ALTER TABLE deposition ADD INDEX(deposition_id, manuscript_number);

You know you're on the right track if the query executes faster after adding the index. From there you might want to put indexes on the referenced columns of the other tables. For instance, for this part of your query, "LEFT JOIN person AS p ON p.deposition_id = d.deposition_id", you could try adding an index to the person table using:

ALTER TABLE person ADD INDEX(deposition_id);
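
Rather than relying on timing alone, one way to check whether a newly added index is actually being used is to prefix the query with EXPLAIN. A minimal sketch, using a trimmed-down version of the query from the question (the shortened column list is only for illustration):

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT d.deposition_id, d.title, p.surname, p.forename
FROM deposition d
LEFT JOIN person AS p ON p.deposition_id = d.deposition_id
WHERE d.manuscript_number = 840
ORDER BY d.folio_start ASC
LIMIT 0, 20;
```

In the output, the `key` column names the index chosen for each table (NULL means none was used) and `rows` is the estimated number of rows examined. A row with `type` set to ALL indicates a full table scan of that table.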

#2


2  

Seems that you want to select one person, place etc. per deposition.

The query you wrote will return this, but it's not guaranteed which one it will return, and the query is inefficient.

Try this:

SELECT  d.title, d.deposition_id, d.folio_start, d.folio_end, pl.place_id, p.surname, p.forename, p.person_type_id, pt.person_type_desc, p.age, d.manuscript_number, dt.day, dt.month, dt.year, plc.county_id, c.county_desc
FROM    deposition d
LEFT JOIN
        person p
ON      p.id = 
        (
        -- pick one deterministic person row per deposition
        SELECT  id
        FROM    person pi
        WHERE   pi.deposition_id = d.deposition_id
        ORDER BY
                pi.deposition_id, pi.id
        LIMIT 1
        )
LEFT JOIN
        person_type AS pt
ON      pt.person_type_id = p.person_type_id
LEFT JOIN
        place_link AS pl
ON      pl.id = 
        (
        -- pick one deterministic place_link row per deposition
        SELECT  id
        FROM    place_link AS pli
        WHERE   pli.deposition_id = d.deposition_id
        ORDER BY
                pli.deposition_id, pli.id
        LIMIT 1
        )
LEFT JOIN
        date AS dt
        -- date has deposition_id as its primary key and no id column,
        -- so a plain join already matches at most one row
ON      dt.deposition_id = d.deposition_id
LEFT JOIN
        place AS plc
ON      plc.place_id = pl.place_id 
LEFT JOIN
        county AS c
ON      c.county_id = plc.county_id
WHERE   d.manuscript_number = '840' 
ORDER BY
        d.manuscript_number, d.folio_start
LIMIT   20

Create an index on deposition (manuscript_number, folio_start) for this to run fast.

Also create a composite index on (deposition_id, id) on person and place_link. The date table already has deposition_id as its primary key, so it needs no extra index for this lookup.
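
The indexes described above could be created along these lines. This is only a sketch: the index names are made up for illustration, and the `id` columns on person and place_link are assumed from the query above, since those table definitions are not shown in the question.

```sql
-- Supports the WHERE on manuscript_number and the ORDER BY on folio_start.
CREATE INDEX ix_deposition_manuscript_folio
    ON deposition (manuscript_number, folio_start);

-- Support the correlated "first row per deposition" subqueries.
CREATE INDEX ix_person_deposition_id
    ON person (deposition_id, id);
CREATE INDEX ix_place_link_deposition_id
    ON place_link (deposition_id, id);

-- In the posted schema, date already has deposition_id as its PRIMARY KEY,
-- so lookups by deposition_id on that table are indexed as is.
```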

#3


1  

You only need a LEFT JOIN if the joined table might not have a matching value. Is it possible in your database schema for a person to not have a matching person_type? Or for a deposition to not have a matching row in date? Or for a place to not have a matching county?

For any of those relationships that must exist for the result to make sense, you can change the LEFT JOIN to an INNER JOIN.
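
As an illustration, suppose every deposition is guaranteed to have a row in date. That is an assumption, since the question does not say so. Under it, that join can become an INNER JOIN while the optional relationships stay as LEFT JOINs:

```sql
SELECT d.deposition_id, d.title, dt.day, dt.month, dt.year,
       p.surname, p.forename
FROM deposition d
-- Assumed mandatory: every deposition has a matching date row.
INNER JOIN date AS dt ON dt.deposition_id = d.deposition_id
-- Still optional: a deposition may have no person rows at all.
LEFT JOIN person AS p ON p.deposition_id = d.deposition_id
WHERE d.manuscript_number = 840;
```

If the assumption turns out to be wrong, depositions without a date row silently drop out of the results, so it is worth counting unmatched rows before making the switch.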

These columns should have indexes (unique if possible):

person.deposition_id
date.deposition_id
place_link.deposition_id
place_link.place_id

The date table looks like a bad design; I can't think of a reason to have a table of dates instead of just putting a column of type date (or datetime) in the deposition table. And date is a terrible name for a table because it's a SQL reserved word.
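
A rough sketch of what folding the date table into deposition might look like. The column name deposition_date is hypothetical, and this only covers rows where day, month and year are all present. If the separate table exists to hold partial dates, a single DATE column would lose that information, and invalid day/month combinations may be rejected or stored as NULL depending on the server's SQL mode.

```sql
-- Hypothetical new column on deposition.
ALTER TABLE deposition ADD COLUMN deposition_date DATE NULL;

-- Copy over only the complete dates. The table name is backquoted
-- so it cannot be confused with the DATE keyword.
UPDATE deposition d
JOIN `date` AS dt ON dt.deposition_id = d.deposition_id
SET d.deposition_date =
    STR_TO_DATE(CONCAT(dt.year, '-', dt.month, '-', dt.day), '%Y-%m-%d')
WHERE dt.day IS NOT NULL
  AND dt.month IS NOT NULL
  AND dt.year IS NOT NULL;
```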
