哪一个MySQL查询更有效?

时间:2022-12-02 01:06:54
SELECT *
  FROM openorders_tracking
  WHERE id NOT IN (SELECT tracking_id FROM openorders_commission)

SELECT *
  FROM openorders_tracking
LEFT JOIN openorders_commission
  ON openorders_tracking.id=openorders_commission.tracking_id
WHERE openorders_commission.id IS NULL

I'm wondering both specifically for this query, and in general if there is a significant difference.

对于这个查询,我想知道两者是否有显著的区别。

I don't have any data yet, nor indexes yet. I guess it's more in general if "NOT IN" is better than a JOIN or vice-versa.

我还没有任何数据,也没有索引。我想一般来说,如果“不在”比“加入”好,反之亦然。

For those who like EXPLAIN, here is the results for now (again, no indexes yet besides the primaries):

对于那些喜欢解释的人,这里是现在的结果(同样,除了初选之外还没有索引):

mysql> explain SELECT * FROM openorders_tracking WHERE id NOT IN (SELECT trackin
g_id FROM openorders_commission);
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
| id | select_type        | table                 | type   | possible_keys | key
  | key_len | ref  | rows | Extra               |
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
|  1 | PRIMARY            | openorders_tracking   | ALL    | NULL          | NUL
L | NULL    | NULL |  341 | Using where         |
|  2 | DEPENDENT SUBQUERY | openorders_commission | system | NULL          | NUL
L | NULL    | NULL |    0 | const row not found |
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
2 rows in set (0.00 sec)

mysql> explain SELECT * FROM openorders_tracking LEFT JOIN openorders_commission
 ON openorders_tracking.id=openorders_commission.tracking_id WHERE openorders_co
mmission.id IS NULL;
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
| id | select_type | table                 | type   | possible_keys | key  | key
_len | ref  | rows | Extra               |
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
|  1 | SIMPLE      | openorders_commission | system | PRIMARY       | NULL | NUL
L    | NULL |    0 | const row not found |
|  1 | SIMPLE      | openorders_tracking   | ALL    | NULL          | NULL | NUL
L    | NULL |  341 |                     |
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
2 rows in set (0.00 sec)

6 个解决方案

#1


1  

Attacking this from a purely software development approach I would say that this is premature optimization, what you should be striving for is readability. As to which query is more readable would be the call of you and your team. Whilst this doesn't answer the question, which I think should be answered (not by me but someone more qualified like a DBA), you should always think about what you are going to gain by optimizing.

从纯粹的软件开发方法攻击这一点,我认为这是不成熟的优化,您应该争取的是可读性。至于哪种查询更容易读懂,将由您和您的团队决定。虽然这并没有回答我认为应该回答的问题(不是我,而是像DBA这样更有资格的人),但是您应该始终考虑优化将获得什么。

Taken from Wikipedia (Program optimization)

取自Wikipedia(程序优化)

When to optimize

Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug. As a result, optimization or performance tuning is often performed at the end of the development stage.

优化可以降低可读性,并添加只用于提高性能的代码。这可能会使程序或系统复杂化,使它们更难维护和调试。因此,优化或性能调优常常在开发阶段的末尾执行。

Donald Knuth made the following two statements on optimization:

Donald Knuth作出如下关于优化的声明:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

“我们应该忘记小的效率,大约97%的时候:过早的优化是所有邪恶的根源”

(He also attributed the quote to Tony Hoare several years later, although this might have been an error as Hoare disclaims having coined the phrase.)

(他还将这句话归因于几年后的托尼•霍尔(Tony Hoare),尽管这可能是个错误,因为霍尔否认创造了这个短语。)

"In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"

“在已建立的工程学科中,12%的改进是容易获得的,从来都不被认为是微不足道的,我认为同样的观点应该在软件工程中流行。”

"Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing.

“过早优化”是用来描述程序员让性能考虑影响代码设计的情况。这可能导致设计不像以前那样干净,或者代码不正确,因为代码被优化复杂化了,而程序员被优化的注意力分散了。

An alternative approach is to design first, code from the design and then profile/benchmark the resulting code to see which parts should be optimized. A simple and elegant design is often easier to optimize at this stage, and profiling may reveal unexpected performance problems that would not have been addressed by premature optimization.

另一种方法是先设计,从设计中编写代码,然后对生成的代码进行概要分析/基准测试,以查看应该对哪些部分进行优化。在这个阶段,简单而优雅的设计通常更容易进行优化,而概要分析可能会显示出意料之外的性能问题,而这些问题不会通过提前优化来解决。

In practice, it is often necessary to keep performance goals in mind when first designing software, but the programmer balances the goals of design and optimization.

在实践中,在第一次设计软件时经常需要记住性能目标,但是程序员要平衡设计和优化的目标。

#2


3  

It's an odd thing to ask when you have the two queries and the database in front of you. Try running them, and use EXPLAIN to see the execution plans.

当您面前有两个查询和一个数据库时,问这个问题是很奇怪的。尝试运行它们,并使用EXPLAIN查看执行计划。

My guess is that MySQL will optimize them both into the same execution plan but that might depending column types and your indexing scheme.

我猜MySQL会把它们都优化成相同的执行计划,但这可能取决于列类型和索引方案。

#3


0  

I was told to limit the number of SELECTs per query, so based on that I would say the JOIN is the most efficient.

我被告知要限制每个查询的选择数量,因此基于此,我认为连接是最有效的。

#4


0  

I would use the top query. It's easier to understand, IMHO and its use of "select *" will only select the columns from the table you're after.

我将使用顶部查询。更容易理解的是,IMHO及其使用的“select *”只会从您要查找的表中选择列。

However, it's very possible they will have the same execution plan.

然而,很有可能他们会有相同的执行计划。

#5


0  

SELECT *
  FROM openorders_tracking
LEFT JOIN openorders_commission
  ON openorders_tracking.id=openorders_commission.tracking_id
WHERE openorders_commission.id IS NULL

The above said query will be more efficient. They may produce the same performance this time but once you do proper indexing, this query will help you always. So try to index your tables and test both the queries with worst case scenario (try to insert more and more records and you will see the difference)

上述查询将更有效。它们可能会产生相同的性能,但是一旦你做了正确的索引,这个查询将会一直帮助你。所以试着索引你的表,并测试这两个查询的最差情况(试着插入越来越多的记录,你会看到不同)

#6


0  

I have always understood the NOT EXISTS variant of this type of query to be more efficient normally, as it only checks for the first occurence of the queried value in the lookup table - eg:

我一直认为这种查询的不存在变体通常更有效,因为它只检查查找表中查询值的第一次出现——例如:

SELECT *
  FROM openorders_tracking t
  WHERE NOT EXISTS
  (SELECT NULL FROM openorders_commission c
   WHERE c.tracking_id = t.id)

As ever, check the actual performance of the query to determine which is faster.

与以往一样,检查查询的实际性能,以确定哪个更快。

#1


1  

Attacking this from a purely software development approach I would say that this is premature optimization, what you should be striving for is readability. As to which query is more readable would be the call of you and your team. Whilst this doesn't answer the question, which I think should be answered (not by me but someone more qualified like a DBA), you should always think about what you are going to gain by optimizing.

从纯粹的软件开发方法攻击这一点,我认为这是不成熟的优化,您应该争取的是可读性。至于哪种查询更容易读懂,将由您和您的团队决定。虽然这并没有回答我认为应该回答的问题(不是我,而是像DBA这样更有资格的人),但是您应该始终考虑优化将获得什么。

Taken from Wikipedia (Program optimization)

取自Wikipedia(程序优化)

When to optimize

Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug. As a result, optimization or performance tuning is often performed at the end of the development stage.

优化可以降低可读性,并添加只用于提高性能的代码。这可能会使程序或系统复杂化,使它们更难维护和调试。因此,优化或性能调优常常在开发阶段的末尾执行。

Donald Knuth made the following two statements on optimization:

Donald Knuth作出如下关于优化的声明:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

“我们应该忘记小的效率,大约97%的时候:过早的优化是所有邪恶的根源”

(He also attributed the quote to Tony Hoare several years later, although this might have been an error as Hoare disclaims having coined the phrase.)

(他还将这句话归因于几年后的托尼•霍尔(Tony Hoare),尽管这可能是个错误,因为霍尔否认创造了这个短语。)

"In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"

“在已建立的工程学科中,12%的改进是容易获得的,从来都不被认为是微不足道的,我认为同样的观点应该在软件工程中流行。”

"Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing.

“过早优化”是用来描述程序员让性能考虑影响代码设计的情况。这可能导致设计不像以前那样干净,或者代码不正确,因为代码被优化复杂化了,而程序员被优化的注意力分散了。

An alternative approach is to design first, code from the design and then profile/benchmark the resulting code to see which parts should be optimized. A simple and elegant design is often easier to optimize at this stage, and profiling may reveal unexpected performance problems that would not have been addressed by premature optimization.

另一种方法是先设计,从设计中编写代码,然后对生成的代码进行概要分析/基准测试,以查看应该对哪些部分进行优化。在这个阶段,简单而优雅的设计通常更容易进行优化,而概要分析可能会显示出意料之外的性能问题,而这些问题不会通过提前优化来解决。

In practice, it is often necessary to keep performance goals in mind when first designing software, but the programmer balances the goals of design and optimization.

在实践中,在第一次设计软件时经常需要记住性能目标,但是程序员要平衡设计和优化的目标。

#2


3  

It's an odd thing to ask when you have the two queries and the database in front of you. Try running them, and use EXPLAIN to see the execution plans.

当您面前有两个查询和一个数据库时,问这个问题是很奇怪的。尝试运行它们,并使用EXPLAIN查看执行计划。

My guess is that MySQL will optimize them both into the same execution plan but that might depending column types and your indexing scheme.

我猜MySQL会把它们都优化成相同的执行计划,但这可能取决于列类型和索引方案。

#3


0  

I was told to limit the number of SELECTs per query, so based on that I would say the JOIN is the most efficient.

我被告知要限制每个查询的选择数量,因此基于此,我认为连接是最有效的。

#4


0  

I would use the top query. It's easier to understand, IMHO and its use of "select *" will only select the columns from the table you're after.

我将使用顶部查询。更容易理解的是,IMHO及其使用的“select *”只会从您要查找的表中选择列。

However, it's very possible they will have the same execution plan.

然而,很有可能他们会有相同的执行计划。

#5


0  

SELECT *
  FROM openorders_tracking
LEFT JOIN openorders_commission
  ON openorders_tracking.id=openorders_commission.tracking_id
WHERE openorders_commission.id IS NULL

The above said query will be more efficient. They may produce the same performance this time but once you do proper indexing, this query will help you always. So try to index your tables and test both the queries with worst case scenario (try to insert more and more records and you will see the difference)

上述查询将更有效。它们可能会产生相同的性能,但是一旦你做了正确的索引,这个查询将会一直帮助你。所以试着索引你的表,并测试这两个查询的最差情况(试着插入越来越多的记录,你会看到不同)

#6


0  

I have always understood the NOT EXISTS variant of this type of query to be more efficient normally, as it only checks for the first occurence of the queried value in the lookup table - eg:

我一直认为这种查询的不存在变体通常更有效,因为它只检查查找表中查询值的第一次出现——例如:

SELECT *
  FROM openorders_tracking t
  WHERE NOT EXISTS
  (SELECT NULL FROM openorders_commission c
   WHERE c.tracking_id = t.id)

As ever, check the actual performance of the query to determine which is faster.

与以往一样,检查查询的实际性能,以确定哪个更快。