MySQL IN clause: sub-select vs. list of values

Time: 2022-11-12 03:59:19

I have two levels of filtering I need to do on some related data. The first query looks something like:

SELECT t1.fk_id 
FROM t1 
LEFT JOIN t3 ON t3.fk_id = t1.fk_id
WHERE t1.field1 > 10 AND t3.field2 = 'Y'

The second query runs against another table with the same fk_id field, and looks something like:

SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4 
FROM t2 
WHERE fk_id IN (fk_values_from_query_1)
GROUP BY t2.fk_id
HAVING sum_3 > 1000

Now, I can run this in two different ways, as far as I can tell, though I'm not tied to either method and am open to others as well. I could either embed the first query in the second query as a sub-select, which I understand to be really bad from a performance perspective, or extract the values from the results of query 1 and embed them as a list in query 2 (in my application code).
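
The two options can be sketched side by side. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL so the code runs anywhere, the sample rows are invented, and the timings of either form on SQLite say nothing about MySQL. It shows that both forms return the same rows, and that the value-list form is best built with bound parameters rather than string-pasted values.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fk_id INTEGER, field1 INTEGER);
    CREATE TABLE t3 (fk_id INTEGER, field2 TEXT);
    CREATE TABLE t2 (fk_id INTEGER, field3 INTEGER, field_4 INTEGER);
    INSERT INTO t1 VALUES (1, 20), (2, 5), (3, 30);
    INSERT INTO t3 VALUES (1, 'Y'), (2, 'Y'), (3, 'N');
    INSERT INTO t2 VALUES (1, 600, 1), (1, 700, 2), (2, 5000, 3), (3, 9000, 4);
""")

QUERY_1 = """
    SELECT DISTINCT t1.fk_id
    FROM t1
    JOIN t3 ON t3.fk_id = t1.fk_id
    WHERE t1.field1 > 10 AND t3.field2 = 'Y'
"""

QUERY_2 = """
    SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4
    FROM t2
    WHERE t2.fk_id IN ({in_list})
    GROUP BY t2.fk_id
    HAVING sum_3 > 1000
"""

# Option 1: embed query 1 in query 2 as a sub-select.
rows_subselect = conn.execute(QUERY_2.format(in_list=QUERY_1)).fetchall()

# Option 2: run query 1 first, then pass its ids to query 2 as bound parameters.
ids = [row[0] for row in conn.execute(QUERY_1)]
rows_value_list = []
if ids:  # an empty IN () list is a syntax error in MySQL, so skip the query
    placeholders = ", ".join("?" for _ in ids)
    rows_value_list = conn.execute(QUERY_2.format(in_list=placeholders), ids).fetchall()

print(rows_subselect, rows_value_list)
```

Option 2 costs an extra round trip and needs the empty-list guard, which is worth weighing against whatever the sub-select costs on the server.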

The two parts to this question are:

  1. Is there any difference, performance-wise, between the two query structures described above?
  2. Is there a better way to structure these two queries?

Benchmarks

I didn't test this exhaustively, but I ran both my version and the version posted by Barmar against my data. My query ran in approximately 4.23 seconds, while Barmar's version took only 0.60 seconds. That's roughly an 85% reduction in run time!

1 Solution

#1


You should combine them using a JOIN:

SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4 
FROM t2
JOIN (SELECT DISTINCT t1.fk_id
      FROM t1
      JOIN t3 ON t3.fk_id = t1.fk_id
      WHERE t1.field1 > 10 AND t3.field2 = 'Y') t4
ON t2.fk_id = t4.fk_id
GROUP BY t2.fk_id
HAVING sum_3 > 1000

I've consistently found that MySQL performs horribly on WHERE col IN (subquery) queries, compared to the analogous join. I haven't compared it with queries where I substitute the values from the subquery, because I've only done that when it wasn't possible to do it in a single query (e.g. I need to match data on different servers).

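BTW, there's no point in using a LEFT JOIN if you're also filtering on values in the table being joined. A condition such as t3.field2 = 'Y' in the WHERE clause rejects the NULL-filled rows that the LEFT JOIN produces for unmatched keys, so the result is the same as with an inner join.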
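
A small check of the LEFT JOIN point, as a sketch only: it uses Python's sqlite3 module with invented sample rows in place of the real MySQL tables, and runs the first query once with LEFT JOIN and once with a plain JOIN.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fk_id INTEGER, field1 INTEGER);
    CREATE TABLE t3 (fk_id INTEGER, field2 TEXT);
    INSERT INTO t1 VALUES (1, 20), (2, 30), (3, 40);  -- fk_id 3 has no t3 row
    INSERT INTO t3 VALUES (1, 'Y'), (2, 'N');
""")

TEMPLATE = """
    SELECT t1.fk_id
    FROM t1
    {join_type} t3 ON t3.fk_id = t1.fk_id
    WHERE t1.field1 > 10 AND t3.field2 = 'Y'
    ORDER BY t1.fk_id
"""

# The WHERE filter on t3.field2 drops the unmatched row for fk_id 3,
# because its t3 columns are NULL after the LEFT JOIN.
left_rows = conn.execute(TEMPLATE.format(join_type="LEFT JOIN")).fetchall()
inner_rows = conn.execute(TEMPLATE.format(join_type="JOIN")).fetchall()

print(left_rows == inner_rows)
```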

In all cases, make sure that you have indexes on the keys used in the join or IN clause.
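
For the queries above, the join key is fk_id on each of the three tables. A minimal sketch of adding those indexes follows, again run through Python's sqlite3 module as a stand-in for MySQL. The index names are hypothetical, and your tables may already have suitable keys.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fk_id INTEGER, field1 INTEGER);
    CREATE TABLE t2 (fk_id INTEGER, field3 INTEGER, field_4 INTEGER);
    CREATE TABLE t3 (fk_id INTEGER, field2 TEXT);
    -- Hypothetical index names; this CREATE INDEX form is also valid in MySQL.
    CREATE INDEX idx_t1_fk_id ON t1 (fk_id);
    CREATE INDEX idx_t2_fk_id ON t2 (fk_id);
    CREATE INDEX idx_t3_fk_id ON t3 (fk_id);
""")

# List the indexes that now exist (sqlite_master is SQLite-specific).
index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```

In MySQL, running EXPLAIN on the query shows whether the optimizer actually uses these indexes.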
