MySQL: speeding up a LEFT OUTER JOIN / IS NULL check query

Date: 2022-06-05 01:45:11

The goal of my query is to get all rows from table a (pool) where gender = 'f' and the username does not exist in table b (sent) for campid = xxxx. Here is the query I am using, which returns the correct result:

SELECT `id` 
FROM pool 
  LEFT JOIN sent 
    ON  pool.username = sent.username 
    AND sent.campid = 'YA1LGfh9' 
WHERE sent.username IS NULL 
  AND pool.gender = 'f'

The problem is that the query takes over 9 minutes to complete. The pool table contains over 10 million rows, and the sent table will eventually grow even larger than that. I have created indexes on many of the columns, including username and gender. However, MySQL refuses to use any of my indexes for this query; I even tried FORCE INDEX. Here are my indexes on pool and the EXPLAIN output for my query:

+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| pool  |          0 | PRIMARY  |            1 | id          | A         |     9326880 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | username |            1 | username    | A         |     9326880 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | source   |            1 | source      | A         |           6 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | gender   |            1 | gender      | A         |           9 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | location |            1 | location    | A         |       59030 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)

mysql> explain SELECT `id` FROM pool FORCE INDEX (username) LEFT JOIN sent ON pool.username = sent.username AND sent.campid = 'YA1LGfh9' WHERE sent.username IS NULL AND pool.gender = 'f';
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                   |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
|  1 | SIMPLE      | pool  | ALL  | NULL          | NULL | NULL    | NULL | 9326881 | Using where             |
|  1 | SIMPLE      | sent  | ALL  | NULL          | NULL | NULL    | NULL |     351 | Using where; Not exists |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
2 rows in set (0.00 sec)

Also, here are my indexes for the sent table:

+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| sent  |          0 | PRIMARY  |            1 | primary_key | A         |         351 |     NULL | NULL   |      | BTREE      |         |
| sent  |          1 | username |            1 | username    | A         |         351 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

You can see that no indexes are being used, so my query takes far too long. If anyone has a solution that involves reworking the query, please show me an example of how to do it using my data structure, so that there is no confusion about how to implement and test it. Thank you.

1 solution

#1


First, your original query was correct in its placement of everything, including the campid condition. With a LEFT JOIN from pool to sent, pulling a required equality such as the campid test into the WHERE clause, as previously suggested, effectively converts the join into an INNER JOIN, which requires a matching row on both sides. Leave it as you had it.
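
To make the placement point concrete, here is a sketch of the variant to avoid, with the campid test moved out of ON and into WHERE. It uses only the tables and columns from the question.

```sql
-- Variant to AVOID: the campid test sits in WHERE instead of ON.
-- For pool rows without a match, the LEFT JOIN fills s.* with NULL,
-- so s.campid = 'YA1LGfh9' is not true and those rows are discarded.
-- Rows that do match have a non-NULL s.username, so they fail the
-- IS NULL test. The two conditions contradict: no rows come back.
SELECT p.id
FROM pool p
LEFT JOIN sent s
       ON p.username = s.username
WHERE s.campid = 'YA1LGfh9'
  AND s.username IS NULL
  AND p.gender = 'f';
```

Keeping the campid test in the ON clause, as in the original query, lets unmatched pool rows survive the join so that the IS NULL test can pick them out.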

You already have an index on username on the sent table, but I would do the following.

Build an index on the "sent" table on (campid, username) as a composite (i.e., multi-column) index. This way the left join will be optimized for BOTH conditions.
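
A minimal sketch of that index in MySQL syntax. The index name is made up for illustration; the table and column names are the ones from the question.

```sql
-- Composite index so the join can seek on campid and username together.
-- idx_sent_campid_username is an illustrative name, not from the post.
CREATE INDEX idx_sent_campid_username ON sent (campid, username);

-- Confirm that the new index is listed for the table.
SHOW INDEX FROM sent;
```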

On your "pool" table, try a composite index on the 3 fields (gender, username, id).

By doing this, you avoid having to go through all the actual data pages that hold your 10+ million records. Since the index HAS the columns needed for the comparison, MySQL doesn't have to find the actual record and look at its columns; it can use the values in the index directly.
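
As a sketch, the pool index and a quick check could look like the following. The index name is hypothetical, and the storage engine is not stated in the question: on InnoDB the primary key is stored in every secondary index anyway, so listing id explicitly would be redundant there but harmless.

```sql
-- Covering index: gender for the filter, username for the join,
-- id for the SELECT list. The index name is illustrative only.
CREATE INDEX idx_pool_gender_username_id ON pool (gender, username, id);

-- Re-run EXPLAIN to see whether both tables now use an index.
EXPLAIN
SELECT p.id
FROM pool p
LEFT JOIN sent s
       ON s.campid = 'YA1LGfh9'
      AND p.username = s.username
WHERE p.gender = 'f'
  AND s.username IS NULL;
```

If the index covers the query, the Extra column for pool should mention "Using index" rather than a full table scan with type ALL.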

Also, for grins, I added the keyword STRAIGHT_JOIN, which tells MySQL to join the tables in exactly the order I wrote them rather than letting the optimizer reorder them. MANY times I've found this to significantly improve query performance; only rarely have I been told that it did NOT help.

SELECT STRAIGHT_JOIN
      p.id
   FROM 
      pool p
         LEFT JOIN sent s
            ON s.campid = 'YA1LGfh9' 
            AND p.username = s.username 
   WHERE 
          p.gender = 'f'
      AND s.username IS NULL 

All that said, consider how many records you are still going to return out of the 10+ million. If the pool has 10+ million rows and the single camp only has 5,000, you will still be returning almost the entire set.
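
One way to gauge the size of that result before running the full query is to count both sides separately. This is only a rough sketch using the names from the question; the column aliases are made up.

```sql
-- Upper bound on the result: every female row in pool.
SELECT COUNT(*) AS female_rows
FROM pool
WHERE gender = 'f';

-- Distinct usernames already sent for this campaign; only pool rows
-- carrying one of these usernames are excluded from the result.
SELECT COUNT(DISTINCT username) AS sent_usernames
FROM sent
WHERE campid = 'YA1LGfh9';
```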
