Slow query using a left outer join and an is null condition

Time: 2022-09-25 12:36:43

I've got a simple query (postgresql if that matters) that retrieves all items for some_user excluding the ones she has on her wishlist:

select i.* 
from core_item i 
left outer join core_item_in_basket b on (i.id=b.item_id and b.user_id=__some_user__)
where b.on_wishlist is null;

The above query runs in ~50000ms (yep, the number is correct). If I remove the "b.on_wishlist is null" condition or make it "b.on_wishlist is not null", the query runs in some 50ms (quite a change).

The query has more joins and conditions but this is irrelevant as only this one slows it down.

Some info on the database size:

  • core_items has ~10.000 records

  • core_user has ~5.000 records

  • core_item_in_basket has ~2.000 records (of which some 50% have on_wishlist = true, the rest is null)

I don't have any indexes (except for ids and foreign keys) on those two tables.

The question is: what should I do to make this run faster? I've got a few ideas myself to check out this evening, but I'd like you guys to help if possible, as well.

Thanks!

4 answers

#1


Sorry for adding a 2nd answer, but the site doesn't let me format comments properly, and since formatting is essential, I have to post an answer.

Couple of options:

  1. CREATE INDEX q ON core_item_in_basket (user_id, item_id) WHERE on_wishlist is null;

  2. same index, but change the order of columns in it.

  3. SELECT i.* FROM core_item i WHERE i.id not in (select item_id FROM core_item_in_basket WHERE on_wishlist is null AND user_id = __some_user__); (this query can benefit from the index from point #1, but will not benefit from index #2.)

  4. SELECT * from core_item where id in (select id from core_item EXCEPT select item_id FROM core_item_in_basket WHERE on_wishlist is null AND user_id = __some_user__);
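
The two index options could be written out roughly as below. This is only a sketch: the index names are made up for illustration, the partial-index predicate is copied from the option above, and you would normally keep just one of the two after comparing plans.

```sql
-- Option 1: partial index with user_id leading (index name is illustrative).
CREATE INDEX core_item_in_basket_user_item_idx
    ON core_item_in_basket (user_id, item_id)
    WHERE on_wishlist IS NULL;

-- Option 2: same predicate, column order reversed (index name is illustrative).
CREATE INDEX core_item_in_basket_item_user_idx
    ON core_item_in_basket (item_id, user_id)
    WHERE on_wishlist IS NULL;

-- Refresh planner statistics so the new indexes are costed with current data.
ANALYZE core_item_in_basket;
```

Giving the indexes distinct names lets both exist at the same time while you test which one the planner actually picks.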

Let us know the results :)

#2


Try using not exists:

select i.* 
from   core_item i 
where  not exists (select * from core_item_in_basket b where i.id=b.item_id and b.user_id=__some_user__)
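
Note that the query above leaves out every item the user has in the basket, wishlisted or not. If the original filter matters, a variant along these lines might be closer to the question's query. It is a sketch that assumes on_wishlist is only ever true or null (as described in the question) and that there is at most one basket row per item and user; 42 is a stand-in for __some_user__.

```sql
-- Items for one user, excluding only those with a wishlisted basket row.
SELECT i.*
FROM   core_item i
WHERE  NOT EXISTS (
         SELECT 1
         FROM   core_item_in_basket b
         WHERE  b.item_id = i.id
         AND    b.user_id = 42             -- placeholder for __some_user__
         AND    b.on_wishlist IS NOT NULL  -- a row only counts if it is wishlisted
       );
```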

#3


You might want to explain more about the purpose of this query, since some techniques make sense and others don't, depending on the use case.

How often are you running it?

Is it run for only 1 user, or do you run it for all users in some kind of loop?

Run explain analyze on the query and put the output on explain.depesz.com so you can see why it is so slow.
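
For the query from the question, that would look roughly like this (42 is a placeholder for __some_user__; substitute a real id):

```sql
-- EXPLAIN ANALYZE executes the query and reports the actual plan and timings.
EXPLAIN ANALYZE
SELECT i.*
FROM   core_item i
LEFT OUTER JOIN core_item_in_basket b
       ON (i.id = b.item_id AND b.user_id = 42)  -- placeholder for __some_user__
WHERE  b.on_wishlist IS NULL;
```

Running it once with the is null condition and once without should show where the two plans diverge, for example in the join method or in estimated versus actual row counts.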

#4


Have you tried adding an index on on_wishlist?

It seems that this column needs to be checked for every row in the query. If your tables are that big, this might have quite a significant impact on the query speed.

Since you put the on_wishlist condition in the where clause, it will (depending on what the query planner decides) be evaluated after the join has been performed, so that comparison has to be done for potentially every row resulting from the join. Both the core_items and core_item_in_basket tables are pretty big, and you don't have an index for that column, so there is very little for the query optimizer to work with, which probably leads to the excessive query time.

The size of core_user should have no influence (as it is not referenced in the query).
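
A minimal sketch of the index suggested above (the index name is made up for illustration):

```sql
-- Plain index on the column used in the WHERE clause (name is illustrative).
CREATE INDEX core_item_in_basket_on_wishlist_idx
    ON core_item_in_basket (on_wishlist);
```

Since the column apparently only holds true or null, the planner may decide the index is not selective enough to use, so it is worth checking the plan with explain analyze after creating it.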
