Performance: WHERE IN clause vs. (INSERT + INNER JOIN)

Posted: 2022-10-26 03:19:12

I have a use case where I need to run a very large number of SQL SELECT queries.

I currently have two approaches:

  1. Query by a list of identifiers, using a WHERE IN clause:

    • SELECT COL1, COL2, COL3, COL4 FROM MAIN_TABLE WHERE COL1 IN ( 1,2,3,8,11,78,59,65,74,25,36,54558,78854,558 )

  2. Create a table, say CACHE_TABLE, first INSERT the identifiers ( 1,2,3,8,11,78,59,65,74,25,36,54558,78854,558 ) into it under a unique key CACHEID, and then JOIN CACHE_TABLE with MAIN_TABLE to get the desired result:

    • SELECT MT.COL1, MT.COL2, MT.COL3, MT.COL4 FROM MAIN_TABLE MT JOIN CACHE_TABLE CT ON CT.IDENTIFIER = MT.COL1 WHERE CT.CACHEID =
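
To make the two approaches concrete, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. The table and column names follow the question; the sample data, the helper names make_db, approach_in and approach_join, and the cache key value 42 are hypothetical. The CACHEID is passed as a bind parameter, and the sketch deletes its rows after the query so CACHE_TABLE does not grow without bound.

```python
import sqlite3

# Identifiers from the question; the table contents below are made-up sample data.
IDS = [1, 2, 3, 8, 11, 78, 59, 65, 74, 25, 36, 54558, 78854, 558]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MAIN_TABLE (COL1 INTEGER, COL2 TEXT, COL3 TEXT, COL4 TEXT)")
    conn.executemany(
        "INSERT INTO MAIN_TABLE VALUES (?, ?, ?, ?)",
        [(i, f"a{i}", f"b{i}", f"c{i}") for i in range(1, 1001)],
    )
    # Hypothetical cache table: one row per (CACHEID, IDENTIFIER) pair.
    conn.execute(
        "CREATE TABLE CACHE_TABLE (CACHEID INTEGER, IDENTIFIER INTEGER, "
        "PRIMARY KEY (CACHEID, IDENTIFIER))"
    )
    return conn


def approach_in(conn, ids):
    # Approach 1: a single statement with one bound parameter per identifier.
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT COL1, COL2, COL3, COL4 FROM MAIN_TABLE WHERE COL1 IN ({placeholders})"
    return sorted(conn.execute(sql, ids).fetchall())


def approach_join(conn, ids, cache_id):
    # Approach 2: store the identifiers under a cache key, then join on them.
    conn.executemany(
        "INSERT INTO CACHE_TABLE (CACHEID, IDENTIFIER) VALUES (?, ?)",
        [(cache_id, i) for i in ids],
    )
    rows = conn.execute(
        "SELECT MT.COL1, MT.COL2, MT.COL3, MT.COL4 FROM MAIN_TABLE MT "
        "JOIN CACHE_TABLE CT ON CT.IDENTIFIER = MT.COL1 WHERE CT.CACHEID = ?",
        (cache_id,),
    ).fetchall()
    # Remove this batch so the cache table does not keep growing.
    conn.execute("DELETE FROM CACHE_TABLE WHERE CACHEID = ?", (cache_id,))
    return sorted(rows)


conn = make_db()
# Both approaches should return exactly the same rows.
assert approach_in(conn, IDS) == approach_join(conn, IDS, cache_id=42)
```

With very long ID lists, keep in mind that engines cap the number of bind parameters per statement (SQLite's default limit is 999 before version 3.32.0 and 32766 from then on), which is one practical reason to fall back to the table-based approach.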

Performance is really critical in my use case, so I wanted to know whether approach #2 would yield better performance than approach #1. Also, is there a better alternative approach for this?

Thanks a ton in advance!

2 answers

#1 (score: 2)

Approach #2 will give you the best performance. In my experience, IN is a very slow operator, since SQL engines normally evaluate it as a series of conditions joined by OR (WHERE x = Y OR x = Z OR ...). As with all things SQL, though, your mileage may vary: the speed will depend a lot on indexes.
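
The OR expansion this answer describes can be checked for logical equivalence with a short sketch (Python's sqlite3 module, an in-memory database, and hypothetical sample data and helper names). Both queries return the same rows; whether a given engine actually executes IN as a chain of ORs, and how fast, is engine-specific, which is exactly the part the answer says will vary.

```python
import sqlite3


def rows_with_in(conn, ids):
    # "COL1 IN (a, b, c)" with one bound parameter per value.
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT COL1 FROM MAIN_TABLE WHERE COL1 IN ({placeholders})"
    return sorted(conn.execute(sql, ids).fetchall())


def rows_with_or(conn, ids):
    # The same filter rewritten as "COL1 = a OR COL1 = b OR COL1 = c".
    condition = " OR ".join("COL1 = ?" for _ in ids)
    sql = f"SELECT COL1 FROM MAIN_TABLE WHERE {condition}"
    return sorted(conn.execute(sql, ids).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MAIN_TABLE (COL1 INTEGER)")
conn.executemany("INSERT INTO MAIN_TABLE VALUES (?)", [(i,) for i in range(100)])

ids = [1, 2, 3, 8, 11, 78, 59, 65, 74, 25]
assert rows_with_in(conn, ids) == rows_with_or(conn, ids)
```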

#2 (score: 0)

You need to test the two approaches.

For a single query, I would expect IN to win in most cases, simply because creating the table and then using it requires multiple round-trips to the database.
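
The advice to test both approaches can be acted on with a small harness like the one below. It is a sketch assuming Python's built-in sqlite3 module and an in-memory database; CountingConn, via_in, via_join, bench and the sample data are hypothetical. Because there is no network here, the timings mostly reflect statement overhead, so the call counter serves as a rough proxy for round-trips to a real database server.

```python
import sqlite3
import time

REPEAT = 100


class CountingConn:
    """Wraps a sqlite3 connection and counts client calls.

    Each call is a rough stand-in for one round-trip to a database server;
    an in-memory SQLite database has no network latency at all.
    """

    def __init__(self, conn):
        self.conn = conn
        self.calls = 0

    def execute(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params)

    def executemany(self, sql, seq):
        self.calls += 1
        return self.conn.executemany(sql, seq)


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MAIN_TABLE (COL1 INTEGER, COL2 TEXT)")
    conn.execute("CREATE INDEX idx_col1 ON MAIN_TABLE (COL1)")
    conn.executemany("INSERT INTO MAIN_TABLE VALUES (?, ?)",
                     [(i, str(i)) for i in range(10_000)])
    conn.execute("CREATE TABLE CACHE_TABLE (CACHEID INTEGER, IDENTIFIER INTEGER)")
    return CountingConn(conn)  # setup statements above are not counted


def via_in(db, ids):
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT COL1, COL2 FROM MAIN_TABLE WHERE COL1 IN ({placeholders})"
    return db.execute(sql, ids).fetchall()  # 1 call


def via_join(db, ids, cache_id=1):
    db.executemany("INSERT INTO CACHE_TABLE VALUES (?, ?)",
                   [(cache_id, i) for i in ids])  # call 1
    rows = db.execute(
        "SELECT MT.COL1, MT.COL2 FROM MAIN_TABLE MT "
        "JOIN CACHE_TABLE CT ON CT.IDENTIFIER = MT.COL1 WHERE CT.CACHEID = ?",
        (cache_id,)).fetchall()  # call 2
    db.execute("DELETE FROM CACHE_TABLE WHERE CACHEID = ?", (cache_id,))  # call 3
    return rows


def bench(fn, repeat=REPEAT):
    # Average wall-clock seconds per call of fn.
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat


ids = list(range(0, 10_000, 97))
db_in, db_join = setup(), setup()
t_in = bench(lambda: via_in(db_in, ids))
t_join = bench(lambda: via_join(db_join, ids))
print(f"IN:   {t_in * 1e6:.1f} us/query, {db_in.calls // REPEAT} call(s)/query")
print(f"JOIN: {t_join * 1e6:.1f} us/query, {db_join.calls // REPEAT} call(s)/query")
```

Expect the numbers to shift substantially once a real network sits between client and server, which is why the measurement should be repeated against the actual database.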

In addition, some databases optimize constant lists (for instance, MySQL does a binary search on values rather than a sequential search).
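
As a toy model of that constant-list optimization, the sketch below sorts the IN constants once and then binary-searches each candidate value, instead of comparing it against every constant in turn. It only illustrates the idea in plain Python and is not MySQL's actual implementation; the function names are hypothetical.

```python
from bisect import bisect_left


def make_in_matcher(constants):
    # Sort the constant list once per query, not once per row.
    sorted_consts = sorted(constants)

    def matches(value):
        # Binary search: O(log n) comparisons per row.
        i = bisect_left(sorted_consts, value)
        return i < len(sorted_consts) and sorted_consts[i] == value

    return matches


def sequential_match(constants, value):
    # Naive "x = a OR x = b OR ..." evaluation: O(n) comparisons per row.
    return any(value == c for c in constants)


consts = [1, 2, 3, 8, 11, 78, 59, 65, 74, 25, 36, 54558, 78854, 558]
matches = make_in_matcher(consts)
rows = range(80_000)
# Both strategies select the same rows; they differ only in cost per row.
assert [r for r in rows if matches(r)] == [r for r in rows if sequential_match(consts, r)]
```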

The one thing that will help either version is an index on (col1) or (col1, col2, col3, col4).
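
One way to confirm that such an index is actually used is to inspect the query plan. The sketch below assumes SQLite through Python's sqlite3 module (other databases have their own EXPLAIN variants) and a hypothetical index name, idx_main_cover, on (COL1, COL2, COL3, COL4). Plan wording differs across SQLite versions, so the check looks only for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MAIN_TABLE (COL1 INTEGER, COL2 TEXT, COL3 TEXT, COL4 TEXT)")
conn.executemany("INSERT INTO MAIN_TABLE VALUES (?, ?, ?, ?)",
                 [(i, "a", "b", "c") for i in range(1000)])

QUERY = "SELECT COL1, COL2, COL3, COL4 FROM MAIN_TABLE WHERE COL1 IN (1, 2, 3, 8, 11)"


def plan(conn, sql):
    # The human-readable "detail" text is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(conn, QUERY)  # typically a full SCAN of MAIN_TABLE

# COL1 first so the IN list can be searched; the remaining selected columns
# are included so the query can be answered from the index alone.
conn.execute("CREATE INDEX idx_main_cover ON MAIN_TABLE (COL1, COL2, COL3, COL4)")
after = plan(conn, QUERY)  # typically SEARCH ... USING COVERING INDEX idx_main_cover

print(before)
print(after)
```

A narrower index on (COL1) alone also turns the scan into a search, but the engine then has to visit the table rows to fetch COL2 to COL4.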