What is the most efficient way to write a select statement with a "not in" subquery?

Time: 2022-09-23 17:43:50

What is the most efficient way to write a select statement similar to the below.

SELECT *
FROM Orders
WHERE Orders.Order_ID not in (Select Order_ID FROM HeldOrders)

The gist is you want the records from one table when the item is not in another table.

5 solutions

#1


8  

"Most efficient" is going to be different depending on tables sizes, indexes, and so on. In other words it's going to differ depending on the specific case you're using.

There are three ways I commonly use to accomplish what you want, depending on the situation.

1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.

2. Another method is the "correlated subquery" which is a slight variation of what you have...

SELECT *
FROM Orders o
WHERE o.Order_ID not in (Select Order_ID 
                              FROM HeldOrders h 
                              where h.order_id = o.order_id)

Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.

3. Another method I use sometimes is left outer join...

SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null

When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
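
As a rough illustration of that behaviour, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The sample rows are made up for the demonstration, and other engines may plan the query differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders     (order_id INTEGER PRIMARY KEY);
    CREATE TABLE HeldOrders (order_id INTEGER);
    INSERT INTO Orders     VALUES (1), (2), (3);   -- made-up sample rows
    INSERT INTO HeldOrders VALUES (2);
""")

# Every Orders row survives the join; unmatched rows carry NULL (None) in h.order_id.
joined = conn.execute("""
    SELECT o.order_id, h.order_id
    FROM Orders o
    LEFT OUTER JOIN HeldOrders h ON h.order_id = o.order_id
    ORDER BY o.order_id
""").fetchall()

# Keeping only the NULL side leaves the orders that are not held.
not_held = conn.execute("""
    SELECT o.order_id
    FROM Orders o
    LEFT OUTER JOIN HeldOrders h ON h.order_id = o.order_id
    WHERE h.order_id IS NULL
    ORDER BY o.order_id
""").fetchall()

print(joined)    # [(1, None), (2, 2), (3, None)]
print(not_held)  # [(1,), (3,)]
```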

Each of these variations can work more or less efficiently in various scenarios.

#2


20  

For starters, a link to an old article in my blog on how NOT IN predicate works in SQL Server (and in other systems too):

You can rewrite it as follows:

SELECT  *
FROM    Orders o
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    HeldOrders ho
        WHERE   ho.OrderID = o.OrderID
        )

However, most databases will treat the NOT IN and NOT EXISTS queries the same.

Both these queries will use some kind of an ANTI JOIN.

The NOT EXISTS form is useful in SQL Server if you want to check two or more columns, since SQL Server does not support this syntax:

SELECT  *
FROM    Orders o
WHERE   (col1, col2) NOT IN
        (
        SELECT  col1, col2
        FROM    HeldOrders ho
        )
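
For comparison, a multi-column check can be written with NOT EXISTS by correlating on each column. The sketch below runs it through Python's sqlite3 module, which is only a stand-in here (it is not SQL Server), and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders     (col1 INTEGER, col2 INTEGER);
    CREATE TABLE HeldOrders (col1 INTEGER, col2 INTEGER);
    INSERT INTO Orders     VALUES (1, 10), (1, 20), (2, 10);  -- invented rows
    INSERT INTO HeldOrders VALUES (1, 20);
""")

# A row is excluded only when BOTH columns match a HeldOrders row.
rows = conn.execute("""
    SELECT o.col1, o.col2
    FROM   Orders o
    WHERE  NOT EXISTS
           (
           SELECT NULL
           FROM   HeldOrders ho
           WHERE  ho.col1 = o.col1
             AND  ho.col2 = o.col2
           )
    ORDER BY o.col1, o.col2
""").fetchall()

print(rows)  # [(1, 10), (2, 10)]
```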

Note, however, that NOT IN may be tricky due to the way it treats NULL values.
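
The underlying three-valued logic can be seen on plain literals. This is a small sketch using Python's sqlite3 module as an assumed test bed; the literal values are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1 is not equal to 2, and its comparison with NULL is unknown,
# so neither IN nor NOT IN can decide: both come back as NULL (None).
in_result, not_in_result = conn.execute(
    "SELECT 1 IN (2, NULL), 1 NOT IN (2, NULL)"
).fetchone()

# Without the NULL in the list, the answer is definite.
definite = conn.execute("SELECT 1 NOT IN (2)").fetchone()[0]

print(in_result, not_in_result, definite)  # None None 1
```

A WHERE clause keeps a row only when its condition is true, so a NULL result filters the row out just as false would.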

If HeldOrders.OrderID is nullable, no matching record is found, and the subquery returns even a single NULL, the whole query will return nothing (both IN and NOT IN evaluate to NULL in this case).

Consider these data:

Orders:

OrderID
---
1

HeldOrders:

OrderID
---
2
NULL

This query:

SELECT  *
FROM    Orders o
WHERE   OrderID NOT IN
        (
        SELECT  OrderID
        FROM    HeldOrders ho
        )

will return nothing, which is probably not what you'd expect.

However, this one:

SELECT  *
FROM    Orders o
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    HeldOrders ho
        WHERE   ho.OrderID = o.OrderID
        )

will return the row with OrderID = 1.
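
The two results can be reproduced with the sample data above. The following is a minimal sketch that loads the same rows into an in-memory SQLite database through Python's sqlite3 module; SQLite is an assumption made for the demo, chosen because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders     (OrderID INTEGER);
    CREATE TABLE HeldOrders (OrderID INTEGER);
    INSERT INTO Orders     VALUES (1);
    INSERT INTO HeldOrders VALUES (2), (NULL);
""")

# NOT IN: the NULL in HeldOrders makes the predicate NULL for OrderID = 1.
not_in = conn.execute("""
    SELECT o.OrderID FROM Orders o
    WHERE  o.OrderID NOT IN (SELECT OrderID FROM HeldOrders)
""").fetchall()

# NOT EXISTS: the correlated subquery finds no row, so the predicate is true.
not_exists = conn.execute("""
    SELECT o.OrderID FROM Orders o
    WHERE  NOT EXISTS (SELECT NULL FROM HeldOrders ho WHERE ho.OrderID = o.OrderID)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(1,)]
```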

Note that the LEFT JOIN solutions proposed by others are far from being the most efficient.

This query:

SELECT  *
FROM    Orders o
LEFT JOIN
        HeldOrders ho
ON      ho.OrderID = o.OrderID
WHERE   ho.OrderID IS NULL

will use a filter condition that needs to evaluate and filter out all matching rows, which can be numerous.

An ANTI JOIN method used by both IN and EXISTS just needs to make sure, once per row in Orders, that a matching record does not exist, so it will eliminate all possible duplicates first:

  • NESTED LOOPS ANTI JOIN and MERGE ANTI JOIN will just skip the duplicates when evaluating HeldOrders.
  • A HASH ANTI JOIN will eliminate duplicates when building the hash table.

#3


4  

You can use a LEFT OUTER JOIN and check for NULL on the right table.

SELECT O1.*
FROM Orders O1
LEFT OUTER JOIN HeldOrders O2
ON O1.Order_ID = O2.Order_Id
WHERE O2.Order_Id IS NULL

#4


1  

I'm not sure what is the most efficient, but other options are:

1. Use EXISTS

SELECT * 
FROM ORDERS O 
WHERE NOT EXISTS (SELECT 1 
                  FROM HeldOrders HO 
                  WHERE O.Order_ID = HO.Order_ID)

2. Use EXCEPT

SELECT O.Order_ID 
FROM ORDERS O 
EXCEPT 
SELECT HO.Order_ID 
FROM HeldOrders HO
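
Two properties of EXCEPT are worth keeping in mind: it returns only the selected columns, and it removes duplicates from the result. The sketch below shows both using Python's sqlite3 module as an assumed test bed; the customer column and the rows are hypothetical and exist only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'customer' is a hypothetical extra column, used to show it is not returned
    CREATE TABLE Orders     (Order_ID INTEGER, customer TEXT);
    CREATE TABLE HeldOrders (Order_ID INTEGER);
    INSERT INTO Orders     VALUES (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd');
    INSERT INTO HeldOrders VALUES (2), (NULL);
""")

# Order_ID 1 appears twice in Orders but only once in the output,
# and the NULL in HeldOrders does not empty the result as it would with NOT IN.
rows = conn.execute("""
    SELECT O.Order_ID FROM Orders O
    EXCEPT
    SELECT HO.Order_ID FROM HeldOrders HO
    ORDER BY 1
""").fetchall()

print(rows)  # [(1,), (3,)]
```

If the full Orders rows are needed, the EXCEPT result has to be joined back to Orders, which is one reason the NOT EXISTS form is often more convenient.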

#5


0  

Try

SELECT *
FROM Orders
LEFT JOIN HeldOrders
ON HeldOrders.Order_ID = Orders.Order_ID
WHERE HeldOrders.Order_ID IS NULL
