内部使用子查询连接派生表

时间:2022-09-25 15:39:19

Environment: SQL 2008 R2

环境:SQL 2008 R2

I created a derived table using sub query and joined with main table. I just like to know if subquery is executed only once or will it be executed for each row in result set. Consider following example (fictional table names for reference only)

我使用子查询创建了一个派生表,并与主表连接。我只想知道子查询是仅执行一次还是将对结果集中的每一行执行。请考虑以下示例(虚构表名仅供参考)

SELECT E.EID,DT.Salary FROM Employees E
INNER JOIN
(
    SELECT EID, (SR.Rate * AD.DaysAttended) Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID

So, the subquery used for Inner Join will be executed only once or multiple times??

那么,用于Inner Join的子查询只会被执行一次或多次?

If I re-write above query using OUTER APPLY, I know for sure the subquery will be executed for each row. See Below.

如果我使用OUTER APPLY重写上面的查询,我肯定知道将为每一行执行子查询。见下文。

SELECT E.EID,DT.Salary FROM Employees E
OUTER APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT --Derived Table for outer apply

So just want to ensure that Inner Join will execute sub query only once.

所以只想确保Inner Join只执行一次子查询。

3 个解决方案

#1


8  

The first thing to note is that your queries are not comparable, OUTER APPLY needs to be replaced with CROSS APPLY, or INNER JOIN with LEFT JOIN.

首先要注意的是,您的查询无法比较,OUTER APPLY需要替换为CROSS APPLY,或者INNER JOIN替换为LEFT JOIN。

When they are made comparable though, you can see that the query plans for both queries are identical. I have just mocked up a sample DDL:

当它们具有可比性时,您可以看到两个查询的查询计划是相同的。我刚刚模拟了一个样本DDL:

CREATE TABLE #Employees (EID INT NOT NULL);
INSERT #Employees VALUES (0);
CREATE TABLE #SalaryRate (EID INT NOT NULL, Rate MONEY NOT NULL);
CREATE TABLE #AttendanceDetails (EID INT NOT NULL, DaysAttended INT NOT NULL);

Running the following:

运行以下内容:

SELECT E.EID,DT.Salary FROM #Employees E
OUTER APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

给出以下计划:

内部使用子查询连接派生表

And changing to INNER/CROSS:

并改为INNER / CROSS:

SELECT E.EID,DT.Salary FROM #Employees E
CROSS APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply


SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

给出以下计划:

内部使用子查询连接派生表

These are the plans where there is no data in the outer tables, and only one row in employees, so not really realistic. In the case of the outer apply, SQL Server is able to determine that there is only one row in employees, so it would be beneficial to just do a nested loop join (i.e. row by row lookup) to the outer tables. After putting 1,000 rows in employees, using LEFT JOIN/OUTER APPLY yields the following plan:

这些是外部表中没有数据的计划,员工中只有一行,所以不太现实。在外部应用的情况下,SQL Server能够确定员工中只有一行,因此仅对外部表执行嵌套循环连接(即逐行查找)将是有益的。在员工中放置1,000行后,使用LEFT JOIN / OUTER APPLY产生以下计划:

内部使用子查询连接派生表

You can see here that the join is now a hash match join, which means (in it's simplest terms) that SQL Server has determined that the best plan is to execute the outer query first, hash the results and then lookup from employees. This however does not mean that the subquery as a whole is executed and the results stored, for simplicity purposes you could consider this, but predicates from the outer query can still be still be used, for example, if the subquery were executed and stored internally, the following query would present massive overhead:

你可以在这里看到连接现在是一个哈希匹配连接,这意味着(用它最简单的术语)SQL Server已经确定最好的计划是首先执行外部查询,哈希结果然后从员工查找。然而,这并不意味着执行整个子查询并存储结果,为简单起见,您可以考虑这一点,但仍然可以使用外部查询的谓词,例如,如果子查询在内部执行和存储,以下查询会产生巨大的开销:

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID
WHERE E.EID = 1;

What whould be the point in retrieving all employee rates, storing the results, only to actually look up one employee? Inspection of the execution plan shows that the EID = 1 predicate is passed to the table scan on #AttendanceDetails:

检索所有员工费率,存储结果,仅实际查找一名员工的重点是什么?检查执行计划显示EID = 1谓词传递给#AttendanceDetails上的表扫描:

内部使用子查询连接派生表

So the answer to the following points is:

所以以下几点的答案是:

  • If I re-write above query using OUTER APPLY, I know for sure the subquery will be executed for each row.
  • 如果我使用OUTER APPLY重写上面的查询,我肯定知道将为每一行执行子查询。
  • Inner Join will execute sub query only once.
  • Inner Join只执行一次子查询。

It depends. Using APPLY SQL Server will attempt to rewrite the query as a JOIN if possible, as this will yield the optimal plan, so using OUTER APPLY does not guarantee that the query will be executed once for each row. Similarly using LEFT JOIN does not guarantee that the query is executed only once.

这取决于。使用APPLY如果可能,SQL Server将尝试将查询重写为JOIN,因为这将产生最佳计划,因此使用OUTER APPLY不保证查询将针对每一行执行一次。类似地,使用LEFT JOIN并不能保证查询只执行一次。

SQL is a declarative language, in that you tell it what you want it to do, not how to do it, so you shouldn't rely on specific commands to elicit specific behaviour, instead, if you find performance issues, check the execution plan, and IO statistics to find out how it is doing it, and identify how you can improve your query.

SQL是一种声明性语言,因为你告诉它你想要它做什么,而不是如何做,所以你不应该依赖特定的命令来引出特定的行为,相反,如果你发现性能问题,检查执行计划和IO统计信息,以了解它是如何做到的,并确定如何改进查询。

Further more, SQL Server does not matierialise subqueries, usually the definition is expanded out into the main query, so even though you have written:

此外,SQL Server不会对子查询进行编码,通常将定义扩展到主查询中,因此即使您已编写:

SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

What is actually executed is more like:

实际执行的更像是:

SELECT  e.EID, sr.Rate * ad.DaysAttended AS Salary
FROM    #Employees e
        INNER JOIN #SalaryRate sr
            on e.EID = sr.EID
        INNER JOIN #AttendanceDetails ad
            ON ad.EID = sr.EID;

#2


1  

The sub-query will be evaluated only once. To avoid confusion, we could simply think of the sub-query as a single table/view since the outer and inner inner queries are not co-related.

子查询只会被评估一次。为了避免混淆,我们可以简单地将子查询视为单个表/视图,因为外部和内部内部查询不是相关的。

#3


1  

With INNER JOIN your Sub-Query will be execute only once and its records may gets stored internally in tempdb worktable on complex operations, then JOINed with the 1st table.

使用INNER JOIN,您的Sub-Query将只执行一次,并且其记录可以在内部存储在复杂操作的tempdb工作表中,然后与第一个表一起使用。

With APPLY clause, the Sub-Query will be executed for every row in the 1st table.

使用APPLY子句,将对第1个表中的每一行执行子查询。

edit: use CTE

编辑:使用CTE

;with SalaryRateCTE as 
(
    SELECT EID, (SR.Rate * AD.DaysAttended) AS Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
)
SELECT E.EID, DT.Salary 
FROM Employees E
INNER JOIN SalaryRateCTE DT --Derived Table for inner join
ON DT.EID = E.EID

#1


8  

The first thing to note is that your queries are not comparable, OUTER APPLY needs to be replaced with CROSS APPLY, or INNER JOIN with LEFT JOIN.

首先要注意的是,您的查询无法比较,OUTER APPLY需要替换为CROSS APPLY,或者INNER JOIN替换为LEFT JOIN。

When they are made comparable though, you can see that the query plans for both queries are identical. I have just mocked up a sample DDL:

当它们具有可比性时,您可以看到两个查询的查询计划是相同的。我刚刚模拟了一个样本DDL:

CREATE TABLE #Employees (EID INT NOT NULL);
INSERT #Employees VALUES (0);
CREATE TABLE #SalaryRate (EID INT NOT NULL, Rate MONEY NOT NULL);
CREATE TABLE #AttendanceDetails (EID INT NOT NULL, DaysAttended INT NOT NULL);

Running the following:

运行以下内容:

SELECT E.EID,DT.Salary FROM #Employees E
OUTER APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

给出以下计划:

内部使用子查询连接派生表

And changing to INNER/CROSS:

并改为INNER / CROSS:

SELECT E.EID,DT.Salary FROM #Employees E
CROSS APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply


SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

给出以下计划:

内部使用子查询连接派生表

These are the plans where there is no data in the outer tables, and only one row in employees, so not really realistic. In the case of the outer apply, SQL Server is able to determine that there is only one row in employees, so it would be beneficial to just do a nested loop join (i.e. row by row lookup) to the outer tables. After putting 1,000 rows in employees, using LEFT JOIN/OUTER APPLY yields the following plan:

这些是外部表中没有数据的计划,员工中只有一行,所以不太现实。在外部应用的情况下,SQL Server能够确定员工中只有一行,因此仅对外部表执行嵌套循环连接(即逐行查找)将是有益的。在员工中放置1,000行后,使用LEFT JOIN / OUTER APPLY产生以下计划:

内部使用子查询连接派生表

You can see here that the join is now a hash match join, which means (in it's simplest terms) that SQL Server has determined that the best plan is to execute the outer query first, hash the results and then lookup from employees. This however does not mean that the subquery as a whole is executed and the results stored, for simplicity purposes you could consider this, but predicates from the outer query can still be still be used, for example, if the subquery were executed and stored internally, the following query would present massive overhead:

你可以在这里看到连接现在是一个哈希匹配连接,这意味着(用它最简单的术语)SQL Server已经确定最好的计划是首先执行外部查询,哈希结果然后从员工查找。然而,这并不意味着执行整个子查询并存储结果,为简单起见,您可以考虑这一点,但仍然可以使用外部查询的谓词,例如,如果子查询在内部执行和存储,以下查询会产生巨大的开销:

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID
WHERE E.EID = 1;

What whould be the point in retrieving all employee rates, storing the results, only to actually look up one employee? Inspection of the execution plan shows that the EID = 1 predicate is passed to the table scan on #AttendanceDetails:

检索所有员工费率,存储结果,仅实际查找一名员工的重点是什么?检查执行计划显示EID = 1谓词传递给#AttendanceDetails上的表扫描:

内部使用子查询连接派生表

So the answer to the following points is:

所以以下几点的答案是:

  • If I re-write above query using OUTER APPLY, I know for sure the subquery will be executed for each row.
  • 如果我使用OUTER APPLY重写上面的查询,我肯定知道将为每一行执行子查询。
  • Inner Join will execute sub query only once.
  • Inner Join只执行一次子查询。

It depends. Using APPLY SQL Server will attempt to rewrite the query as a JOIN if possible, as this will yield the optimal plan, so using OUTER APPLY does not guarantee that the query will be executed once for each row. Similarly using LEFT JOIN does not guarantee that the query is executed only once.

这取决于。使用APPLY如果可能,SQL Server将尝试将查询重写为JOIN,因为这将产生最佳计划,因此使用OUTER APPLY不保证查询将针对每一行执行一次。类似地,使用LEFT JOIN并不能保证查询只执行一次。

SQL is a declarative language, in that you tell it what you want it to do, not how to do it, so you shouldn't rely on specific commands to elicit specific behaviour, instead, if you find performance issues, check the execution plan, and IO statistics to find out how it is doing it, and identify how you can improve your query.

SQL是一种声明性语言,因为你告诉它你想要它做什么,而不是如何做,所以你不应该依赖特定的命令来引出特定的行为,相反,如果你发现性能问题,检查执行计划和IO统计信息,以了解它是如何做到的,并确定如何改进查询。

Further more, SQL Server does not matierialise subqueries, usually the definition is expanded out into the main query, so even though you have written:

此外,SQL Server不会对子查询进行编码,通常将定义扩展到主查询中,因此即使您已编写:

SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

What is actually executed is more like:

实际执行的更像是:

SELECT  e.EID, sr.Rate * ad.DaysAttended AS Salary
FROM    #Employees e
        INNER JOIN #SalaryRate sr
            on e.EID = sr.EID
        INNER JOIN #AttendanceDetails ad
            ON ad.EID = sr.EID;

#2


1  

The sub-query will be evaluated only once. To avoid confusion, we could simply think of the sub-query as a single table/view since the outer and inner inner queries are not co-related.

子查询只会被评估一次。为了避免混淆,我们可以简单地将子查询视为单个表/视图,因为外部和内部内部查询不是相关的。

#3


1  

With INNER JOIN your Sub-Query will be execute only once and its records may gets stored internally in tempdb worktable on complex operations, then JOINed with the 1st table.

使用INNER JOIN,您的Sub-Query将只执行一次,并且其记录可以在内部存储在复杂操作的tempdb工作表中,然后与第一个表一起使用。

With APPLY clause, the Sub-Query will be executed for every row in the 1st table.

使用APPLY子句,将对第1个表中的每一行执行子查询。

edit: use CTE

编辑:使用CTE

;with SalaryRateCTE as 
(
    SELECT EID, (SR.Rate * AD.DaysAttended) AS Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
)
SELECT E.EID, DT.Salary 
FROM Employees E
INNER JOIN SalaryRateCTE DT --Derived Table for inner join
ON DT.EID = E.EID