Temporary table usage in SQL Server

Time: 2021-06-01 07:07:04

This is a bit of an open question, but I would really like to hear people's opinions.

I rarely make use of explicitly declared temporary tables (either table variables or regular #tmp tables), as I believe not doing so leads to more concise, readable and debuggable T-SQL. I also think that SQL Server can do a better job than I can of making use of temporary storage when it's required (such as when you use a derived table in a query).

The only exception is when the database is not a typical relational database but a star or snowflake schema. I understand that it's best to apply filters to the fact table first and then join the resulting temp table to the dimensions to get their values.
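
To make that pattern concrete, here is a minimal sketch. The table and column names (dbo.FactSales, dbo.DimProduct, dbo.DimStore and their keys) are hypothetical, invented only for illustration, and whether this beats a single joined query depends on the data and the plan the optimiser picks.

```sql
-- Hypothetical star schema (names are assumptions, not from any real system):
--   dbo.FactSales(DateKey, ProductKey, StoreKey, SalesAmount)
--   dbo.DimProduct(ProductKey, ProductName)
--   dbo.DimStore(StoreKey, StoreName)

-- Step 1: filter and aggregate the large fact table on its own.
SELECT f.ProductKey, f.StoreKey, SUM(f.SalesAmount) AS SalesAmount
INTO   #FilteredSales
FROM   dbo.FactSales AS f
WHERE  f.DateKey BETWEEN 20210101 AND 20210131
GROUP BY f.ProductKey, f.StoreKey;

-- Step 2: join the much smaller temp table to the dimensions.
SELECT p.ProductName, s.StoreName, fs.SalesAmount
FROM   #FilteredSales AS fs
JOIN   dbo.DimProduct AS p ON p.ProductKey = fs.ProductKey
JOIN   dbo.DimStore   AS s ON s.StoreKey   = fs.StoreKey;

DROP TABLE #FilteredSales;
```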

Is this the common opinion or does anyone have an opposing view?

6 answers

#1


14  

Temporary tables are most useful for a complex batch process like a report or ETL job. Generally you would expect to use them fairly rarely in a transactional application.

If you're running a complex query with a join across multiple large tables (perhaps for a report), the query optimiser may not be able to optimise it in one hit, so temporary tables become a win here: they decompose the query into a series of simpler ones that give the optimiser less opportunity to screw up the plan. Sometimes an operation cannot be done in a single SQL statement at all, so multiple processing steps are necessary to do the job. Again, we're talking about more complex manipulations here.

You can also create a temporary table for an intermediate result and then index it, possibly even putting a clustered index on it to optimise a subsequent query. This can also be a quick and dirty way to optimise a report query on a system where you are not allowed to add indexes to the database schema. SELECT INTO is useful for this type of operation, as it is minimally logged (and therefore fast) and doesn't require you to align the columns of a select and an insert.
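
As a rough sketch of that approach: the dbo.Orders and dbo.Customers tables, their columns, and the index name below are all hypothetical, and the date filter is only there to give the intermediate result a reason to be smaller than its source.

```sql
-- Hypothetical schema (assumed for illustration only):
--   dbo.Orders(OrderID, CustomerID, OrderDate, Amount)
--   dbo.Customers(CustomerID, Region)

-- Step 1: materialise the intermediate result. SELECT INTO creates
-- #RecentOrders with column names and types taken from the select list.
SELECT o.OrderID, o.CustomerID, o.Amount
INTO   #RecentOrders
FROM   dbo.Orders AS o
WHERE  o.OrderDate >= '20210101';

-- Step 2: index the temp table to support the join that follows.
CREATE CLUSTERED INDEX IX_RecentOrders_CustomerID
    ON #RecentOrders (CustomerID);

-- Step 3: the report query joins against the smaller, indexed set.
SELECT   c.Region, SUM(r.Amount) AS TotalAmount
FROM     #RecentOrders AS r
JOIN     dbo.Customers AS c ON c.CustomerID = r.CustomerID
GROUP BY c.Region;

DROP TABLE #RecentOrders;
```

Nothing here touches the permanent schema, which is why it can work on systems where you are not allowed to add indexes to the base tables.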

Other reasons might include extracting data from XML fields using CROSS APPLY and XPath queries. Generally it's much more efficient to extract this into a temp table and then work on the temp table. Temp tables are also much faster than CTEs for some tasks, as they materialise the query results rather than re-evaluating the query each time it is referenced.
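
The CTE point can be illustrated with a small sketch. It assumes a hypothetical dbo.Orders(CustomerID, Amount) table; whether the CTE really is evaluated twice depends on the plan chosen, so treat this as an illustration of the idea rather than a benchmark.

```sql
-- CTE version: CustomerTotals is referenced twice, and the optimiser is
-- free to run the aggregation over dbo.Orders once per reference.
WITH CustomerTotals AS (
    SELECT CustomerID, SUM(Amount) AS Total
    FROM   dbo.Orders
    GROUP BY CustomerID
)
SELECT t.CustomerID, t.Total
FROM   CustomerTotals AS t
WHERE  t.Total > (SELECT AVG(Total) FROM CustomerTotals);

-- Temp table version: the aggregation runs once and is stored,
-- and both references read the stored rows.
SELECT CustomerID, SUM(Amount) AS Total
INTO   #CustomerTotals
FROM   dbo.Orders
GROUP BY CustomerID;

SELECT t.CustomerID, t.Total
FROM   #CustomerTotals AS t
WHERE  t.Total > (SELECT AVG(Total) FROM #CustomerTotals);

DROP TABLE #CustomerTotals;
```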

One thing to note is that temporary tables are exactly the same structure that the query engine uses to store intermediate join results, so there is no performance penalty to using them. Temporary tables also allow multi-phase tasks using set operations and make cursors almost (not quite but almost) unnecessary in T-SQL code.

'Code Smell' is an overstatement but if I saw a lot of simple operations involving temporary tables I would be wondering what was going on.

#2


5  

It really depends on what you are doing. I generally try to avoid them, but sometimes you need to do something complicated that takes multiple steps. Generally this is way beyond the simple select from table stuff. Like anything else it's a tool that you have to know when to use.

I would agree with you that I normally let the db handle stuff behind the scenes, but there are times when its optimization is off and you have to go in and do it by hand.

#3


3  

I see temp tables as a sort of SQL code smell, to be used only as a last resort. If you are having to cache data before you get a final result set, then it usually indicates bad DB design to me.

#4


3  

Temp tables certainly have appropriate uses; they're not a code smell if they're used correctly. One of the nice things about them is that they live in tempdb, which uses the Simple recovery model. This means that if you're using temp tables for what they're good for (mostly bulk operations), you're generating a minimal amount of log compared to what the same operation would generate on tables in your production db, which is probably in the Full recovery model.

If, as another poster suggested, your production db is on good hardware, but your tempdb isn't, ask your DBA to move it. SQL Server itself uses tempdb quite a bit to process your queries, so it's important for tempdb to have a high performance home.
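
For reference, a sketch of how that move is usually done. The logical file names tempdev and templog are the defaults on a standard install and may differ on your instance, and the T:\ paths are purely hypothetical. The new locations only take effect after the SQL Server service is restarted.

```sql
-- See where tempdb's files currently live.
SELECT name, physical_name
FROM   sys.master_files
WHERE  database_id = DB_ID('tempdb');

-- Point tempdb at faster storage (paths below are placeholders).
-- tempdb is recreated at the new location on the next service restart.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\SQLData\tempdb.mdf');
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'T:\SQLLog\templog.ldf');
```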

Table variables are a different creature. They are often described as living only in memory, but like temp tables they are stored in tempdb; what sets them apart is their scope (the batch, procedure or function that declares them) and the lack of column statistics. One good use for them is if you've got a function that you need to call for each row in your query with CROSS APPLY. If that function is expensive, but the number of different results you can get from it is small, you might get significantly higher performance from precomputing the results of all the possible calls (or perhaps all possible calls for your dataset), storing them in a table variable, and then joining to that table variable instead of using CROSS APPLY.
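
A minimal sketch of that precomputation, under stated assumptions: dbo.GetRateInfo is a hypothetical table-valued function that returns exactly one row with a Rate column per currency code, and dbo.Orders(OrderID, CurrencyCode, Amount) is a hypothetical table whose CurrencyCode is never NULL.

```sql
-- Row-by-row version: the (hypothetical) function is invoked per order row.
SELECT o.OrderID, o.Amount * f.Rate AS ConvertedAmount
FROM   dbo.Orders AS o
CROSS APPLY dbo.GetRateInfo(o.CurrencyCode) AS f;

-- Precomputed version: call the function once per distinct input value.
DECLARE @Rates TABLE (
    CurrencyCode CHAR(3)        NOT NULL PRIMARY KEY,
    Rate         DECIMAL(18, 6) NOT NULL
);

INSERT INTO @Rates (CurrencyCode, Rate)
SELECT d.CurrencyCode, f.Rate
FROM   (SELECT DISTINCT CurrencyCode FROM dbo.Orders) AS d
CROSS APPLY dbo.GetRateInfo(d.CurrencyCode) AS f;

-- The main query now uses an ordinary join to the small lookup set.
SELECT o.OrderID, o.Amount * r.Rate AS ConvertedAmount
FROM   dbo.Orders AS o
JOIN   @Rates AS r ON r.CurrencyCode = o.CurrencyCode;
```

The table variable only lives for the duration of the batch, so all three statements of the precomputed version have to run together.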

#5


0  

I, too, avoid temporary tables. It is my understanding that temporary tables on MS SQL Server are always created in tempdb, whose files by default sit with the other system databases. What that means is that, while your production application tables are most probably on some expensive, high-performance RAID setup, your temporary tables are located wherever MS SQL Server was installed, which is most probably on your C: drive under the Program Files directory.

#6


0  

Also useful when you have a dataset that needs to be retrieved once and used over and over in subsequent statements.

Makes these long batch processes more readable (sometimes this is more important than performance).
