Multiple INSERT statements vs. single INSERT with multiple VALUES

Date: 2022-10-19 09:30:37

I'm running a performance comparison between using 1000 INSERT statements:

INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('6f3f7257-a3d8-4a78-b2e1-c9b767cfe1c1', 'First 0', 'Last 0', 0)
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('32023304-2e55-4768-8e52-1ba589b82c8b', 'First 1', 'Last 1', 1)
...
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('f34d95a7-90b1-4558-be10-6ceacd53e4c4', 'First 999', 'Last 999', 999)

…versus using a single INSERT statement with 1000 values:

INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
VALUES 
('db72b358-e9b5-4101-8d11-7d7ea3a0ae7d', 'First 0', 'Last 0', 0),
('6a4874ab-b6a3-4aa4-8ed4-a167ab21dd3d', 'First 1', 'Last 1', 1),
...
('9d7f2a58-7e57-4ed4-ba54-5e9e335fb56c', 'First 999', 'Last 999', 999)

To my big surprise, the results are the opposite of what I thought:

  • 1000 INSERT statements: 290 msec.
  • 1 INSERT statement with 1000 VALUES: 2800 msec.

The test is executed directly in MSSQL Management Studio, with SQL Server Profiler used for measurement (and I've got similar results running it from C# code using SqlClient, which is even more surprising considering all the DAL-layer roundtrips).

Can this be reasonable or somehow explained? How come a supposedly faster method results in 10 times (!) worse performance?

Thank you.

EDIT: Attaching execution plans for both:

[Execution plan screenshots for both approaches]

4 Answers

#1


112  

Addition: SQL Server 2012 shows some improved performance in this area but doesn't seem to tackle the specific issues noted below. This should apparently be fixed in the next major version after SQL Server 2012!

Your plan shows that the single inserts are using parameterised procedures (possibly auto-parameterised), so parse/compile time for these should be minimal.

I thought I'd look into this a bit more though, so I set up a loop (script) and tried adjusting the number of VALUES clauses and recording the compile time.
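A minimal sketch of that kind of loop (a reconstruction, not the original script; here the compile time is read from SET STATISTICS TIME, and fresh GUID literals force a fresh compile on every run):

-- Build an INSERT with @n VALUES clauses, then execute it and read the
-- "SQL Server parse and compile time" that STATISTICS TIME reports.
DECLARE @n INT = 250, @i INT = 1;
DECLARE @sql NVARCHAR(MAX) =
    N'INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) VALUES ';

WHILE @i <= @n
BEGIN
    SET @sql = @sql
        + N'(''' + CONVERT(NVARCHAR(36), NEWID())
        + N''', ''First ' + CAST(@i AS NVARCHAR(10))
        + N''', ''Last '  + CAST(@i AS NVARCHAR(10))
        + N''', '         + CAST(@i AS NVARCHAR(10)) + N'),';
    SET @i = @i + 1;
END;
SET @sql = LEFT(@sql, LEN(@sql) - 1);  -- drop the trailing comma

SET STATISTICS TIME ON;
EXEC (@sql);
SET STATISTICS TIME OFF;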

I then divided the compile time by the number of rows to get the average compile time per clause. The results are below.

[Graph: average compile time per VALUES clause]

Up until 250 VALUES clauses are present, the compile time / number of clauses has a slight upward trend, but nothing too dramatic.

[Graph: average compile time per clause, showing a sudden jump past 250 clauses]

But then there is a sudden change.

That section of the data is shown below.

+------+----------------+-------------+---------------+---------------+
| Rows | CachedPlanSize | CompileTime | CompileMemory | Duration/Rows |
+------+----------------+-------------+---------------+---------------+
|  245 |            528 |          41 |          2400 | 0.167346939   |
|  246 |            528 |          40 |          2416 | 0.162601626   |
|  247 |            528 |          38 |          2416 | 0.153846154   |
|  248 |            528 |          39 |          2432 | 0.157258065   |
|  249 |            528 |          39 |          2432 | 0.156626506   |
|  250 |            528 |          40 |          2448 | 0.16          |
|  251 |            400 |         273 |          3488 | 1.087649402   |
|  252 |            400 |         274 |          3496 | 1.087301587   |
|  253 |            400 |         282 |          3520 | 1.114624506   |
|  254 |            408 |         279 |          3544 | 1.098425197   |
|  255 |            408 |         290 |          3552 | 1.137254902   |
+------+----------------+-------------+---------------+---------------+

(CachedPlanSize and CompileMemory are in KB, CompileTime in ms; Duration/Rows is CompileTime divided by Rows.)

The cached plan size, which had been growing linearly, suddenly drops, but CompileTime increases 7-fold and CompileMemory shoots up. This is the cut-off point between the plan being an auto-parametrized one (with 1,000 parameters: 250 rows × 4 columns) and a non-parametrized one. Thereafter it seems to get linearly less efficient (in terms of the number of VALUES clauses processed in a given time).
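The cut-off is also visible in the plan cache (a sketch; the LIKE filter is purely illustrative). Up to 250 rows the statement should appear as a 'Prepared', auto-parameterized entry; from 251 rows it is cached as a plain 'Adhoc' plan with the literals inlined:

SELECT cp.objtype,                 -- 'Prepared' = auto-parameterized
       cp.size_in_bytes,
       LEFT(st.text, 100) AS query_start
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%T_TESTS%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';  -- exclude this query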

I'm not sure why this should be. Presumably, when it is compiling a plan for specific literal values, it must perform some activity that does not scale linearly (such as sorting).

Trying a query consisting entirely of duplicate rows didn't seem to affect the size of the cached query plan, and it doesn't affect the order in which the table of constants is output either (and, as you are inserting into a heap, time spent sorting would be pointless anyway, even if it did).

Moreover, if a clustered index is added to the table, the plan still shows an explicit Sort step, so it doesn't seem to be sorting at compile time to avoid a sort at run time.
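(The index used for that check isn't shown; something along these lines is assumed:)

-- Any clustered key will do for this check.
CREATE CLUSTERED INDEX IX_T_TESTS_TestId ON T_TESTS (TestId);
-- Re-running the multi-row INSERT ... VALUES with the actual execution plan
-- enabled still shows an explicit Sort operator feeding the insert.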

[Execution plan: explicit Sort step before the insert]

I tried to look at this in a debugger but the public symbols for my version of SQL Server 2008 don't seem to be available so instead I had to look at the equivalent UNION ALL construction in SQL Server 2005.
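(Row constructors don't exist on 2005, so the multi-row VALUES insert corresponds to a UNION ALL construction along these lines, reusing the values from the question:)

INSERT INTO T_TESTS (TestId, FirstName, LastName, Age)
SELECT '6f3f7257-a3d8-4a78-b2e1-c9b767cfe1c1', 'First 0', 'Last 0', 0
UNION ALL
SELECT '32023304-2e55-4768-8e52-1ba589b82c8b', 'First 1', 'Last 1', 1
UNION ALL
SELECT 'f34d95a7-90b1-4558-be10-6ceacd53e4c4', 'First 999', 'Last 999', 999;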

A typical stack trace is below:

sqlservr.exe!FastDBCSToUnicode()  + 0xac bytes  
sqlservr.exe!nls_sqlhilo()  + 0x35 bytes    
sqlservr.exe!CXVariant::CmpCompareStr()  + 0x2b bytes   
sqlservr.exe!CXVariantPerformCompare<167,167>::Compare()  + 0x18 bytes  
sqlservr.exe!CXVariant::CmpCompare()  + 0x11f67d bytes  
sqlservr.exe!CConstraintItvl::PcnstrItvlUnion()  + 0xe2 bytes   
sqlservr.exe!CConstraintProp::PcnstrUnion()  + 0x35e bytes  
sqlservr.exe!CLogOp_BaseSetOp::PcnstrDerive()  + 0x11a bytes    
sqlservr.exe!CLogOpArg::PcnstrDeriveHandler()  + 0x18f bytes    
sqlservr.exe!CLogOpArg::DeriveGroupProperties()  + 0xa9 bytes   
sqlservr.exe!COpArg::DeriveNormalizedGroupProperties()  + 0x40 bytes    
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x18a bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!CQuery::PqoBuild()  + 0x3cb bytes  
sqlservr.exe!CStmtQuery::InitQuery()  + 0x167 bytes 
sqlservr.exe!CStmtDML::InitNormal()  + 0xf0 bytes   
sqlservr.exe!CStmtDML::Init()  + 0x1b bytes 
sqlservr.exe!CCompPlan::FCompileStep()  + 0x176 bytes   
sqlservr.exe!CSQLSource::FCompile()  + 0x741 bytes  
sqlservr.exe!CSQLSource::FCompWrapper()  + 0x922be bytes    
sqlservr.exe!CSQLSource::Transform()  + 0x120431 bytes  
sqlservr.exe!CSQLSource::Compile()  + 0x2ff bytes   

So, going off the names in the stack trace, it appears to spend a lot of time comparing strings.

This KB article indicates that DeriveNormalizedGroupProperties is associated with what used to be called the normalization stage of query processing.

This stage is now called binding or algebrizing; it takes the expression parse tree output from the previous parse stage and outputs an algebrized expression tree (query processor tree) to go forward to optimization (trivial plan optimization in this case) [ref].

I tried one more experiment (script), which was to re-run the original test but looking at three different cases:

  1. First Name and Last Name Strings of length 10 characters with no duplicates.
  2. First Name and Last Name Strings of length 50 characters with no duplicates.
  3. First Name and Last Name Strings of length 10 characters with all duplicates.

[Graph: compile time for the three cases]

It can clearly be seen that the longer the strings, the worse things get, and that, conversely, the more duplicates, the better things get. As previously mentioned, duplicates don't affect the cached plan size, so I presume that there must be a process of duplicate identification when constructing the algebrized expression tree itself.

Edit

One place where this information is leveraged is shown by @Lieven here:

SELECT * 
FROM (VALUES ('Lieven1', 1),
             ('Lieven2', 2),
             ('Lieven3', 3))Test (name, ID)
ORDER BY name, 1/ (ID - ID) 

Because at compile time it can determine that the Name column has no duplicates, it skips ordering by the secondary 1/ (ID - ID) expression at run time (the sort in the plan only has one ORDER BY column) and no divide-by-zero error is raised. If duplicates are added to the table then the sort operator shows two ORDER BY columns and the expected error is raised.
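For contrast, the same query with a duplicate name, which does raise the error:

SELECT * 
FROM (VALUES ('Lieven1', 1),
             ('Lieven1', 2),  -- duplicate name: the uniqueness guarantee is gone
             ('Lieven3', 3))Test (name, ID)
ORDER BY name, 1/ (ID - ID)  -- tiebreaker is now evaluated: divide by zero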

#2


22  

It is not too surprising: the execution plan for the tiny insert is computed once, and then reused 1000 times. Parsing and preparing the plan is quick, because it has only four values to deal with. A 1000-row plan, on the other hand, needs to deal with 4000 values (or 4000 parameters if you parameterized your C# tests). This could easily eat up the time savings you gain by eliminating 999 roundtrips to SQL Server, especially if your network is not overly slow.
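The reused form is essentially a parameterized statement along these lines (a sketch; the column types are assumptions, since the question doesn't show the table definition):

EXEC sp_executesql
    N'INSERT INTO T_TESTS (TestId, FirstName, LastName, Age)
      VALUES (@p0, @p1, @p2, @p3)',
    N'@p0 UNIQUEIDENTIFIER, @p1 VARCHAR(100), @p2 VARCHAR(100), @p3 INT',
    @p0 = '6f3f7257-a3d8-4a78-b2e1-c9b767cfe1c1',
    @p1 = 'First 0', @p2 = 'Last 0', @p3 = 0;
-- Only the parameter values change across the 1000 calls; the plan for the
-- statement text is compiled once and served from the cache afterwards.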

#3


9  

The issue probably has to do with the time it takes to compile the query.

If you want to speed up the inserts, what you really need to do is wrap them in a transaction:

BEGIN TRAN;
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('6f3f7257-a3d8-4a78-b2e1-c9b767cfe1c1', 'First 0', 'Last 0', 0);
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('32023304-2e55-4768-8e52-1ba589b82c8b', 'First 1', 'Last 1', 1);
...
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('f34d95a7-90b1-4558-be10-6ceacd53e4c4', 'First 999', 'Last 999', 999);
COMMIT TRAN;

From C#, you might also consider using a table-valued parameter. Issuing multiple commands in a single batch, by separating them with semicolons, is another approach that will also help.
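A minimal sketch of the table-valued-parameter route (type name, procedure name and column types are illustrative assumptions):

CREATE TYPE dbo.TestRowType AS TABLE
(
    TestId    UNIQUEIDENTIFIER,
    FirstName VARCHAR(100),
    LastName  VARCHAR(100),
    Age       INT
);
GO
CREATE PROCEDURE dbo.InsertTests
    @Rows dbo.TestRowType READONLY   -- TVP parameters must be READONLY
AS
    INSERT INTO T_TESTS (TestId, FirstName, LastName, Age)
    SELECT TestId, FirstName, LastName, Age
    FROM @Rows;
GO

From C# you then fill a DataTable with the 1000 rows and pass it as a single SqlDbType.Structured parameter, so the whole set arrives in one roundtrip and is inserted by one set-based statement.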

#4


1  

I ran into a similar situation trying to convert a table with several 100k rows using a C++ program (MFC/ODBC).

Since this operation took a very long time, I figured that bundling multiple inserts into one statement (up to 1000, due to MSSQL limitations) would help. My guess was that a lot of single INSERT statements would create an overhead similar to what is described here.

However, it turned out that the conversion actually took quite a bit longer:

        Method 1       Method 2     Method 3 
        Single Insert  Multi Insert Joined Inserts
Rows    1000           1000         1000
Insert  390 ms         765 ms       270 ms
per Row 0.390 ms       0.765 ms     0.27 ms

So, 1000 single calls to CDatabase::ExecuteSql, each with a single INSERT statement (method 1), are roughly twice as fast as a single call to CDatabase::ExecuteSql with a multi-row INSERT statement of 1000 value tuples (method 2).

Update: So, the next thing I tried was to bundle 1000 separate INSERT statements into a single string and have the server execute that (method 3). It turns out this is even a bit faster than method 1.

Edit: I am using Microsoft SQL Server Express Edition (64-bit) v10.0.2531.0.
