Can I get "BULK INSERT"-like speeds when inserting into SQL Server from JAVA?

Time: 2022-02-11 07:17:23

During my quest to find the fastest method to get data from JAVA into SQL Server, I have noticed that the fastest JAVA method I can come up with is still 12 times slower than using BULK INSERT.

My data is being generated from within JAVA, and BULK INSERT only supports reading data from a text file, so using BULK INSERT is not an option unless I output my data to a temporary text file. This, in turn, would of course be a huge performance hit.
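For reference, writing the generated rows to such a temporary text file is simple enough; a minimal sketch (the path `test.txt` and the row count of 10000 are just placeholders matching the test below):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteBulkFile {
    public static void main(String[] args) throws IOException {
        // Placeholder path and row count. Tab-delimited, one row per line;
        // adjust the terminators to match the BULK INSERT options used.
        Path out = Path.of("test.txt");
        int rows = 10000;

        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            for (int i = 1; i <= rows; i++) {
                writer.write("X\t100\n");
            }
        }
        System.out.println(Files.readAllLines(out).size() + " lines written");
    }
}
```

The cost here is not the writing itself but the extra round trip through the file system, which is exactly the performance hit mentioned above.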

When inserting from JAVA, insert speeds are around 2500 rows per second, even when I start the timer after the for loop, just before executeBatch. So "creating" the data in memory is not the bottleneck.

When inserting with BULK INSERT, insert speeds are around 30000 rows per second.

Both tests have been done on the server, so the network is not a bottleneck either. Any clue as to why BULK INSERT is faster, and whether the same performance can be attained from within JAVA?

This is just a big dataset that needs to be loaded once, so it would be OK to temporarily disable any kind of logging (already tried simple logging), indexes (the table has none), locking, whatever...

My test-setup so far

Database:

CREATE TABLE TestTable   
   (  Col1 varchar(50)
    , Col2 int);  

JAVA:

// This seems to be essential to get good speeds, otherwise batching is not used.
conn.setAutoCommit(false);

PreparedStatement prepStmt = conn.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1,"X");            
    prepStmt.setInt(2,100);
    prepStmt.addBatch();
}
prepStmt.executeBatch();
conn.commit();

BULK INSERT:

-- A text file containing "X<tab>100" on every line, i.e. the same data as generated in JAVA
BULK INSERT TestTable FROM 'c:\test\test.txt';

2 Answers

#1


While BULK INSERT is the fastest way of doing bulk insert, SQL Server supports remote (client-driven) bulk insert operations both through the native driver and ODBC. From version 4.2 onwards of the JDBC driver, this functionality is exposed through the SQLServerBulkCopy class, which does not directly read from files but does support reading from a RowSet, ResultSet or a custom implementation of ISQLServerBulkRecord for generated data. This functionality is equivalent to the .NET SqlBulkCopy class, with largely the same interface, and should be the fastest way of performing bulk operations short of a server-based BULK INSERT.
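Since the data here is generated in memory, the ISQLServerBulkRecord route is the interesting one. As a rough, untested sketch of what such an implementation could look like for the TestTable schema (the class name is made up, and the method names follow the 4.2-era mssql-jdbc interface; check the driver's Javadoc for the exact contract in your version):

```java
import com.microsoft.sqlserver.jdbc.ISQLServerBulkRecord;
import com.microsoft.sqlserver.jdbc.SQLServerException;
import java.util.LinkedHashSet;
import java.util.Set;

// Feeds rows generated in memory straight to SQLServerBulkCopy,
// without a temporary text file. Sketch only, not verified against
// a live server.
public class GeneratedRecord implements ISQLServerBulkRecord {
    private final int totalRows;
    private int currentRow = 0;

    public GeneratedRecord(int totalRows) {
        this.totalRows = totalRows;
    }

    @Override
    public Set<Integer> getColumnOrdinals() {
        Set<Integer> ordinals = new LinkedHashSet<>();
        ordinals.add(1); // Col1 varchar(50)
        ordinals.add(2); // Col2 int
        return ordinals;
    }

    @Override public String getColumnName(int column) { return column == 1 ? "Col1" : "Col2"; }
    @Override public int getColumnType(int column) {
        return column == 1 ? java.sql.Types.NVARCHAR : java.sql.Types.INTEGER;
    }
    @Override public int getPrecision(int column) { return column == 1 ? 50 : 0; }
    @Override public int getScale(int column) { return 0; }
    @Override public boolean isAutoIncrement(int column) { return false; }

    @Override
    public Object[] getRowData() throws SQLServerException {
        // Same data as in the original test: "X" and 100.
        return new Object[] { "X", 100 };
    }

    @Override
    public boolean next() throws SQLServerException {
        return ++currentRow <= totalRows;
    }
}
```

An instance would then be handed to SQLServerBulkCopy.writeToServer() in place of a file- or ResultSet-based source, so the generated rows never touch disk.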

EDIT: Example by OP

Below you can find an example use-case that could be used to test the performance of SQLServerBulkCSVFileRecord, a class that is used with SQLServerBulkCopy in the same way, except that it reads its rows from a text file. In my test case, test.txt contained a million rows of "X<tab>100".

CREATE TABLE TestTable (Col1 varchar(50), Col2 int);

The table should not have any indexes enabled.

In JAVA

// Make sure to use version 4.2, as SQLServerBulkCSVFileRecord is not included in version 4.1
import com.microsoft.sqlserver.jdbc.*;

long startTime = System.currentTimeMillis();

SQLServerBulkCSVFileRecord fileRecord =
        new SQLServerBulkCSVFileRecord("C:\\temp\\test.txt", true);
fileRecord.addColumnMetadata(1, null, java.sql.Types.NVARCHAR, 50, 0);
fileRecord.addColumnMetadata(2, null, java.sql.Types.INTEGER, 0, 0);

Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
Connection destinationConnection =
        DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "user", "pass");

SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions();

// Depending on the size of the data being uploaded and the amount of RAM,
// an optimum can be found here. Experiment with this to improve performance.
copyOptions.setBatchSize(300000);

// This is crucial to get good performance
copyOptions.setTableLock(true);

SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
bulkCopy.setBulkCopyOptions(copyOptions);
bulkCopy.setDestinationTableName("TestTable");
bulkCopy.writeToServer(fileRecord);
bulkCopy.close();
destinationConnection.close();

long endTime = System.currentTimeMillis();
System.out.println((endTime - startTime) + "ms");

Using this example, I was able to get insert speeds of up to 30000 rows per second.

#2


Below is the fastest method I could find that does not use SQLServerBulkCopy. It is a lot slower, though: about 2500 rows per second instead of 30000. For a lot of use-cases this might still be interesting. The main things to keep in mind are to set AutoCommit to false, use large batches with PreparedStatements, and disable any indexes.

Connection db_connection = DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "User", "Pass");

// This is crucial to getting good performance
db_connection.setAutoCommit(false);

PreparedStatement prepStmt = db_connection.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1,"X");            
    prepStmt.setInt(2,100);
    prepStmt.addBatch();
}
prepStmt.executeBatch();
db_connection.commit();
