Loading a large amount of data into an Oracle SQL database

Time: 2022-09-20 13:06:08

I was wondering if anyone had any experience with what I am about to embark on. I have several csv files which are all around a GB or so in size and I need to load them into an Oracle database. While most of my work after loading will be read-only, I will need to load updates from time to time. Basically I just need a good tool for loading several rows of data at a time into my db.

Here is what I have found so far:

  1. I could use SQL*Loader to do a lot of the work (a minimal control-file sketch follows this list)

  2. I could use Bulk-Insert commands

  3. Some sort of batch insert.

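For the SQL*Loader option, the work is driven by a small control file describing the CSV layout plus a sqlldr command that points at it. A minimal sketch, with made-up table, column, and file names that would have to be adapted to the real CSVs:

    -- load_sales.ctl (all names here are hypothetical)
    LOAD DATA
    INFILE 'sales_2022.csv'
    APPEND
    INTO TABLE sales_staging
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    TRAILING NULLCOLS
    (sale_id, cust_name, amount)

which would be run with something along the lines of:

    sqlldr userid=scott/tiger control=load_sales.ctl log=load_sales.log bad=load_sales.bad
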
Using prepared statements somehow might be a good idea. I guess I was wondering what everyone thinks is the fastest way to get this insert done. Any tips?

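My rough understanding is that the gain from prepared statements comes from binding many rows per execution rather than one at a time (e.g. the batch interface of whatever driver I end up using). If the rows were already sitting in a staging table, the server-side version of that batched insert might look roughly like this, where csv_stage and target_table are made-up names with matching columns:

    -- Batched insert via bulk binding; all names are hypothetical.
    DECLARE
      CURSOR c_src IS SELECT * FROM csv_stage;
      TYPE t_rows IS TABLE OF c_src%ROWTYPE;
      l_rows t_rows;
    BEGIN
      OPEN c_src;
      LOOP
        FETCH c_src BULK COLLECT INTO l_rows LIMIT 10000;  -- batch size is arbitrary
        EXIT WHEN l_rows.COUNT = 0;
        FORALL i IN 1 .. l_rows.COUNT
          INSERT INTO target_table VALUES l_rows(i);
      END LOOP;
      CLOSE c_src;
      COMMIT;
    END;
    /
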
3 Answers

#1


5  

I would be very surprised if you could roll your own utility that will outperform SQL*Loader Direct Path Loads. Oracle built this utility for exactly this purpose - the likelihood of building something more efficient is practically nil. There is also the Parallel Direct Path Load, which allows you to have multiple direct path load processes running concurrently.

From the manual:

Instead of filling a bind array buffer and passing it to the Oracle database with a SQL INSERT statement, a direct path load uses the direct path API to pass the data to be loaded to the load engine in the server. The load engine builds a column array structure from the data passed to it.

The direct path load engine uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynchronous I/O).

Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.

There are cases where Direct Path Load cannot be used.

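In practice a direct path load is just a switch on the normal sqlldr invocation, something along the lines of the following, where the control file and credentials are placeholders (e.g. the hypothetical load_sales.ctl sketched in the question):

    sqlldr userid=scott/tiger control=load_sales.ctl direct=true

The parallel variant adds parallel=true and runs several such sessions at the same time, each fed its own slice of the data.
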
#2


0  

With that amount of data, you'd better be sure of your backing store - the dbf disks' free space.

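One quick way to check that headroom before loading, assuming access to the DBA views:

    -- Free space per tablespace, in GB.
    SELECT tablespace_name,
           ROUND(SUM(bytes) / 1024 / 1024 / 1024, 1) AS free_gb
    FROM   dba_free_space
    GROUP  BY tablespace_name;
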
sqlldr is script driven and very efficient, generally more efficient than a SQL script. The only thing I wonder about is the magnitude of the data. I personally would consider several to many sqlldr processes, assign each one a subset of the data, and let the processes run in parallel.

You said you wanted to load a few records at a time? That may take a lot longer than you think. Did you mean a few files at a time?

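A rough sketch of that split-and-run-in-parallel approach, where the file names, chunk size, and control file are placeholders; if these are direct path loads into the same table, parallel=true is what lets the sessions run concurrently:

    split -l 1000000 big_file.csv chunk_    # produces chunk_aa, chunk_ab, ...
    sqlldr userid=scott/tiger control=load_sales.ctl data=chunk_aa direct=true parallel=true &
    sqlldr userid=scott/tiger control=load_sales.ctl data=chunk_ab direct=true parallel=true &
    wait
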
#3


0  

You may be able to create an external table on the CSV files and load them in by SELECTing from the external table into another table. I'm not sure whether this method will be quicker overall, but it may save time compared with messing around getting SQL*Loader to work, especially when you have criteria for UPDATEs.

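A sketch of that external-table route, with made-up directory, table, and column names (the access parameters have to match the real CSV layout). A MERGE from the external table also covers the occasional-updates requirement that a plain INSERT would not:

    -- One-time setup: a directory object and an external table over the CSV.
    CREATE DIRECTORY csv_dir AS '/data/csv';          -- hypothetical path

    CREATE TABLE sales_ext (
      sale_id   NUMBER,
      cust_name VARCHAR2(100),
      amount    NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY csv_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        MISSING FIELD VALUES ARE NULL
      )
      LOCATION ('sales_2022.csv')
    )
    REJECT LIMIT UNLIMITED;

    -- Load or refresh the real table straight from the file.
    MERGE INTO sales tgt
    USING sales_ext src
    ON (tgt.sale_id = src.sale_id)
    WHEN MATCHED THEN UPDATE SET tgt.cust_name = src.cust_name,
                                 tgt.amount    = src.amount
    WHEN NOT MATCHED THEN INSERT (sale_id, cust_name, amount)
                          VALUES (src.sale_id, src.cust_name, src.amount);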