How can I insert millions of rows of data from different RDBMSs into a SQL Server database using insert statements?

Date: 2021-08-05 15:42:59

I have two databases in my SQL Server instance, each containing a single table as of now.

I have two source databases, like below:

1) Db1 (MySQL)

2) Db2 (Oracle)

Now what I want to do is fill my SQL Server Db1 table with the data from the MySQL Db1, like below:

Insert into Table1 select * from Table1

Select * from Table1 (MySQL Db1) - data coming from the MySQL database

Insert into Table1 (SQL Server Db1) - insert the data coming from the MySQL database, assuming the same schema

I don't want to use SqlBulkCopy because I don't want to insert the data chunk by chunk. I want to insert all the data in one go, even with millions of rows, because my operation is not limited to just inserting records into the database. Otherwise the user has to sit and wait a long time: first for millions of rows to insert chunk by chunk, and then again for my further operation, which is also long-running.

So if I can speed this process up, my second operation will also speed up, since all the records will then be in my one local SQL Server instance.

Is this possible to achieve in a C# application?

Update: I researched linked servers, as @GorDon Linoff suggested that a linked server can be used to achieve this scenario, but based on my research it seems I cannot create a linked server through code.

I want to do this with the help of ADO.NET.

This is what I am trying to do exactly:

Consider that I have two different client RDBMSs, with two databases and some tables, on the client's premises.

So the databases look like this:

SQL Server:

Db1

Order
Id      Amount
1       100
2       200
3       300
4       400


MySQL or Oracle:

Db1:

Order
Id      Amount
1       1000
2       2000
3       3000
4       400

Now I want to compare the Amount column from the source (SQL Server) to the destination database (MySQL or Oracle).

I will join the tables of these two different RDBMS databases to compare the Amount columns.

In C#, what I can do is fetch the records chunk by chunk into a DataTable (in memory) and then compare the records in code, but this takes far too much time with millions of records.

So I want to do something better than this.

Hence I was thinking of bringing the records of these two RDBMSs into two databases on my local SQL Server instance, then writing a query that joins the two tables on Id, taking advantage of the DBMS's processing capability, which can compare these millions of records efficiently.

A query like this compares millions of records efficiently:

select SqlServer.Id,Mysql.Id,SqlServer.Amount,Mysql.Amount from SqlServerDb.dbo.Order as SqlServer
Left join MysqlDb.dbo.Order as Mysql on SqlServer.Id=Mysql.Id
where SqlServer.Amount != Mysql.Amount

The above query works when I have the data of these two different RDBMSs in my local server instance, in the databases SqlServerDb and MysqlDb, and it fetches the records below, whose amounts do not match:

So I am trying to get those records whose Amount column value does not match between the source (SQL Server Db) and MySQL.

Expected Output:

Id      Amount
1       1000
2       2000
3       3000

So is there any way to achieve this scenario?

6 Answers

#1


4  

On the SELECT side, create a .csv file (tab-delimited) using SELECT ... INTO OUTFILE ...

On the INSERT side, use LOAD DATA INFILE ... (or whatever the target machine's syntax is).

Doing it all at once may be easier to code than chunking, and may (or may not) run faster.
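A minimal sketch of both sides (file paths, database, and column names here are placeholders; on MySQL the output directory must be permitted by secure_file_priv, the file has to be moved somewhere the SQL Server machine can read, and on SQL Server the LOAD DATA equivalent is BULK INSERT):

```sql
-- MySQL side: dump the whole table to one tab-delimited file.
SELECT Id, Amount
FROM `Order`
INTO OUTFILE '/var/lib/mysql-files/order_dump.tsv'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

-- SQL Server side: load the file in a single statement.
BULK INSERT MysqlDb.dbo.[Order]
FROM 'C:\dumps\order_dump.tsv'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);
```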

#2


2  

SqlBulkCopy can accept either a DataTable or a System.Data.IDataReader as its input.

Using your query to read the source DB, set up an ADO.NET DataReader on the source MySQL or Oracle DB and pass the reader to the WriteToServer() method of SqlBulkCopy.

This can copy almost any number of rows without limit. I have copied hundreds of millions of rows using the data reader approach.
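A minimal sketch of this approach (connection strings are placeholders; this assumes the MySql.Data and Microsoft.Data.SqlClient NuGet packages and that dbo.[Order] already exists on SQL Server with a matching schema):

```csharp
using Microsoft.Data.SqlClient;   // NuGet: Microsoft.Data.SqlClient
using MySql.Data.MySqlClient;     // NuGet: MySql.Data

// Stream rows from MySQL straight into SQL Server without materializing
// them in memory; SqlBulkCopy pulls from the open data reader.
using var source = new MySqlConnection("server=...;database=Db1;uid=...;pwd=...");
source.Open();
using var cmd = new MySqlCommand("SELECT Id, Amount FROM `Order`", source);
using var reader = cmd.ExecuteReader();

using var bulk = new SqlBulkCopy("Server=.;Database=MysqlDb;Integrated Security=true")
{
    DestinationTableName = "dbo.[Order]",
    BatchSize = 0,          // 0 = send everything as a single batch
    BulkCopyTimeout = 0     // no timeout for a long-running copy
};
bulk.ColumnMappings.Add("Id", "Id");
bulk.ColumnMappings.Add("Amount", "Amount");
bulk.WriteToServer(reader); // streams all rows in one call
```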

#3


1  

What about adding a changed-date column in the remote database?

Then you could get all rows that have changed since the last sync and just compare those?

#4


1  

First of all, do not use a linked server. It is tempting, but it will cause more trouble than it brings to the table. For example, updates and inserts will fetch all of the target DB over to the source DB, do the insert/update, and then post all the data back to the target.

As far as I understand, you are trying to copy changed data to the target system for further processing.

I recommend using a timestamp column on the source table. When anything changes in a source row, its timestamp column is updated by SQL Server.

On the target, get the max ID and the max timestamp; two queries at most.

On the source, rows where source.ID <= target.MaxID && source.timestamp >= target.MaxTimestamp are the rows that changed after the last sync (these need an update). And rows where source.ID > target.MaxID are the rows that were inserted after the last sync.

Now you do not have to compare two worlds; you just get all the updates and inserts.
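A sketch of those queries (table and column names are illustrative; on SQL Server a rowversion column gives you the auto-updated timestamp this relies on):

```sql
-- On the target: two cheap aggregate queries.
SELECT MAX(Id)        AS MaxID        FROM dbo.[Order];
SELECT MAX(SyncStamp) AS MaxTimestamp FROM dbo.[Order];

-- On the source: rows updated since the last sync...
SELECT Id, Amount
FROM dbo.[Order]
WHERE Id <= @MaxID AND RowStamp >= @MaxTimestamp;

-- ...and rows inserted since the last sync.
SELECT Id, Amount
FROM dbo.[Order]
WHERE Id > @MaxID;
```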

#5


1  

You need to create a linked server connection using ODBC and the proper driver; after that you can execute the queries using OPENQUERY.

Take a look at OPENQUERY:

https://msdn.microsoft.com/en-us/library/ms188427(v=sql.120).aspx
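For example (server name, DSN, and remote table are placeholders; note that sp_addlinkedserver is just T-SQL, so it can also be executed from C# as an ordinary ADO.NET command, which addresses the "cannot create a linked server through code" concern):

```sql
-- One-time setup: register a MySQL ODBC DSN as a linked server.
EXEC sp_addlinkedserver
     @server     = N'MYSQL_LINK',
     @srvproduct = N'MySQL',
     @provider   = N'MSDASQL',
     @datasrc    = N'MyMysqlDsn';  -- ODBC DSN configured on the SQL Server box

-- Then query the remote table as if it were local.
SELECT *
FROM OPENQUERY(MYSQL_LINK, 'SELECT Id, Amount FROM Db1.`Order`');
```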

#6


1  

Yes, SQL Server is very efficient when it's working with sets, so let's keep that in play.

In a nutshell, what I'm pitching is:

  1. Load data from the source into a staging table on the target database (staging table = a table that temporarily holds the raw data from the source table, with the same structure as the source table... add tracking columns to taste). This will be done by your C# code: select from source_table into a DataTable, then SqlBulkCopy into the staging table.

  2. Have a stored proc on the target database to reconcile the data between your target table and the staging table. Your C# code calls the stored proc.

Given that you're talking about millions of rows, another thing that can speed things up is dropping the indexes on the staging table before inserting into it and recreating them after the inserts, before any select is performed.
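A sketch of step 2's reconcile proc (all names here are invented; this version just returns the mismatched rows from the question's expected output, but it could equally update the target in place):

```sql
CREATE PROCEDURE dbo.usp_ReconcileOrders
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows whose Amount differs between the target table and the staging copy,
    -- or which are missing from staging entirely.
    SELECT t.Id,
           t.Amount AS TargetAmount,
           s.Amount AS StagingAmount
    FROM dbo.[Order] AS t
    LEFT JOIN dbo.Order_Staging AS s
           ON t.Id = s.Id
    WHERE s.Id IS NULL OR t.Amount <> s.Amount;
END;
```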
