Importing XML data into MS SQL Server programmatically

Time: 2021-09-17 16:33:54

I have 5 large XML files which I am keen to analyse. All of them are too large to open in a text editor and so I do not know their XML schemas.


I have tried to import them into SQL server, however the process has given me an error even though I am pretty sure they are valid, as they were sourced from very reputable programmers.


I have also tried other methods, but each either struggles with the large file sizes (MySQL) or states that the files contain invalid XML characters (Access & Excel).


How can I read and insert the data programmatically? Can this be done via SQL query?


Thanks a lot!


11 Solutions



See this blog post by unofficial * team member Brent Ozar: http://www.brentozar.com/archive/2009/06/howto -the-*-xml-into-sql-server/



As of 2013...


The only time saving option in my opinion to load large/huge XML files in SQL Server is (as someone previously briefly mentioned) to use the SQLXML 4.0 library.


This is the solution I adopted to load huge XML files (7GB in size) on a daily basis. The previous process, which used C# manipulation in the Script Task, took hours to complete; using SQLXML 4.0 takes 15-20 minutes. Step-by-step instructions for installing SQLXML 4.0 are here. For practical end-to-end examples, follow this MSDN link.

My XML also has nested elements, so it's quite complex: the result is 10 tables with 2.5 to 4 million rows each (the daily file is sometimes more than 7GB). My work was based purely on information I learned and applied from the two links provided above.


  • Advantages:

    • it's fast
    • it's Microsoft (http://www.microsoft.com/en-gb/download/details.aspx?id=30403)
    • the SSIS package will be very much simplified
    • you don't need to spend hours changing the SSIS package if your XML schema changes. SQLXML can create the tables in SQL Server for you every time you run the package, based on the XSD relationships you provide.

  • Disadvantages:

    • creating the XSD may take a while and requires some knowledge. When I did it I learned something new, so this was not a real disadvantage for me.
    • when seeing how simple the SSIS package is, your manager will have the impression that you didn't do any work.

To view large files use Large Text File Viewer, nice little gem.


Note: The question is quite old, but the "issue" remains hot. I added this post for the developers who Google how to BULK LOAD XML files in SSIS and land here.




Try the free LogParser utility from Microsoft: http://www.microsoft.com/DownLoads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en


It's designed to give you SQL-like access to large text files including XML. Something like


Select top 1000 * from myFile.xml

...should work to get you started. Also, be aware that the documentation will appear in your Start menu alongside the executable after installation. I don't think there's a good copy online.




You kind of have to know the schema. Try downloading TextPad or something similar to view the files.


Once you know the schema you can do a couple of things to get them into SQL. One approach would be to use OpenXML http://msdn.microsoft.com/en-us/library/ms186918.aspx.
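If the files are too big to open in an editor, one way to get a rough picture of the schema is to stream through the file and tally which element paths and attribute names occur. The following is only a sketch using Python's standard `xml.etree.ElementTree.iterparse`; the function name `summarize_structure` is made up for illustration, and it assumes the file is well-formed (the parser will raise an error at the first invalid character, which at least tells you where the problem is).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_structure(path):
    """Count element paths and collect attribute names while streaming the file."""
    path_counts = Counter()
    attributes = {}
    open_elems = []  # elements that are currently open, outermost first
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            key = "/".join(e.tag for e in open_elems)
            path_counts[key] += 1
            # Tag and attributes are already available at the "start" event.
            attributes.setdefault(key, set()).update(elem.attrib)
        else:
            open_elems.pop()
            if open_elems:
                # Detach the finished element from its parent so memory use stays flat.
                open_elems[-1].remove(elem)
    return path_counts, attributes
```

Printing the returned counter gives one line per distinct path (for example `posts/row`) with the number of occurrences, which is usually enough to decide on table and column names.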




I've tested the MSSQL XML parser extensively, and the bcp.exe utility works great for this. The trick is coming up with the right row terminator, since it has to be a value that cannot occur in your document. For instance you can do this:

create table t1(x xml)

Create a simple text file that contains only your chosen delimiter. For example, place this string in delim.txt:



-++++++++-

Then concatenate that to the end of your document instance, from the command line:


copy myFile.xml + delim.txt out.xml /b


After this you can BCP it into the database like this:


bcp.exe test.dbo.t1 in out.xml -T -c -r -++++++++-


If the document is UTF-16 then replace the -c switch with -w
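Since the whole approach depends on the terminator never occurring inside the document, it may be worth checking that before running bcp. Here is a rough Python sketch of the check plus the concatenation step; the function names are hypothetical, and it compares raw bytes, so it assumes a single-byte or UTF-8 file (for a UTF-16 file the terminator would have to be encoded the same way first).

```python
import shutil


def contains_bytes(path, needle, chunk_size=1024 * 1024):
    """Return True if needle occurs anywhere in the file, reading it in chunks."""
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last len(needle) - 1 bytes so a match that spans
            # two chunks is still found on the next iteration.
            tail = window[-(len(needle) - 1):] if len(needle) > 1 else b""


def append_terminator(src, dst, terminator):
    """Write dst as src followed by terminator, unless terminator already occurs in src."""
    if contains_bytes(src, terminator):
        raise ValueError("terminator already occurs in the document; pick another one")
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.write(terminator)
```

Calling `append_terminator("myFile.xml", "out.xml", b"-++++++++-")` produces the same out.xml as the copy command above, but fails loudly if the delimiter is not safe to use.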




Have you tried SQL Server XML Bulk Load?




The first thing I did was to get the first X bytes (e.g. the first 1 MB) of the XML files so I could take a look at them with the editor of my choice.


If you have Cygwin installed you already own a nice GNU utility to achieve this: head


head.exe -c1M comments.xml > comments_small.xml

Alternatively you can find a native port of most GNU utilities here: http://unxutils.sourceforge.net/
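If neither Cygwin nor the native ports are available, the same "first N bytes" trick can be done with a few lines of Python. This is just a sketch of what `head -c` does; the function name `copy_head` is made up for illustration.

```python
def copy_head(src, dst, num_bytes=1024 * 1024):
    """Copy at most the first num_bytes of src into dst; return the bytes written."""
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while written < num_bytes:
            # Read in modest chunks so a large num_bytes does not need one huge buffer.
            chunk = fin.read(min(64 * 1024, num_bytes - written))
            if not chunk:  # reached the end of the file early
                break
            fout.write(chunk)
            written += len(chunk)
    return written
```

For example, `copy_head("comments.xml", "comments_small.xml")` writes the first 1 MB. As with head, the cut lands at an arbitrary byte, so the sample will not be well-formed XML; it is only meant for looking at in an editor.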




For viewing very large files, I've found the V file viewer to be excellent.


I've used it on files as large as 8GB. For files which are fixed record length, it is extremely easy to navigate based on block size, because it is disk-based.


Note that there is no editing capability.


Having said that, one difficulty with XML is that it's not really a good format for large "streams", since it has an overall beginning and end structure, and a parser which cannot hold the entire file in memory may have to do some pretty fancy tricks to ensure that it complies with a DTD or schema.




Take a look at this blog post: http://benchmarkitconsulting.com/colin-stasiuk/2009/01/15/parsing-xml-into-a-table-structure-possible-update/

And this SO question: Parsing XML into a SQL table WITHOUT predefining structure. Possible?




Have you tried using OPENROWSET to import your big XML files into a SQL Server table?


CREATE TABLE XmlTable
(
    XmlData XML
)

INSERT XmlTable(XmlData)
SELECT BulkColumn FROM
    OPENROWSET(BULK '(your path)\xmldata.xml', SINGLE_BLOB) AS X

Since I don't have any 5GB files at hand, I can't really test it myself.


There's another way you might tackle this: streaming LINQ to XML. Check out this blog post where James Newton-King shows how to read XElement objects one by one, and a two-part series here and here on the same topic on the Microsoft XML team blog.
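The same streaming idea is not tied to .NET. As a rough Python analogue (not the code from those posts), the sketch below reads one record element at a time and inserts the rows in batches. It makes several assumptions: the records are direct children of the root and carry their data in attributes, the function names are invented, and `sqlite3` is used only as a stand-in so the example runs anywhere. Against SQL Server a DB-API driver such as pyodbc would take its place, which I have not tested here.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import islice


def iter_records(path, record_tag):
    """Yield the attributes of each <record_tag> element, one at a time."""
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield dict(elem.attrib)
            root.clear()  # drop processed records so memory use stays flat


def load_records(conn, path, record_tag, columns, table, batch_size=1000):
    """Insert the named attributes of every record into table, in batches."""
    # Table and column names are interpolated directly: use trusted names only.
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = (tuple(rec.get(c) for c in columns)
            for rec in iter_records(path, record_tag))
    total = 0
    cur = conn.cursor()
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        cur.executemany(sql, batch)
        conn.commit()  # commit per batch to keep transactions small
        total += len(batch)
    return total
```

With a file shaped like `<posts><row Id="1" Score="5"/>...</posts>`, a call such as `load_records(conn, "posts.xml", "row", ["Id", "Score"], "posts")` would fill a two-column table. Attributes missing from a record are inserted as NULL.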






You should load your XML into an XML database, e.g. Berkeley DB XML or Xindice


Also, I'm not sure if it can scale to 850mb, but First Object XML Editor, and the parser library on which it's built, can handle quite large files.


Also, Baretail should display your files without breaking a sweat.



