Programmatically importing XML data into MS SQL Server

Date: 2021-09-17 16:33:54

I have five large XML files that I am keen to analyse. All of them are too large to open in a text editor, so I do not know their XML schemas.

I have tried to import them into SQL Server; however, the process gives me an error, even though I am fairly sure the files are valid, as they were sourced from very reputable programmers.

I have also tried other methods, but each one either struggles with the large file sizes (MySQL) or reports that the files contain invalid XML characters (Access & Excel).

How can I read and insert the data programmatically? Can this be done via SQL query?

Thanks a lot!

11 solutions

#1


3  

See this blog post by unofficial * team member Brent Ozar:
http://www.brentozar.com/archive/2009/06/how-to-import-the-*-xml-into-sql-server/

#2


5  

As of 2013...

In my opinion, the only time-saving option for loading large/huge XML files into SQL Server is (as someone briefly mentioned earlier) to use the SQLXML 4.0 library.

This is the solution I adopted to load huge XML files (7 GB in size) on a daily basis. The previous process, which used C# manipulation in a Script Task, took hours to complete; with SQLXML 4.0 it takes 15-20 minutes. Step-by-step instructions for installing SQLXML 4.0 are here. For practical end-to-end examples, follow this MSDN link.

My XML also has nested elements, so it is quite complex: the result is 10 tables with 2.5 to 4 million rows each (the daily file is sometimes more than 7 GB). My work was based purely on what I learned and applied from the two links above.
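
SQLXML Bulk Load performs the "one nested document, many related tables" split itself, driven by the annotated XSD. Purely to illustrate the shape of that output (this is not SQLXML and does not call it), here is a small Python sketch that streams a document and splits each record into one parent row plus child rows carrying the parent's generated key. The element names (`order`, `customer`, `item`) and the `row_id`/`parent_row_id` columns are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def shred(source, record_tag):
    """Split each <record_tag> element into one parent row plus child rows.

    Leaf children without attributes become columns of the parent row; any
    other child becomes a row of its own, linked back via parent_row_id.
    """
    parents, children = [], []
    row_id = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        row_id += 1  # surrogate key, playing the role of an IDENTITY column
        parent = {"row_id": row_id, **elem.attrib}
        for child in elem:
            if len(child) == 0 and not child.attrib:
                parent[child.tag] = child.text  # simple leaf -> parent column
            else:
                row = {"parent_row_id": row_id, "tag": child.tag, **child.attrib}
                row.update({g.tag: g.text for g in child})  # flatten one level
                children.append(row)
        parents.append(parent)
        elem.clear()  # release the record's subtree once it has been shredded
    return parents, children


sample = b"""<orders>
  <order id="1"><customer>Ann</customer><item sku="a" qty="2"/><item sku="b" qty="1"/></order>
  <order id="2"><customer>Bob</customer></order>
</orders>"""
parents, children = shred(io.BytesIO(sample), "order")
print(parents)
print(children)
```

For a 7 GB file you would write the rows out in batches rather than accumulating them in lists; the point of the sketch is only the parent/child key relationship that the XSD describes to SQLXML.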

  • Advantages:

    • it's fast
    • it's Microsoft (http://www.microsoft.com/en-gb/download/details.aspx?id=30403)
    • the SSIS package is greatly simplified
    • you don't need to spend hours changing the SSIS package if your XML schema changes: SQLXML can create the tables in SQL Server for you every time you run the package, based on the XSD relationships you provide.
  • Disadvantages:

    • creating the XSD may take a while and requires some knowledge. When I did it I learned something new, so this was not really a disadvantage for me.
    • when seeing how simple the SSIS package is, your manager will have the impression that you didn't do any work.

To view large files, use Large Text File Viewer, a nice little gem.

Note: the question is quite old, but the "issue" remains hot. I added this post for developers who google how to bulk load XML files in SSIS and land here.

#3


4  

Try the free LogParser utility from Microsoft: http://www.microsoft.com/DownLoads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en

It is designed to give you SQL-like access to large text files, including XML. Something like

Select top 1000 * from myFile.xml

...should get you started. Also, note that after installation the documentation appears in your Start menu alongside the executable; I don't think there is a good copy online.

#4


1  

You kind of have to know the schema. Try downloading TextPad or something similar to view the files.

Once you know the schema, there are a couple of ways to get the data into SQL Server. One approach is to use OPENXML: http://msdn.microsoft.com/en-us/library/ms186918.aspx.
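
If the files are too big to open at all, one way to learn enough of the schema to write an OPENXML (or any other) query is to stream each file once and count every distinct element path. The Python sketch below does that with the standard library's `xml.etree.ElementTree.iterparse`, clearing finished top-level elements so memory stays small; the function name and the `path/@attr` notation for attributes are my own choices, not anything from the answer.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(source):
    """Stream an XML file and count each distinct element/attribute path."""
    counts = Counter()
    stack = []   # tags of the currently open elements, outermost first
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem
            stack.append(elem.tag)
            path = "/".join(stack)
            counts[path] += 1
            for name in elem.attrib:        # attributes are complete at "start"
                counts[f"{path}/@{name}"] += 1
        else:
            stack.pop()
            if len(stack) == 1:             # a direct child of the root just closed
                root.clear()                # drop it so memory does not keep growing
    return counts


sample = b"<posts><row Id='1' Body='x'/><row Id='2'/><meta><v>1</v></meta></posts>"
for path, n in sorted(element_paths(io.BytesIO(sample)).items()):
    print(f"{n:>10}  {path}")
```

On a real file you would pass the filename instead, e.g. `element_paths("comments.xml")`; the output is a list of paths with counts, which is usually enough to see what the repeating "row" element is and which attributes or children it carries.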

#5


1  

I've tested the MSSQL XML parser extensively, and the bcp.exe utility works great for this. The trick is choosing the right row terminator, since it must be a value that cannot occur in your document. For instance, you can do this:

create table t1(x xml)

Create a simple text file that contains only your chosen delimiter. For example, place this string in delim.txt:

-++++++++-

Then append it to the end of your document instance from the command line:

copy myFile.xml + delim.txt out.xml /b

After this, you can bcp it into the database like this:

bcp.exe test.dbo.t1 in out.xml -T -c -r -++++++++-

If the document is UTF-16, replace the -c switch with -w.
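
If you want to script the `copy /b` step and, at the same time, confirm that the chosen terminator really never appears in the document, a Python sketch along these lines does both in one streaming pass. Consecutive chunks overlap by `len(terminator) - 1` bytes so that an occurrence straddling two chunks is still caught. The function name is hypothetical; for a UTF-16 file loaded with `-w`, the terminator would presumably have to be passed UTF-16 encoded as well (e.g. `"-++++++++-".encode("utf-16-le")`), which this sketch does not check.

```python
import os


def append_terminator(src, dst, terminator=b"-++++++++-", chunk_size=1 << 20):
    """Copy src to dst and append the row terminator (like `copy /b src + delim.txt dst`).

    Raises ValueError (and deletes dst) if the terminator already occurs in src,
    because bcp would then split the document at that point.
    """
    keep = len(terminator) - 1   # bytes carried over so matches spanning chunks are found
    tail = b""
    try:
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while chunk := fin.read(chunk_size):
                window = tail + chunk
                if terminator in window:
                    raise ValueError("terminator occurs in the document; choose another one")
                fout.write(chunk)
                tail = window[max(0, len(window) - keep):]
            fout.write(terminator)
    except ValueError:
        os.remove(dst)   # don't leave a half-written out.xml behind
        raise
```

Usage would be `append_terminator("myFile.xml", "out.xml")`, followed by the same `bcp.exe ... -r -++++++++-` command as above.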

#6


1  

Have you tried SQL Server XML Bulk Load?

#7


1  

The first thing I did was to get the first X bytes (e.g. the first 1 MB) of the XML files so I could take a look at them with the editor of my choice.

If you have Cygwin installed, you already have a nice GNU utility for this: head

head.exe -c1M comments.xml > comments_small.xml

Alternatively, you can find native ports of most GNU utilities here: http://unxutils.sourceforge.net/
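
Where neither Cygwin nor UnxUtils is available, the same preview can be taken with a few lines of Python. This sketch (the function name is hypothetical) also trims the copy back to the last newline so that no line or multi-byte UTF-8 character is cut in half; it assumes UTF-8 or another ASCII-compatible encoding. As with `head`, the result is only a preview and will not be well-formed XML, because the closing tags are missing.

```python
def head_bytes(src, dst, n_bytes=1 << 20):
    """Copy roughly the first n_bytes of src to dst, like `head -c1M src > dst`.

    If the file is longer than n_bytes, the copy is cut back to the last
    newline so that no line (or UTF-8 character) is split.
    Returns the number of bytes written.
    """
    with open(src, "rb") as fin:
        data = fin.read(n_bytes)
        truncated = fin.read(1) != b""   # is there anything beyond n_bytes?
    if truncated:
        cut = data.rfind(b"\n")
        if cut != -1:
            data = data[: cut + 1]
    with open(dst, "wb") as fout:
        fout.write(data)
    return len(data)
```

For example, `head_bytes("comments.xml", "comments_small.xml")` mirrors the `head.exe` command above.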

#8


0  

For viewing very large files, I've found the V file viewer to be excellent.

I've used it on files as large as 8 GB. For files with a fixed record length, it is extremely easy to navigate by block size, because it is disk-based.

Note that there is no editing capability.

Having said that, one difficulty with XML is that it is not really a good format for large "streams": a document has an overall beginning-and-end structure, so a parser that cannot hold the entire file in memory may have to do some fairly fancy tricks to ensure that it complies with a DTD or schema.
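
Plain well-formedness, at least, does not need the whole document in memory: a non-validating streaming parser only has to track the open tags. The Python sketch below feeds a file to the standard library's expat binding chunk by chunk and reports the line and column of the first error, which can help when Access or Excel only says the file contains "invalid XML characters". It does not validate against a DTD or schema, which is the harder case described above; the function name is hypothetical.

```python
import xml.parsers.expat


def check_well_formed(path, chunk_size=1 << 20):
    """Feed a file to expat chunk by chunk.

    Returns None if the file is well-formed, otherwise
    (line, column, message) for the first error found.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                parser.Parse(chunk, False)   # more data follows
        parser.Parse(b"", True)              # signal end of document
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None
```

Because no handlers are registered, nothing is kept per element, so memory use stays roughly at one chunk regardless of file size.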

#9


0  

Take a look at this blog post: http://benchmarkitconsulting.com/colin-stasiuk/2009/01/15/parsing-xml-into-a-table-structure-possible-update/

And this SO question: Parsing XML into a SQL table WITHOUT predefining structure. Possible?
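
One schema-free approach, similar in spirit to the "edge table" that OPENXML returns when no WITH clause is given, is to store every node as a generic (node id, parent id, name, value) row and decide on real tables later. A streaming Python sketch of that idea follows; the function name and row layout are my own and hypothetical. Element rows are emitted when the element closes (so children come before their parents), mixed-content tail text is ignored, and cleared elements still leave empty placeholders attached to their parents.

```python
import io
import xml.etree.ElementTree as ET


def edge_rows(source):
    """Yield (node_id, parent_id, name, value) for every element and attribute.

    Attributes are yielded as '@name' rows when their element opens; element
    rows are yielded when the element closes, once its text is complete.
    """
    open_nodes = []   # (node_id, parent_id) of the elements currently open
    next_id = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            next_id += 1
            node_id = next_id
            parent_id = open_nodes[-1][0] if open_nodes else None
            open_nodes.append((node_id, parent_id))
            for name, value in elem.attrib.items():
                next_id += 1
                yield next_id, node_id, "@" + name, value
        else:
            node_id, parent_id = open_nodes.pop()
            text = (elem.text or "").strip() or None   # leading text only
            yield node_id, parent_id, elem.tag, text
            elem.clear()


sample = b"<posts><row Id='7'><title>Hi</title></row></posts>"
for row in edge_rows(io.BytesIO(sample)):
    print(row)
```

Each tuple maps directly onto a four-column staging table, which can then be pivoted into proper tables with ordinary SQL once the structure is understood.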

#10


0  

Have you tried using OPENROWSET to import your big XML files into a SQL Server table?

CREATE TABLE XmlTable
(
    ID INT IDENTITY,
    XmlData XML
)

INSERT XmlTable(XmlData)
  SELECT * FROM 
    OPENROWSET(BULK '(your path)\xmldata.xml',
    SINGLE_BLOB
) AS X

Since I don't have any 5GB files at hand, I can't really test it myself.
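
One thing worth knowing before trying this on a 5 GB file: the stored representation of a single `xml` value in SQL Server cannot exceed 2 GB. A way around that (a technique I am suggesting here, not part of the answer) is to split the file into several smaller well-formed files first and load each one as its own row. The Python sketch below streams `source`, copies every `record_tag` element into part files wrapped in a new `root_tag` element, and starts a new part before `max_bytes` would be exceeded. All parameter names and defaults are hypothetical; a namespaced record tag must be given in `{uri}local` form.

```python
import xml.etree.ElementTree as ET


def split_xml(source, record_tag, out_pattern, max_bytes=500 * 1024 * 1024, root_tag="root"):
    """Copy every <record_tag> element of `source` into smaller well-formed files.

    Each output file is <root_tag>...records...</root_tag> and stays under
    max_bytes unless a single record is larger. Returns the paths written.
    """
    open_tag, close_tag = f"<{root_tag}>".encode(), f"</{root_tag}>".encode()
    paths, out, size, root = [], None, 0, None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem                      # remember the document element
            continue
        if elem.tag != record_tag:
            continue
        elem.tail = None                         # don't copy the whitespace after it
        piece = ET.tostring(elem, encoding="utf-8")
        if out is not None and size + len(piece) + len(close_tag) > max_bytes:
            out.write(close_tag)                 # current part is full: finish it
            out.close()
            out = None
        if out is None:                          # start a new part file
            paths.append(out_pattern.format(len(paths) + 1))
            out = open(paths[-1], "wb")
            out.write(open_tag)
            size = len(open_tag)
        out.write(piece)
        size += len(piece)
        elem.clear()
        root.clear()                             # free records already written
    if out is not None:
        out.write(close_tag)
        out.close()
    return paths
```

Each part file could then be loaded with the OPENROWSET statement above, one file per row of `XmlTable`.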

There's another way you might tackle this: streaming LINQ to XML. Check out this blog post in which James Newton-King shows how to read XElements one by one, and a two-part series on the same topic (here and here) on the Microsoft XML team blog.
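
The read-one-element-at-a-time pattern from those posts is .NET; in Python, the closest standard-library equivalent I know of is `xml.etree.ElementTree.iterparse`. A minimal sketch, assuming the records are repeated elements with a known tag (`row` and `Score` below are hypothetical): it yields one record at a time as a dict and groups the records into batches, which could then be handed to a DB-API cursor's `executemany()` for insertion.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice


def iter_records(source, record_tag):
    """Yield each <record_tag> element as a dict, one at a time.

    The dict holds the element's attributes plus the text of its leaf children.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            row = dict(elem.attrib)
            row.update((child.tag, child.text) for child in elem if len(child) == 0)
            yield row
            elem.clear()   # free the subtree before reading the next record


def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` items."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


sample = b"<posts>" + b"".join(
    b"<row Id='%d'><Score>%d</Score></row>" % (i, i * 10) for i in range(1, 6)
) + b"</posts>"
for batch in batches(iter_records(io.BytesIO(sample), "row"), 2):
    print(batch)
```

Batching keeps the number of round trips to the server manageable while only one batch of records is held in memory at a time.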

Marc

#11


0  

You should load your XML into an XML database, e.g. Berkeley DB XML or Xindice.

Also, I'm not sure whether it scales to 850 MB, but First Object XML Editor, and the parser library it is built on, can handle quite large files.

Also, Baretail should display your files without breaking a sweat.
