Large XML files and pagination: is it possible?

Time: 2023-01-14 11:04:30

The problem

When you open a very large XML file locally, on your own machine, it is almost certain to take ages, and your computer can often appear to lock up because the system decides the program is not responding.

This is an issue if you serve users XML backups of rather complex databases or systems they use: the likelihood of them being able to open large backups, let alone use them, is slim.

Is pagination possible?

I use XSLT to present readable backups to users. In the same way, would it be possible to pull only one page of data at a time, so that the entire file is not read in one go and the issues above are avoided?

I imagine the answer is simply no, but I would like to know whether anyone else has seen the same issue and resolved it.

Note: this is on a local machine only; it must not require an internet connection. JavaScript can be used if it makes things easier.

5 answers

#1


3  

Pagination with XSLT is possible, but it will probably not give the desired result: for XSLT to work, the whole XML document must first be parsed into an in-memory (DOM) tree.

What you could do is experiment with streaming transformations: http://stx.sourceforge.net/
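STX itself is a separate transformation language with its own processors, and it is not used here. As a rough illustration of the same one-pass idea, the sketch below swaps in a Python SAX handler that writes one HTML table row per record element the moment the parser reports it, so no document tree is ever built. The record tag name ("row"), the one-cell-per-attribute layout and the class name are assumptions about the backup format, not anything from the thread.

```python
# Sketch only: the class name, the "row" tag and the HTML layout are hypothetical.
import io
import xml.sax
from xml.sax.saxutils import escape


class RowsToHtml(xml.sax.ContentHandler):
    """One-pass transformation: each <row_tag .../> element becomes an HTML
    table row as soon as the parser reports it; no document tree is built."""

    def __init__(self, out, row_tag):
        super().__init__()
        self.out = out          # any text stream, e.g. an open .html file
        self.row_tag = row_tag  # assumed name of the record element

    def startDocument(self):
        self.out.write("<table>\n")

    def startElement(self, name, attrs):
        if name == self.row_tag:
            # One cell per attribute, in the order the parser reports them.
            cells = "".join("<td>%s</td>" % escape(attrs.getValue(n))
                            for n in attrs.getNames())
            self.out.write("<tr>%s</tr>\n" % cells)

    def endDocument(self):
        self.out.write("</table>\n")


if __name__ == "__main__":
    sample = b'<backup><row id="1" name="a&amp;b"/><row id="2" name="c"/></backup>'
    html = io.StringIO()
    xml.sax.parse(io.BytesIO(sample), RowsToHtml(html, "row"))
    print(html.getvalue())
```

Passing a path or an open binary file to xml.sax.parse instead of the in-memory sample streams a real backup the same way; memory use depends on the size of one record, not of the file.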

Or you could preprocess the large XML file to cut it up into smaller pieces before processing it with XSLT. For this I'd use a command-line tool like XMLStarlet.
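XMLStarlet is not shown here. As a hedged alternative for the same preprocessing step, the Python function below streams the big file once with the standard library's xml.etree.ElementTree.iterparse and writes every page_size consecutive records to a separate, well-formed page file that reuses the original root element. It assumes the records are direct children of the root and that tags carry no namespaces. The function name, the page-NNNN.xml naming scheme and the optional stylesheet href (which adds an xml-stylesheet processing instruction so a browser can apply the existing XSLT to each page) are hypothetical.

```python
# Sketch only: function name, file naming and stylesheet href are hypothetical.
import os
import xml.etree.ElementTree as ET


def split_into_pages(source, record_tag, page_size, out_dir, stylesheet=None):
    """Copy every page_size consecutive <record_tag> children of the root into
    its own well-formed file: out_dir/page-0000.xml, page-0001.xml, ...
    Returns the list of written paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths, batch = [], []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)                              # start event of the root
    root_tag, root_attrib = root.tag, dict(root.attrib)  # saved before root.clear()

    def flush():
        page_root = ET.Element(root_tag, root_attrib)    # same wrapper as the original
        page_root.extend(batch)
        path = os.path.join(out_dir, "page-%04d.xml" % len(paths))
        with open(path, "wb") as f:
            f.write(b'<?xml version="1.0" encoding="utf-8"?>\n')
            if stylesheet:  # lets a browser apply the existing XSLT to each page
                pi = '<?xml-stylesheet type="text/xsl" href="%s"?>\n' % stylesheet
                f.write(pi.encode("utf-8"))
            f.write(ET.tostring(page_root))              # ASCII output, a subset of UTF-8
        paths.append(path)
        batch.clear()

    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            batch.append(elem)
            root.clear()                                 # detach records already batched
            if len(batch) == page_size:
                flush()
    if batch:
        flush()
    return paths
```

For example, split_into_pages("backup.xml", "row", 500, "pages", stylesheet="backup.xsl") would leave a folder of small files that each open quickly; the file names and sizes in that call are only illustrative.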

#2


2  

Right on, very good question!

The XSLT implementations I know of require a DOM, so they are bound to access the entire document (although this could perhaps be done lazily).

Anyway, you should take a look at VTD-XML: http://vtd-xml.sourceforge.net/

The latest SAXON XSLT processor also offers rudimentary support for what is called "Streaming XSLT". Read about it here: http://www.saxonica.com/documentation/index/intro.html

That said, database backups are probably not the right use case for XML. If you have to deal with XML database backups, I would try to move away from them as fast as possible. The same goes for logs: a linear process should work by simply appending records. It would be even better if XML allowed a forest (several top-level elements) as the top-level structure, but I don't think that is ever going to happen.
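A common workaround for the append-only case, which the answer does not mention, is to keep the log as a root-less sequence of complete elements (so new entries can be appended without rewriting a closing root tag) and to wrap it in a synthetic root only while reading. A hedged Python sketch using the standard library's XMLPullParser follows. The function names, the entry tag and the "forest" wrapper name are hypothetical, and the file is assumed to contain no XML declaration or DOCTYPE, since either would be invalid once wrapped.

```python
# Sketch only: function names, the entry tag and the <forest> wrapper are hypothetical.
import itertools
import xml.etree.ElementTree as ET


def append_entry(path, tag, text, **attrs):
    """Append one complete element to a root-less log file (no rewrite needed)."""
    entry = ET.Element(tag, attrs)
    entry.text = text
    with open(path, "ab") as f:
        f.write(ET.tostring(entry) + b"\n")  # ASCII; other chars become &#NNN;


def iter_forest(path, entry_tag, chunk_size=64 * 1024):
    """Stream the top-level elements of a root-less file by feeding the parser
    a synthetic <forest> wrapper that never exists on disk."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(b"<forest>")
    root = None
    with open(path, "rb") as f:
        chunks = iter(lambda: f.read(chunk_size), b"")
        for chunk in itertools.chain(chunks, [b"</forest>"]):
            parser.feed(chunk)
            for event, elem in parser.read_events():
                if root is None:          # first event: start of the wrapper
                    root = elem
                elif event == "end" and elem.tag == entry_tag:
                    yield elem
                    root.clear()          # drop finished entries to bound memory
    parser.close()
```

Writers only ever append, and readers still see a well-formed document, which is roughly the "forest" behaviour the answer wishes XML had.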

#3


1  

XMLMax Virtual XML editor will read, parse and display a 1 GB XML file in a tree view in about 30 seconds on a fast PC. Windows only. It works with XML of any size or structure.

#4


0  

Hi, I don't know what programming language you are using, but in C# I can use XmlReader to read the file tag by tag instead of loading the whole file. This way you can read only the first page and then stop reading. Best regards, Iordan
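XmlReader is specific to .NET; as a hedged equivalent in Python, the standard library's xml.etree.ElementTree.iterparse gives the same pull-style reading. The sketch below returns one page of records and stops reading the file soon after that page is complete (iterparse reads in chunks, so it may parse a little past the last record). It assumes records are direct children of the root and carry no namespace; iter_records and read_page are hypothetical names.

```python
# Sketch only: iter_records/read_page and the "row" tag are hypothetical.
import io
import xml.etree.ElementTree as ET
from itertools import islice


def iter_records(source, record_tag):
    """Yield each <record_tag> element as soon as its end tag has been parsed.

    Assumes records are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield elem
            root.clear()  # detach finished records so memory use stays flat


def read_page(source, record_tag, page, page_size):
    """Return the records on a 0-based page; no records past it are requested."""
    start = page * page_size
    return list(islice(iter_records(source, record_tag), start, start + page_size))


if __name__ == "__main__":
    sample = b"<backup>" + b"".join(b'<row id="%d"/>' % i for i in range(10)) + b"</backup>"
    page = read_page(io.BytesIO(sample), "row", page=1, page_size=3)
    print([r.get("id") for r in page])  # ['3', '4', '5']
```

Memory stays roughly constant, but this is not random access: page N still has to parse every record before it, so later pages take longer to reach. Passing a file object opened with "with open(path, 'rb')" keeps the file handle's lifetime explicit when parsing stops early.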

#5


0  

One way to alleviate this problem would be to split the large XML file into a number of smaller XML documents. Depending on the type of data, you may split or partition the file in any number of ways (e.g. by day, transaction, entity, etc.).
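A minimal sketch of partitioning by day, assuming each record carries an ISO 8601 timestamp in a date attribute (e.g. 2023-01-14T11:04:30): the Python function below streams the source once and appends each record to a per-key file that reuses the original root tag. The names partition_by and key_func, the date attribute and the file naming are hypothetical, and keys are used directly as file names, so they must be safe file names.

```python
# Sketch only: partition_by, key_func and the date attribute are hypothetical.
import os
import xml.etree.ElementTree as ET


def partition_by(source, record_tag, key_func, out_dir):
    """Stream <record_tag> children of the root into one well-formed file per key.

    key_func maps a record element to a string used directly as the file name.
    Returns the sorted list of keys that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    files = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    wrapper = root.tag.encode("utf-8")        # reuse the original root tag name
    try:
        for event, elem in context:
            if event != "end" or elem.tag != record_tag:
                continue
            key = key_func(elem)
            if key not in files:
                files[key] = open(os.path.join(out_dir, key + ".xml"), "wb")
                files[key].write(b"<" + wrapper + b">")
            elem.tail = None                  # don't copy inter-record whitespace
            files[key].write(ET.tostring(elem))
            root.clear()                      # records already written; free them
    finally:
        for f in files.values():              # close every partition's root element
            f.write(b"</" + wrapper + b">")
            f.close()
    return sorted(files)
```

Keeping one open file per key is fine for a few hundred days; a much larger number of partitions would need a different strategy, such as writing in several passes.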

This will of course introduce a number of other challenges. For instance, you will have to come up with a specialized parser if you need to view the data as a whole or across partitions.
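To give a rough idea of the simplest part of such a reader, the sketch below treats a directory of partition files as one continuous stream by parsing them one after another in file-name order (chronological when the names are ISO dates such as 2023-01-13.xml). Anything beyond sequential reading, such as sorting or joining across partitions, would need more work. The directory layout and the names iter_partition and iter_all are hypothetical.

```python
# Sketch only: iter_partition/iter_all and the *.xml directory layout are hypothetical.
import glob
import os
import xml.etree.ElementTree as ET
from itertools import chain


def iter_partition(path, record_tag):
    """Yield the <record_tag> elements of one partition file."""
    with open(path, "rb") as f:
        for _, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == record_tag:
                yield elem


def iter_all(partition_dir, record_tag):
    """Read every *.xml partition in file-name order as one continuous stream."""
    paths = sorted(glob.glob(os.path.join(partition_dir, "*.xml")))
    return chain.from_iterable(iter_partition(p, record_tag) for p in paths)
```

Wrapping iter_all in itertools.islice yields one page of records even when that page spans a partition boundary, and only the files that page touches are opened.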
