What is faster in XML parsing: elements or attributes?

Time: 2022-12-01 11:00:12

I am writing code that parses XML.

I would like to know which is faster to parse: elements or attributes.

This will have a direct effect on my XML design.

Please target the answers to C# and the differences between LINQ and XmlReader.

Thanks.

3 answers

#1


3  

With XML, speed is dependent on a lot of factors.

With regard to attributes or elements, pick the one that more closely matches the data. As a guideline, we use attributes for, well, attributes of an object, and elements for contained sub-objects.

Depending on the amount of data you are talking about, using attributes can save you a bit on the size of your XML streams. For example, <person id="123" /> is smaller than <person><id>123</id></person>. This doesn't really impact the parsing, but it will impact the speed of sending the data across a network or loading it from disk. If we are talking about thousands of such records, then it may make a difference to your application.
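
To put a number on the size difference, here is a minimal C# sketch that counts the UTF-8 bytes of the two forms quoted above. The class and method names (XmlSizeDemo, Utf8Size) are made up for illustration; only Encoding.UTF8.GetByteCount is a real framework call.

```csharp
using System;
using System.Text;

public static class XmlSizeDemo
{
    // Returns the size of the string in bytes when encoded as UTF-8.
    public static int Utf8Size(string xml) => Encoding.UTF8.GetByteCount(xml);

    public static void Main()
    {
        // The two representations from the answer above.
        string asAttribute = "<person id=\"123\" />";
        string asElement = "<person><id>123</id></person>";

        int attrBytes = Utf8Size(asAttribute); // 19 bytes
        int elemBytes = Utf8Size(asElement);   // 29 bytes

        Console.WriteLine($"attribute form: {attrBytes} bytes");
        Console.WriteLine($"element form:   {elemBytes} bytes");
        Console.WriteLine($"difference per record: {elemBytes - attrBytes} bytes");
    }
}
```

Ten bytes per record is negligible for a handful of records and adds up to roughly 10 KB per thousand, before any compression.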

Of course, if that actually does make a difference then using JSON or some binary representation is probably a better way to go.

The first question you need to ask is whether XML is even required. If it doesn't need to be human readable then binary is probably better. Heck, a CSV or even a fixed-width file might be better.

With regard to LINQ vs XmlReader, this is going to boil down to what you do with the data as you are parsing it. Do you need to instantiate a bunch of objects and handle them that way, or do you just need to read the stream as it comes in? You might even find that just doing basic string manipulation on the data is the easiest/best way to go.
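
As a rough illustration of the "read the stream as it comes in" option, the sketch below uses XmlReader to pull ids out of a document without building any object tree. The document layout (a people root with person entries) and the names StreamingReadDemo and ReadIds are assumptions made for the example; it accepts both the attribute form and the element form of the id.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingReadDemo
{
    // Collects person ids from either layout:
    //   <person id="1" />             (attribute form)
    //   <person><id>2</id></person>   (element form)
    public static List<string> ReadIds(string xml)
    {
        var ids = new List<string>();
        bool insideId = false; // true while positioned inside an <id> element

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Name == "person")
                        {
                            // Attributes are available as soon as the start tag is read.
                            var id = reader.GetAttribute("id");
                            if (id != null) ids.Add(id);
                        }
                        else if (reader.Name == "id")
                        {
                            // An empty <id/> has no end tag, so do not enter the state.
                            insideId = !reader.IsEmptyElement;
                        }
                        break;
                    case XmlNodeType.Text:
                        if (insideId) ids.Add(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        if (reader.Name == "id") insideId = false;
                        break;
                }
            }
        }
        return ids;
    }

    public static void Main()
    {
        string xml = "<people><person id=\"1\" /><person><id>2</id></person></people>";
        Console.WriteLine(string.Join(",", ReadIds(xml))); // prints 1,2
    }
}
```

The attribute form needs one call on the start tag, while the element form needs a little state tracking across nodes. That is a difference in code shape, not evidence of a measurable speed difference.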

Point is, you will probably need to examine the strengths of each approach beyond just "what parses faster".

#2


4  

Design your XML schema so that the representation of the information actually makes sense. Usually, the decision between making something an attribute or an element will not affect performance.

Performance problems with XML are in most cases related to large amounts of data that are represented in a very verbose XML dialect. A typical countermeasure is to compress (zip) the XML data when storing it or transmitting it over the wire.
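
One way to apply that countermeasure in C# is GZipStream from System.IO.Compression. The following is a minimal sketch, not a tuned implementation: the names XmlGzipDemo, Compress and Decompress and the generated sample document are invented for the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class XmlGzipDemo
{
    // Compresses an XML string (encoded as UTF-8) with gzip.
    public static byte[] Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data
            return output.ToArray();
        }
    }

    // Reverses Compress and returns the original XML text.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Build a deliberately verbose, repetitive document.
        var sb = new StringBuilder("<people>");
        for (int i = 0; i < 1000; i++)
        {
            sb.Append($"<person><id>{i}</id></person>");
        }
        sb.Append("</people>");
        string xml = sb.ToString();

        byte[] compressed = Compress(xml);
        Console.WriteLine($"raw:        {Encoding.UTF8.GetByteCount(xml)} bytes");
        Console.WriteLine($"compressed: {compressed.Length} bytes");
        Console.WriteLine($"round trip ok: {Decompress(compressed) == xml}");
    }
}
```

Because repeated tag names compress very well, the choice between attributes and elements tends to matter even less for size once the data is compressed.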

If that is not sufficient then switching to another format such as JSON, ASN.1 or a custom binary format might be the way to go.

Addressing the second part of your question: the main difference between XDocument (LINQ to XML) and XmlReader is that XDocument builds a full in-memory object model of the document (a DOM-style tree), which can be an expensive operation, whereas XmlReader gives you a forward-only, tokenized stream over the input document.
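
For comparison with the streaming approach, here is a minimal LINQ to XML sketch that loads the whole document into an XDocument and then queries it. The sample layout and the names LinqToXmlDemo and ReadIds are assumptions for illustration; the query accepts an id stored either as an attribute or as a child element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlDemo
{
    // Parses the whole document into memory, then queries the tree.
    public static List<string> ReadIds(string xml)
    {
        XDocument doc = XDocument.Parse(xml); // the full tree is built here

        return doc.Descendants("person")
                  // Casting a missing attribute or element to string yields null,
                  // so the attribute form is tried first, then the element form.
                  .Select(p => (string)p.Attribute("id") ?? (string)p.Element("id"))
                  .Where(id => id != null)
                  .ToList();
    }

    public static void Main()
    {
        string xml = "<people><person id=\"1\" /><person><id>2</id></person></people>";
        Console.WriteLine(string.Join(",", ReadIds(xml))); // prints 1,2
    }
}
```

The query is shorter than hand-written reader code and handles both forms in one line, at the cost of holding the entire document in memory. For small documents that cost is usually irrelevant; for very large ones it is the main reason to consider XmlReader.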

#3


1  

I don't have hard numbers to prove it, but I know that the WCF team at Microsoft chose to make the DataContractSerializer their standard for WCF. It's limited in that it doesn't support XML attributes, but it is reportedly up to 10-15% faster than the XmlSerializer.
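
To make the attribute limitation concrete, the sketch below serializes the same value with both serializers. It only shows the shape of the output and says nothing about speed. The type names (PersonWithAttribute, PersonWithElement, SerializerDemo) are invented for the example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml.Serialization;

// For XmlSerializer: Id is written as an XML attribute.
public class PersonWithAttribute
{
    [XmlAttribute]
    public int Id { get; set; }
}

// For DataContractSerializer: data members are written as child elements.
[DataContract]
public class PersonWithElement
{
    [DataMember]
    public int Id { get; set; }
}

public static class SerializerDemo
{
    public static string WithXmlSerializer(PersonWithAttribute person)
    {
        var serializer = new XmlSerializer(typeof(PersonWithAttribute));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, person);
            return writer.ToString();
        }
    }

    public static string WithDataContractSerializer(PersonWithElement person)
    {
        var serializer = new DataContractSerializer(typeof(PersonWithElement));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, person);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        // Output includes namespace declarations, so it is printed rather than assumed.
        Console.WriteLine(WithXmlSerializer(new PersonWithAttribute { Id = 123 }));
        Console.WriteLine(WithDataContractSerializer(new PersonWithElement { Id = 123 }));
    }
}
```

The first output carries the value as Id="123" on the root element, and the second carries it as an Id child element. Any speed difference between the two serializers comes from how they are implemented, so it is worth measuring with your own data before drawing conclusions about attributes versus elements.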

From that information, I would assume that using XML attributes will be slower to parse than if you use only XML elements.
