XML vs. comma-delimited text files

Date: 2023-01-15 09:41:21

OK, I've read a couple of books on XML and written programs to spit it out and whatnot. But here's the question. Both a comma-delimited file and an XML file are "human readable." But in general, the comma-delimited file is much easier on my eyes than an XML file; the tags typically take up as much space as the data, if not more. This just seems to obscure what I'm reading, and the format can take a page to hold the same information that fits on a single line of a comma-delimited file. And a comma-delimited file is significantly less complex to parse. So the real question is: why XML? Just because all the cool kids are doing it?

12 answers

#1


11  

These aren't the only two options. You can also use JSON or YAML, which are much lighter weight than XML.

In general, if you have simple tabular data without many special characters, CSV isn't a bad choice. For structured data, consider using one of the other three.
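To make the "lighter weight" point concrete, here is a minimal sketch of a nested record written as JSON with Python's standard `json` module. The record and its field names are made up for illustration.

```python
import json

# Hypothetical record with one level of nesting; the field names are invented.
user = {
    "username": "alice",
    "email": "alice@example.com",
    "posts": [{"id": 34}, {"id": 35}],
}

# Serialize to indented JSON text.
text = json.dumps(user, indent=2)
print(text)

# Round trip: parsing gives back the same nested structure.
restored = json.loads(text)
print(restored["posts"][1]["id"])
```

The nesting survives the round trip without closing tags, which is most of what "lighter weight" means here.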

#2


16  

Advantages

A number of advantages XML has over CSV:

  • Hierarchical data organization
  • Automatic data validation (XML Schemas or DTDs)
  • Easily convert formats (using XSL)
  • Easy to identify relational structure
  • Can be used in combination with XML-RPC
  • Suitable for object persistence (marshalling)
  • Simplifies business-to-business communications
  • Helpful related technologies (XPath, DOM)
  • Tight integration with modern Web browsers
  • Extract, Transform, and Load (ETL) tools
  • Backwards file format compatibility (version attribute)
  • Digital signatures

It completely depends on the problem domain and what you are trying to solve.

Example

Browser integration is something that many people miss when writing web pages. Consider the situation where you have a large data store of songs. Songs have artists, albums, beats per minute, and so forth. You could export the data to XML, write a simple XSL stylesheet to render the XML as XHTML, then point the browser at the XML page. The browser will render the XML as a web page.

You cannot do that with CSV.

Disadvantages

Joel Spolsky has a great article on why XML is a poor choice as a complex data store: it is slow. (Unlike a database, which can move to the previous or next record with a single CPU instruction, traversing records in an XML document is much slower.) Arguably, this could be considered an optimization problem, resolved by waiting 18 months for faster hardware. Thus:

  • Slower to parse than other formats
  • Syntactical redundancy can detract from readability
  • Document bloat could affect storage costs
  • Cannot easily model overlapping (non-hierarchical) data structures
  • Poorly designed XML file formats are not uncommon (in my experience; citation needed)

Related Question

See also: Why Should I Use A Human Readable File Format.

#3


6  

XML supports complex, structured, and hierarchical representations of things. That's far beyond what CSV can store trivially.

Think about a complex object graph in an object-oriented environment. It can be serialized as an XML document pretty easily, but CSV cannot handle such a thing.
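As a rough sketch of what serializing an object graph to XML can look like, the following uses Python's standard `xml.etree.ElementTree`. The `User` and `Post` classes and the `user_to_xml` helper are hypothetical, and the example assumes the graph is tree-shaped (no cycles or shared references).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:  # hypothetical class, for illustration only
    id: int
    title: str


@dataclass
class User:  # hypothetical class, for illustration only
    username: str
    posts: List[Post] = field(default_factory=list)


def user_to_xml(user: User) -> ET.Element:
    """Walk the object graph and build a matching element tree."""
    root = ET.Element("user")
    ET.SubElement(root, "username").text = user.username
    posts = ET.SubElement(root, "posts")
    for post in user.posts:
        node = ET.SubElement(posts, "post", id=str(post.id))
        node.text = post.title
    return root


user = User("ryeguy", [Post(34, "Hello"), Post(35, "XML & CSV")])
xml_text = ET.tostring(user_to_xml(user), encoding="unicode")
print(xml_text)
```

Each nested object becomes a nested element, and the library escapes the `&` in the title on the way out. A flat CSV row has nowhere obvious to put the list of posts.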

#4


4  

It all depends on what you need to do. If you need more complexity in your data structures than a simple "flat" row structure can give (hierarchical data, for example), then XML is a great choice.

#5


4  

Well, XML is human readable and human editable. You can look at an XML file and know exactly what it is. A CSV file is human readable too, but without a header row you don't really know what each value means.

For example, if we're storing user accounts, which would you prefer?

<user>
    <username>ryeguy</username>
    <password>abc123</password>
    <regdate>3-4-08</regdate>
    <email>my@email.com</email>
</user>

OR

ryeguy,abc123,3-4-08,my@email.com

Of course, this is just an example, but imagine it with 30 fields or so!

Or worse yet, what if we add subfields?

<user>
    <username>ryeguy</username>
    <password>abc123</password>
    <regdate>3-4-08</regdate>
    <email>my@email.com</email>
    <posts>
        <post>
            <id>34</id>
            ....
        </post>
    </posts>
</user>

That would be a pain in the ass to put in a CSV. Soon you'd be inventing your own query language.
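Here is a small sketch of reading both shapes with Python's standard library. The second post (id 35) and the CSV header row are additions for illustration; they are not part of the example above.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_DATA = """
<user>
    <username>ryeguy</username>
    <regdate>3-4-08</regdate>
    <posts>
        <post><id>34</id></post>
        <post><id>35</id></post>
    </posts>
</user>
"""

# A header row is the usual CSV answer to "what does each value mean?".
CSV_DATA = "username,regdate\nryeguy,3-4-08\n"

# XML: fields are looked up by name, and nested posts are just child elements.
user = ET.fromstring(XML_DATA)
post_ids = [post.findtext("id") for post in user.iter("post")]
print(user.findtext("username"), post_ids)

# CSV: the header gives names to the flat fields.
rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
print(rows[0]["username"], rows[0]["regdate"])
# The posts have no natural column; they would need a second file keyed by
# username, or some ad hoc encoding inside a single cell.
```

The flat fields are equally easy either way once the CSV has a header. It is the nested `posts` list where the CSV version starts needing conventions of its own.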

#6


3  

The fact that XML is human readable does not mean it was designed to be read (or even edited) directly by humans.

XML has a nice set of properties that make it a good choice in many cases, particularly when you have the human resources to deal with the additional burden those properties inevitably bring: validation, a well-defined standard, a lot of tools, and a very flexible architecture. It also maps nicely to a tree model, which is what many programs use. Its human readability is an added value that simplifies debugging (try debugging a binary file...), inspection, and small changes in trivial cases.

CSV, on the other hand, is easy, quick, and linear, although many dialects exist and parsing it well is far from trivial (with the added problem that it looks trivial!). For most applications involving tables of data, CSV is the perfect choice.
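A quick illustration of why CSV parsing only looks trivial: the sketch below compares a naive split on commas with Python's standard `csv` module on one line that uses quoted fields. The sample line is made up, and it assumes the common dialect where fields are wrapped in double quotes and embedded quotes are doubled.

```python
import csv
import io

# One record, three fields: the second contains a comma and the third
# contains quotes (written as doubled quotes inside a quoted field).
line = 'Smith,"Portland, OR","He said ""hi"""\n'

# Naive approach: split on every comma.
naive = line.rstrip("\n").split(",")

# Dialect-aware approach: let the csv module handle the quoting.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields four pieces with stray quote characters, while the `csv` reader returns the three intended fields. Other dialects (different delimiters, backslash escapes) would need different reader settings.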

In general, however, there are data representation problems you can solve with XML but not with CSV (a tree, for example). On the other hand, any data that can be represented in CSV can also be represented in XML, although there is no guarantee that it will be more efficient in terms of space, ease of parsing, and so on (and in practice it usually isn't). It's a matter of the "degrees of freedom" of your format: XML has more degrees of freedom, CSV has fewer. The hype behind XML is partly a consequence of this.

Don't fall victim to hammer syndrome: when you have a hammer (XML), everything looks like a nail (something you have to solve with XML). Reality is much more nuanced. XML is cool, but it's not the answer to every problem.

#7


2  

CSV was never really a standard, just the same quick-and-dirty method that a bunch of people came up with independently. Of course, some of these people were smarter than others and realized you needed to escape characters, but others didn't. Even MSSQL exports CSVs improperly. There is a documented RIGHT way to do XML, so if you're doing it right and someone's application or whatever isn't accepting it, you have some clout when you say "That's not my fault."

#8


2  

XML describes its content and has a ton of supporting libraries in a variety of languages... but it can be bloated. If the receiving end of the CSV is aware of the layout and the data is tabular, I don't see anything wrong with it.

#9


1  

XML can be validated against a contract (a schema or DTD).

#10


1  

XML also has complementary technologies surrounding it: XML DOM, XPath, XSLT, and XSD (XML Schema).
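As one small taste of those technologies, the sketch below runs two XPath-style queries with Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The song data is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data.
doc = ET.fromstring("""
<songs>
    <song bpm="120"><title>First</title><artist>A</artist></song>
    <song bpm="90"><title>Second</title><artist>A</artist></song>
    <song bpm="120"><title>Third</title><artist>B</artist></song>
</songs>
""")

# Select songs by attribute value.
titles_120 = [s.findtext("title") for s in doc.findall("./song[@bpm='120']")]

# Select songs by the text of a child element.
by_artist_a = [s.findtext("title") for s in doc.findall("./song[artist='A']")]

print(titles_120)
print(by_artist_a)
```

With CSV, the equivalent would be a hand-written loop over rows. A full XPath or XSLT engine goes well beyond this subset but needs a third-party library in Python.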

#11


1  

Among the reasons you may prefer XML over CSV (depending on the task at hand, of course):

  • Almost all platforms and languages have existing libraries for reading, writing, parsing, and manipulating XML.
  • XML has well-defined rules for encoding all characters. CSV has ambiguities, such as how to encode commas that are part of the data.
  • XML supports a variety of data shapes (such as hierarchical), whereas CSV is most useful when the data looks like a table (rows and columns).
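The encoding point can be sketched with Python's standard library: XML has fixed escapes for its special characters, while CSV relies on a quoting convention that both sides have to agree on. The sample value is made up, and the CSV half assumes the default dialect of Python's `csv` module (double-quoted fields, doubled embedded quotes).

```python
import csv
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

value = 'Tom & "Jerry", <friends>'

# XML: &, < and > have fixed escapes defined by the specification.
xml_text = f"<note>{escape(value)}</note>"
xml_round_trip = ET.fromstring(xml_text).text
print(xml_text)

# CSV: the comma and quotes survive only because writer and reader
# agree on the same quoting dialect.
buf = io.StringIO()
csv.writer(buf).writerow(["id-1", value])
csv_text = buf.getvalue()
csv_round_trip = next(csv.reader(io.StringIO(csv_text)))
print(csv_text)
```

Both round trips recover the original value here. The difference is that any conforming XML parser would, whereas a CSV consumer using a different dialect might not.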

#12


1  

I like to think of the primary distinction in this case as this: XML is TREE-based, while CSV is TABLE-based.

That is, you can nest, re-nest, omit, and generally build a complex TREE structure in XML, whereas you can only make simple 2D tables in CSV.
