When do characters in XML need to be escaped?

Time: 2022-01-03 22:27:28

When should we replace < > & " ' in XML with entities like &lt; etc.?

My understanding is that it's just to make sure that if the content part of the XML contains > or <, the parser will not treat it as the start or end of a tag.

Also, if I have XML like:

<hello>mor>ning<hello>

should this be replaced to either:

  • &lthello&gtmor&gtning&lthello&gt
  • &lthello&gtmor&gtning&lthello>
  • &lthello&gtmor>ning&lthello&gt
  • &lthello&gtmor > ning&lthello>
  • <hello>mor&gtning<hello>

I don't understand why replacing is needed. When exactly is it required, and what exactly (tags or text) should be replaced?

5 answers

#1


7  

<, >, &, " and ' all have special meanings in XML (such as "start of entity" or "attribute value delimiter").

In order to have those characters appear as data (instead of for their special meaning) they can be represented by entities (&lt; for < and so on).

Sometimes those special meanings are context sensitive (e.g. " doesn't mean "attribute delimiter" outside of a tag), and there are places where the characters can appear raw as data. Rather than worry about those exceptions, it is simplest to always represent them as entities whenever you want to avoid their special meaning. The only gotcha then is explicit CDATA sections, where the special meaning doesn't hold (and & won't start an entity).

should this be replaced to either

It shouldn't be represented as any of those. Entities must be terminated with a semicolon.

How you should represent it depends on which bits of your example are data and which are markup. You haven't said, for example, whether <hello> is supposed to be data or the start tag for a hello element.
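As a rough illustration of "always represent them as entities", here is a minimal Python sketch using the standard library's xml.sax.saxutils helpers. The sample strings and the greeting attribute are made up for the example; the thread itself does not name a language or library.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical data that should appear as text, not as markup.
text = 'mor>ning & "evening" <now>'

# escape() replaces &, < and > by default.
escaped_text = escape(text)

# Quotes are only escaped if you ask for it via the extra mapping.
escaped_all = escape(text, {'"': "&quot;", "'": "&apos;"})

# quoteattr() escapes an attribute value and wraps it in suitable quotes.
attr = quoteattr('say "hi" & go')

print(escaped_text)
print(escaped_all)
print(f"<hello greeting={attr}>{escaped_text}</hello>")
```

Note that every entity produced here ends with a semicolon, which is what the options in the question were missing.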

#2


7  

Section 2.4 of the XML Specification clearly states:

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
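One way to see the rule quoted above in practice is to feed a few small documents to a parser. The sketch below assumes Python's standard xml.etree.ElementTree; is_well_formed is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


raw_gt = "<hello>mor>ning</hello>"    # a raw > in content is allowed
raw_lt = "<hello>mor<ning</hello>"    # a raw < in content is not
raw_amp = "<hello>mor&ning</hello>"   # a raw & in content is not
escaped = "<hello>mor&lt;ning &amp; mor&gt;ning</hello>"

for doc in (raw_gt, raw_lt, raw_amp, escaped):
    print(doc, is_well_formed(doc))

# The parser hands back the original characters, not the entities.
decoded = ET.fromstring(escaped).text
```

So & and < must be escaped in content, while escaping > is optional except in the "]]>" case the specification mentions.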

#3


4  

You have to encode all characters that have a special meaning in XML but should not be interpreted by the parser.

Assuming your XML is

<hello>mor>ning</hello> 

you would encode it as

<hello>mor&gt;ning</hello>

or use a CDATA [Wikipedia] section:

<hello><![CDATA[mor>ning]]></hello>
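To check that the two encodings above carry the same data, here is a small sketch, again assuming Python's xml.etree.ElementTree rather than anything named in the thread. Both forms should give the parser the same text content.

```python
import xml.etree.ElementTree as ET

# The entity form and the CDATA form of the same content.
with_entity = ET.fromstring("<hello>mor&gt;ning</hello>")
with_cdata = ET.fromstring("<hello><![CDATA[mor>ning]]></hello>")

# Inside CDATA, < and & are plain characters too.
mixed = ET.fromstring("<hello><![CDATA[a < b & c]]></hello>")

print(with_entity.text, with_cdata.text, mixed.text)

# When serialized again, ElementTree writes entities, not CDATA.
round_trip = ET.tostring(with_cdata, encoding="unicode")
print(round_trip)
```

CDATA is only a different way of writing the same text; a parser does not report whether the content arrived as entities or inside a CDATA section.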

#4


1  

Basically, characters like < and > are significant when parsing an XML document. If extra instances of these special characters appear in element text or attribute values, the parser will not be able to understand the document properly. If you are sending XML to a web service, all of the special characters should be properly escaped.
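If the XML is generated by code, one option is to let a serializer do the escaping instead of building strings by hand. This is a sketch under that assumption, using Python's xml.etree.ElementTree; the message, title and body names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Build the document as a tree; values are stored unescaped.
payload = ET.Element("message")
payload.set("title", 'Tom & "Jerry"')
body = ET.SubElement(payload, "body")
body.text = "if a < b && b > c"

# Serialization escapes the special characters in text and attributes.
wire = ET.tostring(payload, encoding="unicode")
print(wire)

# The receiving side gets the original values back after parsing.
parsed = ET.fromstring(wire)
print(parsed.get("title"), "|", parsed.find("body").text)
```

With this approach the special characters never need to be replaced manually, which avoids both missed and doubled escapes.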

#5


1  

https://github.com/savonrb/gyoku/blob/master/README.md

You can use Gyoku to keep the characters in CDATA from being escaped.
