XSLT: loading an XML document that contains escaped characters

Time: 2022-03-13 15:46:02

I use XSLT to transform an XML document, which I then load onto an ASP.NET website. However, if the XML contains '<' characters, the XML becomes malformed.

<title><b> < left arrows <b></title>

If I use disable-output-escaping="yes", the XML cannot be loaded and I get the error "Name cannot begin with the ' ' character".

If I do not disable output escaping, the escaped characters are not interpreted as markup and the text appears literally:

<title><b> < left arrows <b></title>

I want the bold tags to work, but I also want to escape the '<' character. Ideally

<b>&lt; left arrows</b>

is what I want to achieve. Is there any solution for this?

3 solutions

#1


The XML should contain the escaped sequence for the less than sign (&lt;), not the literal < character. The XML is malformed and any XML parser must reject it.

In XSLT you could generate that sequence like this:

<xsl:text>&amp;lt;</xsl:text>

#2


From what I understand, the input contains HTML and literal < characters. In that case, disable-output-escaping="yes" will preserve the HTML tags but produce invalid XML, while setting it to "no" means the HTML tags will be escaped.

What you need to do is leave disable-output-escaping set to "no" (which is the default, so you don't actually have to add it) and add an XSLT rule that will copy the HTML tags. For instance:

<xsl:template match="*">
    <xsl:copy>
        <xsl:copy-of select="@*" />
        <xsl:apply-templates />
    </xsl:copy>
</xsl:template>
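
For context, here is a minimal sketch of a complete stylesheet built around that rule. It assumes the HTML tags are real child elements in the input (not text inside CDATA) and that the stray '<' is already written as &lt; there; the sample input in the comment is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: <title><b>&lt; left arrows</b></title> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Copy every element and its attributes, then recurse into its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes fall through to the built-in template, which copies their
       string value; with output escaping left on, the serializer writes
       a '<' in that text back out as &lt; -->
</xsl:stylesheet>
```

Because <b> is copied as an element and the text is escaped by the serializer, no disable-output-escaping is needed in this case.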

#3


I came up with a solution, prompted by Josh's answer above. Thanks, Josh. I tried to use the match template, but I had a problem: the HTML tags are placed within CDATA, so I had difficulty matching them. There might be a way to do it, but I gave up on that.

What I did was use test="contains($text, $replace)", where $replace is the '<' character. On top of that, I added a condition to test whether the substring after the '<' is a relevant HTML tag, that is, whether it is actually a <b> or </b>. If it is just a '<' character that does not belong to any HTML tag, I convert the '<' to the ampersand sequence &amp;lt;. Basically, that solved my problem. I hope this is useful to anyone who encounters the same problem.
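
The approach described above could be sketched roughly as the recursive XSLT 1.0 template below. This is not the poster's actual code: the template name escape-stray-lt, the title match rule, and the sample input are assumptions made for illustration, and only <b> and </b> are treated as tags. The stray '<' is emitted with normal output escaping, so it is serialized as &lt;.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: <title><![CDATA[<b> < left arrows </b>]]></title> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="title">
    <title>
      <xsl:call-template name="escape-stray-lt">
        <xsl:with-param name="text" select="string(.)"/>
      </xsl:call-template>
    </title>
  </xsl:template>

  <!-- Hypothetical helper: walks the string one '<' at a time -->
  <xsl:template name="escape-stray-lt">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;')">
        <xsl:variable name="rest" select="substring-after($text, '&lt;')"/>
        <!-- Text before the first '<' is written with normal escaping -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <xsl:choose>
          <!-- '<' opens a <b> tag: emit the whole tag unescaped -->
          <xsl:when test="starts-with($rest, 'b&gt;')">
            <xsl:text disable-output-escaping="yes">&lt;b&gt;</xsl:text>
            <xsl:call-template name="escape-stray-lt">
              <xsl:with-param name="text" select="substring($rest, 3)"/>
            </xsl:call-template>
          </xsl:when>
          <!-- '<' opens a </b> tag: emit the whole tag unescaped -->
          <xsl:when test="starts-with($rest, '/b&gt;')">
            <xsl:text disable-output-escaping="yes">&lt;/b&gt;</xsl:text>
            <xsl:call-template name="escape-stray-lt">
              <xsl:with-param name="text" select="substring($rest, 4)"/>
            </xsl:call-template>
          </xsl:when>
          <!-- Stray '<': leave escaping on so it is serialized as &lt; -->
          <xsl:otherwise>
            <xsl:text>&lt;</xsl:text>
            <xsl:call-template name="escape-stray-lt">
              <xsl:with-param name="text" select="$rest"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <!-- No '<' left: write the remainder as-is -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With a processor that honors disable-output-escaping, the intended result for the sample input is <title><b> &lt; left arrows </b></title>. Note that disable-output-escaping is an optional serializer feature, so it only takes effect when the processor itself serializes the result, and an unbalanced tag in the input would still produce malformed output.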
