Parsing an XML attribute with embedded double quotes in its value using the XmlDocument object

Time: 2022-09-15 15:30:52

This is a web project. I receive a partial HTML string from an external source. Parsing it with XmlDocument works well, except when it encounters an attribute with embedded quotes, such as the "style" attribute below.

<span id="someId" style="font-family:"Calibri", Sans-Serif;">Some Text</span>

It seems (though I could be wrong) that LoadXml() treats the double quote before Calibri as the end of the style attribute, and Calibri as another "token" (the term used in the error message).

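var xml = new XmlDocument(); // requires: using System.Xml;
// The literal below is the HTML string above, escaped for C#.
xml.LoadXml("<span id=\"someId\" style=\"font-family:\"Calibri\", Sans-Serif;\">Some Text</span>"); // <--- here is where I get the error message below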

"'Calibri' is an unexpected token. Expecting white space. Line 1, position 18."

I can use Regex to replace the inner quotes but it will be rather ugly. And, I may well end up doing it!
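For illustration, the Regex workaround mentioned above could look roughly like the sketch below. It is a heuristic, not a real parser: it assumes every attribute is delimited by double quotes and that the true closing quote is the first one followed by the end of the tag or by another name= pair. The class and method names (AttributeQuoteRepair, EscapeInnerQuotes) are made up for this example, and input whose text content happens to look like name="value" could be rewritten incorrectly.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class AttributeQuoteRepair
{
    // name="value", where the closing quote is the first quote followed by
    // the end of the tag (> or />) or by whitespace and another name= pair.
    private static readonly Regex DoubleQuotedAttribute = new Regex(
        @"(?<name>[\w:-]+)\s*=\s*""(?<value>.*?)""(?=\s*/?>|\s+[\w:-]+\s*=)",
        RegexOptions.Singleline);

    // Replaces double quotes inside each matched attribute value with &quot;.
    public static string EscapeInnerQuotes(string html)
    {
        return DoubleQuotedAttribute.Replace(html, m =>
            m.Groups["name"].Value + "=\"" +
            m.Groups["value"].Value.Replace("\"", "&quot;") + "\"");
    }

    public static void Main()
    {
        string html = "<span id=\"someId\" style=\"font-family:\"Calibri\", Sans-Serif;\">Some Text</span>";
        string repaired = EscapeInnerQuotes(html);
        Console.WriteLine(repaired);

        var xml = new XmlDocument();
        xml.LoadXml(repaired); // well-formed after the repair, so this no longer throws
        Console.WriteLine(xml.DocumentElement.GetAttribute("style"));
    }
}
```

With the span from the question, the repaired string carries style="font-family:&quot;Calibri&quot;, Sans-Serif;" and the attribute value read back from XmlDocument contains the literal double quotes again.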

I thought perhaps HtmlAgilityPack would help, but I couldn't find good documentation on it and I would rather avoid 3rd party libraries with sparse documentation.

Is there a way to make LoadXml() accept it (and, subsequently, have the Attributes collection parse it correctly)? I don't have much hope for that, but I am throwing it out there anyways. Or should I be using another class altogether other than XmlDocument? I am open to using a 3rd party library with good documentation.

1 Solution

#1


That data is invalid (it is not well-formed XML). An attribute value delimited by double quotes cannot contain a literal double quote, and an attribute value delimited by single quotes cannot contain a literal single quote.

Valid:

<tag attr1="value with 'single' quotes" attr2='value with "double" quotes' />

Invalid:

<tag attr1="value with "double" quotes" attr2='value with 'single' quotes' />

Note that the invalid example can be made valid as follows:

<tag attr1="value with &quot;double&quot; quotes" attr2='value with &apos;single&apos; quotes' />
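Applied to the span from the question, both valid forms load without error. The following is a minimal sketch, assuming the sender (or a preprocessing step) can produce one of these forms; EscapedAttributeDemo and ReadStyle are names invented for this example.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of any library.
public static class EscapedAttributeDemo
{
    // Loads the markup and returns the value of the root element's style attribute.
    public static string ReadStyle(string xmlText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmlText);
        return doc.DocumentElement.GetAttribute("style");
    }

    public static void Main()
    {
        // Inner double quotes written as &quot; inside a double-quoted attribute.
        string escaped = "<span id=\"someId\" style=\"font-family:&quot;Calibri&quot;, Sans-Serif;\">Some Text</span>";

        // Same value, but the attribute is delimited by single quotes instead.
        string singleQuoted = "<span id=\"someId\" style='font-family:\"Calibri\", Sans-Serif;'>Some Text</span>";

        Console.WriteLine(ReadStyle(escaped));
        Console.WriteLine(ReadStyle(singleQuoted));
    }
}
```

In both cases the parser hands back the unescaped value, font-family:"Calibri", Sans-Serif;, so code reading the Attributes collection does not need to know which form was used.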
