Using XPath to search for text containing &nbsp;

Time: 2021-09-27 22:17:03

I use XPather Browser to check my XPATH expressions on an HTML page.

My end goal is to use these expressions in Selenium for the testing of my user interfaces.

I have an HTML file with content similar to this:

<tr>
  <td>abc</td>
  <td>&nbsp;</td>
</tr>

I want to select a node whose text contains the string "&nbsp;".

With a normal string like "abc" there is no problem. I use an XPATH similar to //td[text()="abc"].

When I try an XPath like //td[text()="&nbsp;"], it returns nothing. Is there a special rule concerning text containing "&"?

6 answers

#1


82  

It seems that OpenQA, the people behind Selenium, have already addressed this problem. They defined some variables to explicitly match whitespace. In my case, I need to use an XPath similar to //td[text()="${nbsp}"].

I reproduce here the text from OpenQA concerning this issue:

HTML automatically normalizes whitespace within elements, ignoring leading/trailing spaces and converting extra spaces, tabs and newlines into a single space. When Selenium reads text out of the page, it attempts to duplicate this behavior, so you can ignore all the tabs and newlines in your HTML and do assertions based on how the text looks in the browser when rendered. We do this by replacing all non-visible whitespace (including the non-breaking space "&nbsp;") with a single space. All visible newlines (<br>, <p>, and <pre> formatted new lines) should be preserved.

We use the same normalization logic on the text of HTML Selenese test case tables. This has a number of advantages. First, you don't need to look at the HTML source of the page to figure out what your assertions should be; "&nbsp;" symbols are invisible to the end user, and so you shouldn't have to worry about them when writing Selenese tests. (You don't need to put "&nbsp;" markers in your test case to assertText on a field that contains "&nbsp;".) You may also put extra newlines and spaces in your Selenese <td> tags; since we use the same normalization logic on the test case as we do on the text, we can ensure that assertions and the extracted text will match exactly.

This creates a bit of a problem on those rare occasions when you really want/need to insert extra whitespace in your test case. For example, you may need to type text with trailing spaces into a field, like this: "foo   ". But if you simply write <td>foo   </td> in your Selenese test case, we'll replace your extra spaces with just one space.

This problem has a simple workaround. We've defined a variable in Selenese, ${space}, whose value is a single space. You can use ${space} to insert a space that won't be automatically trimmed, like this: <td>foo${space}${space}${space}</td>. We've also included a variable ${nbsp}, that you can use to insert a non-breaking space.

Note that XPaths do not normalize whitespace the way we do. If you need to write an XPath like //div[text()="hello world"] but the HTML of the link is really "hello&nbsp;world", you'll need to insert a real "&nbsp;" into your Selenese test case to get it to match, like this: //div[text()="hello${nbsp}world"].
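
To make the difference concrete, here is a minimal Python sketch of the two comparisons described above. The helper name normalize_visible_text is hypothetical and only approximates the normalization the OpenQA text describes; it is not Selenium's actual implementation.

```python
import re

NBSP = "\u00a0"  # the character that &nbsp; decodes to


def normalize_visible_text(text: str) -> str:
    """Hypothetical helper: collapse whitespace runs (including NBSP) into one space and trim."""
    return re.sub(r"[\s\u00a0]+", " ", text).strip()


# Text as it exists in the DOM when the HTML source says hello&nbsp;world
page_text = "hello" + NBSP + "world"

# An exact comparison, like XPath's text()="hello world", does not normalize
exact_match = page_text == "hello world"

# A normalized comparison, like the one described for assertText, does
normalized_match = normalize_visible_text(page_text) == "hello world"

print(exact_match, normalized_match)  # prints: False True
```

This is why the same literal can pass in an assertText check and still fail inside an XPath predicate.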

#2


16  

I found I can make the match when I input a hard-coded non-breaking space (U+00A0) by typing Alt+0160 on Windows between the two quotes...

//table[@id='TableID']//td[text()=' ']

worked for me with the special char.

From what I understand, the XPath 1.0 standard doesn't handle escaping Unicode characters. There seem to be functions for that in XPath 2.0, but it looks like Firefox doesn't support it (or I misunderstood something). So you have to make do with the local code page. Ugly, I know.

Actually, it looks like the standard is relying on the programming language using XPath to provide the correct Unicode escape sequence... So, somehow, I did the right thing.
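
As an illustration of letting the host language supply the character, here is a small Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so the predicate is written as [.='...'] rather than [text()='...']; the sample row is an assumption modelled on the question.

```python
import xml.etree.ElementTree as ET

# &#160; is a numeric character reference, so the XML parser needs no entity declaration.
row = ET.fromstring("<tr><td>abc</td><td>&#160;</td></tr>")

# Python resolves the "\u00a0" escape, so the XPath engine receives the real character.
nbsp_cells = row.findall(".//td[.='\u00a0']")

# An ordinary space (U+0020) is a different character and does not match.
space_cells = row.findall(".//td[.=' ']")

print(len(nbsp_cells), len(space_cells))  # prints: 1 0
```

With Selenium's Python bindings the same idea would presumably look like driver.find_element(By.XPATH, "//td[text()='\u00a0']"), though that call is not exercised here.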

#3


3  

Try using the decimal entity &#160; instead of the named entity. If that doesn't work, you should be able to simply use the Unicode character for a non-breaking space instead of the &nbsp; entity.

(Note: I did not try this in XPather, but I did try it in Oxygen.)
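
For reference, a quick Python check that the named, decimal, and hexadecimal forms all denote the same character. It uses html.unescape from the standard library purely as a decoder; it says nothing about how XPather or Oxygen handle the references.

```python
import html

named = html.unescape("&nbsp;")         # named HTML entity
decimal = html.unescape("&#160;")       # decimal character reference
hexadecimal = html.unescape("&#xA0;")   # hexadecimal character reference

all_same = named == decimal == hexadecimal == "\u00a0"
print(all_same)                # prints: True
print(f"U+{ord(named):04X}")   # prints: U+00A0
```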

#4


1  

I cannot get a match using XPather, but the following worked for me with plain XML and XSL files in Microsoft's XML Notepad:

<xsl:value-of select="count(//td[text()='&nbsp;'])" />

The value returned is 1, which is the correct value in my test case.

However, I did have to declare nbsp as an entity within my XML and XSL using the following:

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>

I'm not sure if that helps you, but I was able to actually find nbsp using an XPath expression.

Edit: My code sample actually contains the characters '&nbsp;', but the JavaScript syntax highlighting converts them to a space character. Don't be misled!
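
A rough Python analogue of the same approach, using only the standard library rather than XML Notepad or XSLT. The entity is declared in the internal DTD subset of the data document, and the XPath side uses a Python string escape instead; the table markup is an assumption based on the question.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares nbsp, so the XML parser can expand &nbsp; to U+00A0.
document = """<!DOCTYPE table [ <!ENTITY nbsp "&#160;"> ]>
<table>
  <tr>
    <td>abc</td>
    <td>&nbsp;</td>
  </tr>
</table>"""

root = ET.fromstring(document)

# ElementTree implements an XPath subset: [.='...'] compares the element's full text.
matches = root.findall(".//td[.='\u00a0']")
print(len(matches))  # prints: 1
```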

#5


1  

Bear in mind that a standards-compliant XML processor will have replaced any entity references other than XML's five standard ones (&amp;, &gt;, &lt;, &apos;, &quot;) with the corresponding character in the target encoding by the time XPath expressions are evaluated. Given that behavior, PhiLho's and jsulak's suggestions are the way to go if you want to work with XML tools. When you enter &#160; in the XPath expression, it should be converted to the corresponding byte sequence before the XPath expression is applied.
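
A small Python sketch of that behavior, using the standard library's expat-based parser as one example of an XML processor: predefined entities and numeric references are already decoded in the parsed tree, while an undeclared &nbsp; is rejected before any XPath could run.

```python
import xml.etree.ElementTree as ET

# &amp; is one of XML's five predefined entities; &#160; is a numeric character reference.
cell = ET.fromstring("<td>a &amp; b&#160;</td>")
print(repr(cell.text))  # the tree holds real characters, not entity references

# &nbsp; is not predefined in XML, so without a declaration the parse fails.
try:
    ET.fromstring("<td>&nbsp;</td>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)  # prints: False
```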

#6


0  

Search for &nbsp; or only nbsp - did you try this?
