How to select a following-sibling/XML tag using XPath

Time: 2022-11-27 07:35:31

I have an HTML file (from Newegg), and their HTML is organized as shown below. All of the data in their specifications table is in cells with class 'desc', while the label for each entry is in a cell with class 'name'. Below are two examples of data from Newegg pages.


<tr>
    <td class="name">Brand</td>
    <td class="desc">Intel</td>
</tr>
<tr>
    <td class="name">Series</td>
    <td class="desc">Core i5</td>
</tr>
<tr>
    <td class="name">Cores</td>
    <td class="desc">4</td>
</tr>
<tr>
    <td class="name">Socket</td>
    <td class="desc">LGA 1156</td>
</tr>

<tr>
    <td class="name">Brand</td>
    <td class="desc">AMD</td>
</tr>
<tr>
    <td class="name">Series</td>
    <td class="desc">Phenom II X4</td>
</tr>
<tr>
    <td class="name">Cores</td>
    <td class="desc">4</td>
</tr>
<tr>
    <td class="name">Socket</td>
    <td class="desc">Socket AM3</td>
</tr>

In the end I would like to populate a CPU class (which is already set up) that stores each of these values: Brand, Series, Cores, and Socket type. This is the only way I can think of to go about doing this:


if parsedDocument.xpath('tr/td[@class="name"]')[0].text == 'Brand':
    CPU.brand = parsedDocument.xpath('tr/td[@class="name"]/nextsibling?')[0].text

And then do the same for the rest of the values. How would I accomplish the "next sibling" step, and is there an easier way of doing this?


2 solutions

#1



How would I accomplish the nextsibling and is there an easier way of doing this?


You may use:


tr/td[@class='name']/following-sibling::td

but I'd rather use directly:


tr[td[@class='name'] ='Brand']/td[@class='desc']

This assumes that:


  1. The context node, against which the XPath expression is evaluated, is the parent of all tr elements (not shown in your question).


  2. Each tr element has only one td with class attribute valued 'name' and only one td with class attribute valued 'desc'.

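A minimal sketch of the second expression, tr[td[@class='name']='Brand']/td[@class='desc'], using Python's standard-library xml.etree.ElementTree. It assumes the rows are well-formed XML wrapped in a single parent element (a hypothetical <table>, playing the role of the context node from assumption 1). ElementTree implements only a subset of XPath, so the nested predicate is approximated by tr[td='Brand'], which matches a tr having any td child whose text is 'Brand' (slightly looser than the original expression). The names SAMPLE, CPU, desc_for and parse_cpu are hypothetical; CPU stands in for the asker's existing class.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample: the question's Intel rows, wrapped in one root element
SAMPLE = """<table>
<tr><td class="name">Brand</td><td class="desc">Intel</td></tr>
<tr><td class="name">Series</td><td class="desc">Core i5</td></tr>
<tr><td class="name">Cores</td><td class="desc">4</td></tr>
<tr><td class="name">Socket</td><td class="desc">LGA 1156</td></tr>
</table>"""


@dataclass
class CPU:
    # Hypothetical stand-in for the asker's existing CPU class
    brand: str
    series: str
    cores: int
    socket: str


def desc_for(table: ET.Element, name: str) -> str:
    # ElementTree-subset version of tr[td[@class='name']=name]/td[@class='desc']:
    # tr[td='...'] keeps a tr that has a td child whose text equals `name`.
    cell = table.find(f"tr[td='{name}']/td[@class='desc']")
    if cell is None:
        raise KeyError(name)
    return cell.text


def parse_cpu(xml_text: str) -> CPU:
    table = ET.fromstring(xml_text)  # context node: parent of all tr elements
    return CPU(
        brand=desc_for(table, "Brand"),
        series=desc_for(table, "Series"),
        cores=int(desc_for(table, "Cores")),
        socket=desc_for(table, "Socket"),
    )


cpu = parse_cpu(SAMPLE)
print(cpu)  # CPU(brand='Intel', series='Core i5', cores=4, socket='LGA 1156')
```

Each field is looked up by its label instead of by row position, so the order of rows in the page does not matter.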

#2



Try the following-sibling axis (following-sibling::td).
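Python's standard-library ElementTree has no following-sibling axis, so the sketch below emulates tr/td[@class='name']/following-sibling::td[1] by scanning each row's children in document order, collecting every name/desc pair into a dict. It assumes well-formed XML with a single root (the hypothetical <table> below); real Newegg HTML may not parse as XML. If the page is instead parsed with the third-party lxml library (the question's parsedDocument.xpath(...) call suggests this, but that is an assumption), the XPath expressions from the answers can be passed to an element's xpath() method unchanged. SAMPLE, following_siblings and specs are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's AMD rows, wrapped in one root element
SAMPLE = """<table>
<tr><td class="name">Brand</td><td class="desc">AMD</td></tr>
<tr><td class="name">Series</td><td class="desc">Phenom II X4</td></tr>
<tr><td class="name">Cores</td><td class="desc">4</td></tr>
<tr><td class="name">Socket</td><td class="desc">Socket AM3</td></tr>
</table>"""


def following_siblings(parent: ET.Element, child: ET.Element, tag: str) -> list[ET.Element]:
    # Emulates child/following-sibling::tag: take the parent's children
    # (document order) that come after `child` and have the given tag.
    siblings = list(parent)
    start = siblings.index(child) + 1
    return [e for e in siblings[start:] if e.tag == tag]


def specs(table: ET.Element) -> dict[str, str]:
    result = {}
    for tr in table.iter("tr"):
        for name_td in tr.findall("td[@class='name']"):
            # Equivalent of td[@class='name']/following-sibling::td[1]
            nxt = following_siblings(tr, name_td, "td")
            if nxt:
                result[name_td.text] = nxt[0].text
    return result


print(specs(ET.fromstring(SAMPLE)))
# {'Brand': 'AMD', 'Series': 'Phenom II X4', 'Cores': '4', 'Socket': 'Socket AM3'}
```

Building the whole dict in one pass avoids writing a separate if-check per field; the CPU object can then be filled from the dict by key.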
