XPath: how to parse the "src" from an IMG tag?

Date: 2022-11-27 18:23:21

Right now I successfully grabbed the full element from an HTML page with this:

//img[@class='photo-large']

For example, it returns:

<img src="http://example.com/img.jpg" class='photo-large' />

But I only need the src URL (http://example.com/img.jpg). Any help?

3 answers

#1


57  

You are so close to answering this yourself that I am somewhat reluctant to answer it for you. However, the following XPath should provide what you want (provided the source is XHTML, of course).

//img[@class='photo-large']/@src

For further tips, check out W3Schools. They have excellent tutorials on XPath and a great reference too.
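As a minimal sketch of how the attribute XPath above might be evaluated from Ruby: REXML, which ships with standard Ruby installs, is an assumption here, since the question does not say which library is in use. The sample markup, the constant `SAMPLE_XHTML`, and the helper `photo_large_src` are made up for illustration. REXML is a strict XML parser, which fits the XHTML caveat above.

```ruby
require 'rexml/document'

# Hypothetical sample page. It must be well-formed XML (XHTML),
# because REXML is a strict XML parser.
SAMPLE_XHTML = <<~XML
  <html>
    <body>
      <img src="http://example.com/img.jpg" class="photo-large" />
      <img src="http://example.com/thumb.jpg" class="photo-small" />
    </body>
  </html>
XML

# Hypothetical helper: returns the src of the first img whose class is
# exactly "photo-large", or nil when no such image exists.
def photo_large_src(xml)
  doc = REXML::Document.new(xml)
  # The XPath selects an attribute node, not a string.
  attr = REXML::XPath.first(doc, "//img[@class='photo-large']/@src")
  attr&.value
end

puts photo_large_src(SAMPLE_XHTML)
```

The expression yields an attribute node rather than a plain string, hence the `.value` call before printing.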

#2


9  

With Hpricot, this works:

doc.at('//img[@class="photo-large"]')['src']

If you have more than one image, the following gives an array:

doc.search('//img[@class="photo-large"]').map { |e| e['src'] }

However, Nokogiri is many times faster and it “can be used as a drop-in replacement” for Hpricot. Here is the version for Nokogiri, in which the XPath for selecting attributes works:

doc.at('//img[@class="photo-large"]/@src').to_s

Or, for many images:

doc.search('//img[@class="photo-large"]/@src').to_a

#3


0  

//img/@src

You can just use this if you want the link of the image.

Example:

<img alt="" class="avatar width-full rounded-2" height="230" src="https://avatars3.githubusercontent.com/...;s=460" width="230">
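To illustrate how `//img/@src` differs from the class-filtered expression in the question, here is a rough sketch. It uses REXML (bundled with standard Ruby installs) as an assumption, so that it runs without extra gems. The page content, the constant `PAGE_XHTML`, and the helper `attribute_values` are hypothetical. The tags are self-closed because a strict XML parser would reject an unclosed `<img ...>` like the one above; real-world HTML would likely need an HTML-aware parser instead.

```ruby
require 'rexml/document'

# Hypothetical page with two images, written as well-formed XHTML.
PAGE_XHTML = <<~XML
  <html>
    <body>
      <img class="avatar" src="http://example.com/avatar.png" />
      <img class="photo-large" src="http://example.com/img.jpg" />
    </body>
  </html>
XML

# Hypothetical helper: returns the string values of all attribute
# nodes selected by the given XPath expression.
def attribute_values(xml, xpath)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, xpath).map(&:value)
end

# Unfiltered: the src of every img element in the document.
p attribute_values(PAGE_XHTML, '//img/@src')

# Filtered: only images whose class attribute is exactly "photo-large".
p attribute_values(PAGE_XHTML, "//img[@class='photo-large']/@src")
```

Without a predicate the expression picks up every image on the page, so it only suits pages where all images are wanted or where there is just one.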
