How to get a link element from an HTML page using PHP

Time: 2022-10-31 18:46:08

First, I know that I can get the HTML of a webpage with:

file_get_contents($url);

What I am trying to do is get a specific link element in the page (found in the head).

For example:

<link type="text/plain" rel="service" href="/service.txt" /> (the element could close with just >)

My question is: How can I get that specific element with the "rel" attribute equal to "service" so I can get the href?

My second question is: Should I also get the "base" element? Does it apply to the "link" element? I am trying to follow the standard.

Also, the HTML might have errors. I don't have control over how my users code their stuff.

3 answers

#1


3  

Using PHP's DOMDocument, this should do it (untested):

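$html = file_get_contents($url);      // HTML string, fetched as in the question
libxml_use_internal_errors(true);     // keep malformed markup from emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$head = $doc->getElementsByTagName('head')->item(0);
// Fall back to the whole document if no <head> was found.
$links = ($head ?? $doc)->getElementsByTagName('link');
foreach ($links as $l) {
    // Attribute values keep their original case, so normalize before comparing.
    if (strtolower(trim($l->getAttribute('rel'))) === 'service') {
        echo $l->getAttribute('href');
    }
}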
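
A slightly more defensive variant of the snippet above could wrap the lookup in a function that tolerates broken markup and treats rel as a space-separated, case-insensitive token list. This is only a sketch under those assumptions, and the function name findLinkHref is hypothetical (it is not part of DOMDocument):

```php
<?php
/**
 * Return the href of the first <link> whose rel attribute contains $rel,
 * or null if there is none. findLinkHref is a hypothetical helper name.
 */
function findLinkHref(string $html, string $rel): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse; loadHTML() rejects an empty string
    }

    // Collect parser warnings internally so malformed HTML does not print them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wanted = strtolower($rel);
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several tokens, e.g. rel="Alternate Service".
        $tokens = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array($wanted, $tokens, true) && $link->hasAttribute('href')) {
            return $link->getAttribute('href');
        }
    }
    return null;
}

// Usage with the element from the question, inside deliberately sloppy markup.
$html = '<html><head><title>t</title>'
      . '<link type="text/plain" rel="service" href="/service.txt">'
      . '</head><body><p>unclosed</body></html>';
echo findLinkHref($html, 'service'), "\n";
```

Returning null instead of echoing lets the caller decide what to do when a page has no such link.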

#2


0  

You should read the base element as well, but make sure you know how it works and what its scope is.
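
To make that concrete: a base element with an href changes the URL against which every relative URL in the document is resolved, and that includes the href of link elements. A rough sketch of applying it follows. The helper names baseHref and resolveUrl are hypothetical, the page URL is an assumed example, and resolveUrl is deliberately simplified (no "../" normalization, no query-only or fragment-only references), so it is not a full RFC 3986 resolver:

```php
<?php
/** Return the href of the first <base href="..."> in the document, or null. */
function baseHref(DOMDocument $doc): ?string
{
    foreach ($doc->getElementsByTagName('base') as $base) {
        if ($base->hasAttribute('href')) {
            return $base->getAttribute('href');
        }
    }
    return null;
}

/** Simplified resolution of $ref against an absolute $base URL. */
function resolveUrl(string $base, string $ref): string
{
    if ($ref === '') {
        return $base;                          // empty reference: the base itself
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;                           // already absolute
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($p['host'] ?? '')
            . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($ref, '//') === 0) {
        return $scheme . ':' . $ref;           // protocol-relative
    }
    if (strpos($ref, '/') === 0) {
        return $origin . $ref;                 // root-relative
    }
    // Path-relative: replace everything after the last "/" of the base path.
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $ref;
}

// Assumed example page; only the <link> markup comes from the question.
$pageUrl = 'http://example.com/users/alice/index.html';
$html = '<html><head><base href="/files/">'
      . '<link type="text/plain" rel="service" href="service.txt">'
      . '</head><body></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// The base href may itself be relative, so resolve it against the page URL first.
$base = resolveUrl($pageUrl, baseHref($doc) ?? $pageUrl);
foreach ($doc->getElementsByTagName('link') as $link) {
    if (strtolower($link->getAttribute('rel')) === 'service') {
        echo resolveUrl($base, $link->getAttribute('href')), "\n";
    }
}
```

Without a base element the page URL itself serves as the base, which is why the fallback passes $pageUrl through unchanged.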

In truth, when I have to screen-scrape, I use phpQuery. It is an older PHP port of jQuery, and while that may sound like something of a dumb concept, it is awesome for document traversal and doesn't require well-formed XHTML.

http://code.google.com/p/phpquery/

#3


0  

I'm working with Selenium under Java for web application testing. It provides very nice features for document traversal using CSS selectors.

Have a look at "How to use Selenium with PHP".
But this setup might be too complex for your needs if you only want to extract this one link.
