Setting a timeout on Selenium webdriver.PhantomJS

Date: 2022-01-13 01:51:07

The situation

I have a simple Python script to get the HTML source for a given URL:

    browser = webdriver.PhantomJS()
    browser.get(url)
    content = browser.page_source

Occasionally, the url points to a page with slow-loading external resources (e.g. video files, or really slow advertising content).

Webdriver will wait until those resources are loaded before completing the .get(url) request.

Note: For extraneous reasons, I need to do this with PhantomJS rather than requests or urllib2


The question

I'd like to set a timeout on PhantomJS resource loading so that if the resource is taking too long to load, the browser just assumes it doesn't exist or whatever.

This would allow me to run the subsequent .page_source query against whatever the browser has loaded so far.

Documentation on webdriver.PhantomJS is very thin, and I haven't found a similar question on SO.

Thanks in advance!

2 Answers

#1


11  

PhantomJS provides a resourceTimeout page setting, which might suit your needs. Quoting the documentation:

(in milli-secs) defines the timeout after which any resource requested will stop trying and proceed with other parts of the page. onResourceTimeout callback will be called on timeout.

So in Ruby, you can do something like:

    require 'selenium-webdriver'

    capabilities = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.page.settings.resourceTimeout" => "5000")
    driver = Selenium::WebDriver.for :phantomjs, :desired_capabilities => capabilities

I believe the Python equivalent is something like the following (untested; it only shows the logic, so adjust as needed):

    driver = webdriver.PhantomJS(desired_capabilities={'phantomjs.page.settings.resourceTimeout': '5000'})

#2


13  

Long explanation below, so TL;DR:

The current version of Ghostdriver (the one embedded in PhantomJS 1.9.8) ignores the resourceTimeout option. Instead, use webdriver's implicitly_wait() and set_page_load_timeout(), and wrap the get() call in a try-except block.

    # Python
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.PhantomJS()
    browser.implicitly_wait(3)
    browser.set_page_load_timeout(3)
    try:
        browser.get("http://url_here")
    except TimeoutException as e:
        # Handle the timeout here
        print(e)
    finally:
        browser.quit()

Explanation

To pass PhantomJS page settings through Selenium, use webdriver's DesiredCapabilities, for example:

    # Python
    from selenium import webdriver

    # Copy the dict so the shared DesiredCapabilities.PHANTOMJS default is not mutated
    cap = webdriver.DesiredCapabilities.PHANTOMJS.copy()
    cap["phantomjs.page.settings.resourceTimeout"] = 1000  # milliseconds
    cap["phantomjs.page.settings.loadImages"] = False
    cap["phantomjs.page.settings.userAgent"] = "faking it"
    browser = webdriver.PhantomJS(desired_capabilities=cap)

    // Java
    DesiredCapabilities capabilities = DesiredCapabilities.phantomjs();
    capabilities.setCapability("phantomjs.page.settings.resourceTimeout", 1000);
    capabilities.setCapability("phantomjs.page.settings.loadImages", false);
    capabilities.setCapability("phantomjs.page.settings.userAgent", "faking it");
    WebDriver webdriver = new PhantomJSDriver(capabilities);
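
Since every page setting is passed under the same "phantomjs.page.settings." prefix, the capability dict can be built by a small helper. This is only a sketch: the function name phantomjs_page_settings is made up for illustration and is not part of Selenium or PhantomJS.

```python
# Prefix taken from the capability keys shown above.
PREFIX = "phantomjs.page.settings."


def phantomjs_page_settings(**settings):
    """Hypothetical helper: map plain setting names to prefixed capability keys."""
    return {PREFIX + name: value for name, value in settings.items()}


caps = phantomjs_page_settings(resourceTimeout=1000, loadImages=False)
print(caps)
```

The resulting dict could then be passed as desired_capabilities, in the same way as cap above.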

But here's the catch: as of today (2014-12-11), with PhantomJS 1.9.8 and its embedded Ghostdriver, resourceTimeout is not applied by Ghostdriver (see Ghostdriver issue #380 on GitHub).

As a workaround, use Selenium's own timeout methods and wrap webdriver's get method in a try-except (Python) or try-catch (Java) block, e.g.:

    # Python
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.PhantomJS()
    browser.implicitly_wait(3)
    browser.set_page_load_timeout(3)
    try:
        browser.get("http://url_here")
    except TimeoutException as e:
        # Handle the timeout here
        print(e)
    finally:
        browser.quit()

    // Java
    import java.util.concurrent.TimeUnit;

    WebDriver webdriver = new PhantomJSDriver();
    webdriver.manage().timeouts()
            .pageLoadTimeout(3, TimeUnit.SECONDS)
            .implicitlyWait(3, TimeUnit.SECONDS);
    try {
        webdriver.get("http://url_here");
    } catch (org.openqa.selenium.TimeoutException e) {
        // Handle the timeout here
        System.out.println(e.getMessage());
    } finally {
        webdriver.quit();
    }
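
To tie this back to the original question (reading the page source after giving up on slow resources), the workaround could be wrapped roughly as below. This is a sketch under assumptions: fetch_source is a made-up helper name, and whether PhantomJS returns a usable partial DOM after a page-load timeout is not confirmed by anything above. The ImportError fallback is there only so the snippet can be exercised without Selenium installed.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium.
    class TimeoutException(Exception):
        pass


def fetch_source(browser, url, timeout_seconds=3):
    """Hypothetical helper: load url, return (page_source, timed_out).

    On a page-load timeout the exception is swallowed and whatever the
    browser has loaded so far is returned.
    """
    browser.set_page_load_timeout(timeout_seconds)
    timed_out = False
    try:
        browser.get(url)
    except TimeoutException:
        timed_out = True
    # Assumption: page_source is still readable after the timeout.
    return browser.page_source, timed_out
```

It would be called with a driver created as in the answer, e.g. content, timed_out = fetch_source(browser, url), followed by browser.quit() once the content has been read.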
