How to handle 404 Not Found errors in Nokogiri

Date: 2022-11-29 15:41:14

I am using Nokogiri to scrape web pages. A few URLs have to be guessed, and they return a 404 Not Found error when the page doesn't exist. Is there a way to capture this exception?

http://yoursite/page/38475 #=> page number 38475 doesn't exist

I tried the following, which didn't work:

url = "http://yoursite/page/38475"
doc = Nokogiri::HTML(open(url)) do
  begin
    rescue Exception => e
      puts "Try again later"
  end
end

1 Answer

#1


20  

It doesn't work because the rescue does not cover the part of the code that raises the error. The exception comes from the open(url) call, which raises OpenURI::HTTPError when the server responds with a 404 status, and that call runs before the block is ever entered. The following code should work:

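require 'nokogiri'
require 'open-uri'

url = 'http://yoursite/page/38475'
begin
  # On Ruby 3.0+ use URI.open; Kernel#open no longer opens URLs
  file = URI.open(url)
  doc = Nokogiri::HTML(file)
  # handle doc
rescue OpenURI::HTTPError => e
  # e.io.status is [code, reason], e.g. ["404", "Not Found"]
  if e.io.status.first == '404'
    # handle 404 error
  else
    # re-raise any other HTTP error
    raise
  end
end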
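
Since several URLs have to be guessed, the same rescue logic can be wrapped in a small helper. The following is only a sketch under stated assumptions: `fetch_or_nil` and its `opener:` parameter are hypothetical names that are not part of Nokogiri or open-uri, and `URI.open` is assumed to be available (Ruby 2.5 or later). The `opener:` argument exists only so the helper can be exercised without a network connection.

```ruby
require 'open-uri'

# Hypothetical helper: returns the response body as a String,
# or nil when the server answers 404. Other HTTP errors are re-raised.
# `opener` defaults to URI.open and is injectable for offline testing.
def fetch_or_nil(url, opener: URI.method(:open))
  opener.call(url).read
rescue OpenURI::HTTPError => e
  # e.io.status is [code, reason]; only a 404 is treated as "missing"
  raise unless e.io.status.first == '404'
  nil
end
```

A caller might then write `html = fetch_or_nil(url)` and parse with `Nokogiri::HTML(html)` only when `html` is not nil, skipping the guessed URLs that don't exist.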

By the way, on rescuing Exception, see: Why is it bad style to `rescue Exception => e` in Ruby?
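
As a rough illustration of that point, the sketch below checks which error classes a given rescue clause would match. `rescued_by?` is a hypothetical helper written for this example. It relies on the fact that `rescue SomeClass` matches any exception that is an instance of that class or of a subclass.

```ruby
require 'open-uri'

# Hypothetical helper: true if `rescue handler` would catch
# an exception of class `error_class`.
def rescued_by?(handler, error_class)
  !!(error_class <= handler)
end

puts rescued_by?(StandardError, OpenURI::HTTPError) # true
puts rescued_by?(StandardError, Interrupt)          # false
puts rescued_by?(Exception, Interrupt)              # true
puts rescued_by?(Exception, SystemExit)             # true
```

Rescuing Exception therefore also traps Ctrl-C (Interrupt) and calls to `exit` (SystemExit), while rescuing OpenURI::HTTPError or StandardError leaves those alone.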
