Creating a dynamic sitemap from a URL with Ruby on Rails

Date: 2023-02-06 08:30:09

I am currently working on an application that scrapes information from a number of different sites. To get the deep link for the desired topic on a site (e.g. "Forum"), I rely on the sitemap that the site provides. As I expand, I have come across some sites that don't provide a sitemap themselves, so I was wondering whether there is a way to generate one within Rails, starting from the top-level domain.

I am using Nokogiri and Mechanize to retrieve data, so a solution that builds on either of them would be easier to integrate.

1 Solution

#1


This can be done with the Spidr gem like so:

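require 'spidr'

# Maps each destination URL to the list of pages that link to it
url_map = Hash.new { |hash, key| hash[key] = [] }

# Crawl the site, recording every link found along the way
Spidr.site('http://intranet.com/') do |spider|
  spider.every_link do |origin, dest|
    url_map[dest] << origin
  end
end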
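
Once the crawl has finished, the keys of url_map can be written out as a sitemap. The following is only a minimal sketch, not part of Spidr: build_sitemap and its host: option are hypothetical names introduced here, and the sketch assumes you want a plain sitemaps.org urlset containing only URLs on the crawled host (the recorded links may include external ones).

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper (not part of Spidr): turns a list of URLs into
# sitemap XML. When host is given, URLs on other hosts are dropped.
def build_sitemap(urls, host: nil)
  locs = urls.map(&:to_s).uniq.sort

  if host
    locs = locs.select do |loc|
      begin
        URI.parse(loc).host == host
      rescue URI::InvalidURIError
        false # skip anything that is not a parseable URL
      end
    end
  end

  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  locs.each do |loc|
    # Escape &, <, > and quotes so the URL is valid inside XML
    lines << "  <url><loc>#{CGI.escapeHTML(loc)}</loc></url>"
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end
```

With the url_map from the crawl above, something like File.write('sitemap.xml', build_sitemap(url_map.keys, host: 'intranet.com')) would save the result. Since Nokogiri is already in use, Nokogiri::XML::Builder could produce the same document instead of string concatenation.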
