Learning Web Crawling: Using CrawlSpider

Crawling with the CrawlSpider class in Scrapy.

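Until now I had been using BaseSpider with callback functions. One problem kept coming up: when title and date are on one page but author and detail are on another, how do you collect all of these fields into a single item? I tried several approaches, including global variables, without success.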

So I tried the more advanced CrawlSpider.

The example code I used as a reference:

     name=
     allow_domain=[
     start_urls=[
     link_extractor={
     }
     _x_query={
     }

         bbsItem_loader=ItemLoader(item=BbsdmozItem(),response=response)
         url=str(response.url)
         bbsItem_loader.add_value(
         bbsItem_loader.add_xpath(
         bbsItem_loader.add_xpath(
         bbsItem_loader.add_xpath(
         return bbsItem_loader.load_item()

After a few small modifications, the code looks like this:

     }

     _x_query={
     }
     _y_query={
     }

         bbsItem_loader=ItemLoader(item=DmozItem(),response=response)
         url=str(response.url)
         bbsItem_loader.add_value(
         bbsItem_loader.add_xpath(
         bbsItem_loader.add_xpath(
         bbsItem_loader.add_xpath(
         return bbsItem_loader.load_item()

Run it, and it succeeds:

D:\test-python\tutorial>\Python27\Scripts\scrapy.exe crawl myspider6 -o ee.json
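With a `.json` output file, Scrapy writes the scraped items as a single JSON array, so the result can be checked with a few lines of Python. The helper below is a small sketch and not part of the original project: the function names are made up, the field names are the four mentioned at the top of this post, and it assumes Python 3.

```python
import json


def load_items(path):
    """Read a Scrapy JSON feed (one JSON array of item objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def missing_fields(items, required=("title", "date", "author", "detail")):
    """Return (index, missing field names) for every incomplete item."""
    problems = []
    for index, item in enumerate(items):
        # A field counts as missing when it is absent or empty.
        missing = [name for name in required if not item.get(name)]
        if missing:
            problems.append((index, missing))
    return problems
```

Calling `missing_fields(load_items("ee.json"))` returns an empty list when every item carries all four fields, which is a quick check that the fields from both pages really ended up in the same item.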
