Forbidden by robots.txt: scrapy

Time: 2022-02-07 04:04:08

While crawling a website like https://www.netflix.com, I am getting: Forbidden by robots.txt: https://www.netflix.com/>

ERROR: No response downloaded for: https://www.netflix.com/

2 Answers

#1


80  

In the new version (Scrapy 1.1), released 2016-05-11, the crawler downloads robots.txt before crawling. To change this behavior, set ROBOTSTXT_OBEY in your settings.py:

ROBOTSTXT_OBEY = False

Here are the release notes

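If you only want to bypass robots.txt for one spider rather than the whole project, a minimal sketch using Scrapy's per-spider custom_settings (the spider name and URL below are just placeholders):

import scrapy

class NetflixSpider(scrapy.Spider):
    # Placeholder name and start URL; only the custom_settings override matters here.
    name = "netflix"
    start_urls = ["https://www.netflix.com/"]

    # Overrides the project-wide setting for this spider only.
    custom_settings = {"ROBOTSTXT_OBEY": False}

    def parse(self, response):
        self.logger.info("Fetched %s with status %s", response.url, response.status)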

#2


0  

The first thing you need to ensure is that you change the user agent in your requests; otherwise the default user agent will be blocked for sure.

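A minimal sketch of changing the user agent, either project-wide in settings.py or on an individual request (the UA string below is just an example browser string, not a required value):

# settings.py -- project-wide override of Scrapy's default user agent
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# Or per request, inside a spider's start_requests():
#   yield scrapy.Request(url, headers={"User-Agent": USER_AGENT})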
