How do I prevent Python's urllib(2) from following a redirect?

I am trying to log into a site using Python, but the site appears to send a cookie and a redirect in the same response. Python follows the redirect, which prevents me from reading the cookie sent by the login page. How do I prevent urlopen in Python's urllib (or urllib2) from following the redirect?

4 Answers

#1

You could do a couple of things:

Build your own HTTPRedirectHandler that intercepts each redirect.
Create an instance of HTTPCookieProcessor and install an opener that uses it, so that you have access to the cookiejar.

Here is a quick example that shows both:

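import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # The redirect response's headers (including Set-Cookie) are available here.
        print "Cookie Manip Right Here"
        # Delegate to the default implementation, which follows the redirect.
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Handle the other redirect status codes the same way.
    http_error_301 = http_error_303 = http_error_307 = http_error_302

# Stores cookies from every response, including the redirect response itself.
cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)  # urllib2.urlopen() now uses this opener

response = urllib2.urlopen("WHEREVER")  # placeholder: replace with the login URL
print response.read()

print cookieprocessor.cookiejar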
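
For readers on Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar, the same idea might look roughly like the sketch below. The names RecordingRedirectHandler and build_recording_opener, and the URL in the usage comment, are illustrative assumptions and not part of the original answer. Like the code above, it still follows the redirect; it only records what the redirect response carried.

```python
import http.cookiejar
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record each redirect response, then follow it as usual."""

    def __init__(self):
        # One (status code, Location, Set-Cookie values) tuple per redirect seen.
        self.seen = []

    def http_error_302(self, req, fp, code, msg, headers):
        self.seen.append(
            (code, headers.get("Location"), headers.get_all("Set-Cookie", []))
        )
        # Delegate to the stock implementation, which follows the redirect.
        return super().http_error_302(req, fp, code, msg, headers)

    # Treat the other common redirect codes the same way.
    http_error_301 = http_error_303 = http_error_307 = http_error_302


def build_recording_opener():
    """Return (opener, redirect handler, cookie jar); hypothetical helper."""
    jar = http.cookiejar.CookieJar()
    redirects = RecordingRedirectHandler()
    opener = urllib.request.build_opener(
        redirects, urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, redirects, jar


# Usage (hypothetical URL):
# opener, redirects, jar = build_recording_opener()
# response = opener.open("http://www.example.com/login")
# print(redirects.seen, list(jar))
```

Because HTTPCookieProcessor processes each response before the redirect is followed, a cookie set by the redirecting page should end up in the jar even though the final response comes from the redirect target.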

#2

If all you need is to stop redirection, there is a simple way to do it. For example, suppose I only want to get the cookies and, for better performance, do not want to be redirected to any other page. I also want the status code to remain 3xx. Let's use 302 as an example.

import cookielib
import urllib2

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # Only add this check to stop 302 redirection.
        if code == 302:
            return response

        # Everything else is handled as in the stock HTTPErrorProcessor.
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

This way, you don't even need to touch urllib2.HTTPRedirectHandler.http_error_302().

An even more common case is that we simply want to stop redirection altogether, as the question asks:

class NoRedirection(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        return response

    https_response = http_response

It is normally used like this:

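import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}  # form fields to POST
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
    # The redirect was not followed; its target is in the Location header.
    redirection_target = response.headers['Location']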
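
A rough Python 3 counterpart of the NoRedirection approach is sketched below, assuming the urllib.request and http.cookiejar modules that replaced urllib2 and cookielib. The helper name build_no_redirect_opener and the URL and form fields in the usage comment are made up for illustration. Note that this processor hands back every response untouched, so 4xx and 5xx responses are returned as well instead of raising HTTPError.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Return every response as-is, so 3xx responses are never followed."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def build_no_redirect_opener():
    """Return (opener, cookie jar); hypothetical helper for this sketch."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        NoRedirection, urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


# Usage (hypothetical URL and form fields):
# opener, jar = build_no_redirect_opener()
# body = urllib.parse.urlencode({"user": "alice"}).encode("ascii")
# response = opener.open("http://www.example.com/login", body)
# if response.code == 302:
#     redirection_target = response.headers["Location"]
```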

#3

urllib2.urlopen calls build_opener() which uses this list of handler classes:

handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor]

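You could try building an opener yourself that omits HTTPRedirectHandler, then call the open() method on the result to open your URL. Note that urllib2.build_opener() always adds any default handler you have not replaced with a subclass, so leaving HTTPRedirectHandler out of its arguments is not enough; instead, create a urllib2.OpenerDirector and call add_handler() with an instance of each remaining handler. If you really dislike redirects, you could even make your non-redirecting opener the global default with urllib2.install_opener(opener).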
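
As a minimal sketch of an opener with no redirect handler, written for Python 3's urllib.request (the successor to urllib2), something like the following could work. The function names and the URL in the usage comment are assumptions made for illustration. With no HTTPRedirectHandler registered, a 3xx response falls through to HTTPDefaultErrorHandler, which raises HTTPError; the exception still carries the response headers.

```python
import urllib.error
import urllib.request


def build_opener_without_redirects():
    """Assemble an OpenerDirector by hand, leaving out HTTPRedirectHandler."""
    opener = urllib.request.OpenerDirector()
    handler_classes = [
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPDefaultErrorHandler,
        urllib.request.HTTPErrorProcessor,
    ]
    # HTTPSHandler is only defined when Python was built with SSL support.
    if hasattr(urllib.request, "HTTPSHandler"):
        handler_classes.append(urllib.request.HTTPSHandler)
    for handler_class in handler_classes:
        opener.add_handler(handler_class())
    return opener


def fetch_status_and_location(opener, url):
    """Return (status code, Location header or None) without following redirects."""
    try:
        response = opener.open(url)
    except urllib.error.HTTPError as err:
        # Any non-2xx status, including 3xx, arrives here as an exception.
        return err.code, err.headers.get("Location")
    return response.code, None


# Usage (hypothetical URL):
# opener = build_opener_without_redirects()
# print(fetch_status_and_location(opener, "http://www.example.com/login"))
```

To capture cookies as well, an HTTPCookieProcessor instance could be added to the same opener with add_handler().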

It sounds like your real problem is that urllib2 isn't handling cookies the way you'd like. See also: How to use Python to login to a webpage and retrieve cookies for later usage?

#4

This question was asked before here.

EDIT: If you have to deal with quirky web applications you should probably try out mechanize. It's a great library that simulates a web browser. You can control redirecting, cookies, page refreshes... If the website doesn't rely [heavily] on JavaScript, you'll get along very nicely with mechanize.
