What is the best way to get an HTTP response code from a URL?

Time: 2022-10-09 07:41:54

I’m looking for a quick way to get the HTTP response code for a URL (e.g. 200 or 404). I’m not sure which library to use.

6 solutions

#1


67  

Update: use the wonderful requests library. Note that this sends a HEAD request, which should be quicker than a full GET or POST request because the server sends no response body.

import requests
try:
    r = requests.head("https://*.com")
    print(r.status_code)
    # prints the int of the status code. Find more at httpstatusrappers.com :)
except requests.ConnectionError:
    print("failed to connect")
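
One thing to watch with the snippet above: requests.head() does not follow redirects by default, so a redirecting URL reports 301 or 302 rather than the status of the page it points to. Here is a minimal sketch, assuming you want the status after redirects and a timeout so the call cannot hang. The helper name head_status, the follow_redirects flag and the 5 second default are illustrative choices, not part of the answer.

```python
import requests


def head_status(url, timeout=5, follow_redirects=True):
    """Return the status code of a HEAD request, or None if the request fails.

    head_status, follow_redirects and the default timeout are hypothetical
    names and values chosen for this sketch.
    """
    try:
        response = requests.head(url, timeout=timeout,
                                 allow_redirects=follow_redirects)
        return response.status_code
    except requests.RequestException:
        # Base class for connection errors, timeouts, invalid URLs, etc.
        return None
```

Pass follow_redirects=False if you want to see the redirect code itself.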

#2


64  

Here's a solution that uses httplib instead.

import httplib

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


print get_status_code("*.com") # prints 200
print get_status_code("*.com", "/nonexistant") # prints 404
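
The code above is Python 2. In Python 3 the httplib module was renamed to http.client and StandardError no longer exists. Here is a rough Python 3 sketch of the same idea. The port and timeout parameters are my additions, and catching OSError plus HTTPException is an assumption about which failures you want to swallow.

```python
import http.client


def get_status_code(host, path="/", port=None, timeout=10):
    """Return the status code of a HEAD request to host, or None on failure.

    port and timeout are additions for this sketch; port=None means the
    default HTTP port (80).
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # OSError covers DNS failures, refused connections and timeouts.
        return None
    finally:
        conn.close()
```

For an https site, swap in http.client.HTTPSConnection, which takes the same host, port and timeout arguments.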

#3


22  

You should use urllib2, like this:

import urllib2
for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints:
# 200 [from the try block]
# 404 [from the except block]

#4


5  

For those using Python 3, here's another way to get the response code.

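import urllib.error
import urllib.request

def getResponseCode(url):
    try:
        with urllib.request.urlopen(url) as conn:
            return conn.getcode()
    except urllib.error.HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception
        return e.code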
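
Note that urlopen() sends a GET by default and raises urllib.error.HTTPError for 4xx and 5xx responses, so a 404 arrives as an exception rather than as a return value. If you only need the status, here is a sketch that sends a HEAD request and returns the code in both cases. The function name head_response_code and the 10 second timeout are hypothetical choices for illustration.

```python
import urllib.error
import urllib.request


def head_response_code(url, timeout=10):
    """Return the HTTP status code for url using a HEAD request.

    Returns None when the server could not be reached at all.
    head_response_code is a hypothetical helper name.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised; the status is on the exception.
        return error.code
    except urllib.error.URLError:
        # DNS failure, refused connection and similar problems.
        return None
```

HTTPError has to be caught before URLError because it is a subclass of it.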

#5


3  

On older Python 2 releases the urllib2.HTTPError exception does not have a getcode() method. The code attribute is always available, so use that instead.

#6


1  

Here's an httplib solution that behaves like urllib2: give it a full URL and it works. There is no need to split the URL into hostname and path yourself, because the function already does that.

import httplib
import re
import socket
def get_link_status(url):
  """
    Gets the HTTP status of the url or returns an error associated with it.  Always returns a string.
  """
  https=False
  url=re.sub(r'(.*)#.*$',r'\1',url)
  url=url.split('/',3)
  if len(url) > 3:
    path='/'+url[3]
  else:
    path='/'
  if url[0] == 'http:':
    port=80
  elif url[0] == 'https:':
    port=443
    https=True
  if ':' in url[2]:
    host=url[2].split(':')[0]
    port=int(url[2].split(':')[1])
  else:
    host=url[2]
  try:
    headers={'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
             'Host':host
             }
    if https:
      conn=httplib.HTTPSConnection(host=host,port=port,timeout=10)
    else:
      conn=httplib.HTTPConnection(host=host,port=port,timeout=10)
    conn.request(method="HEAD",url=path,headers=headers)
    response=str(conn.getresponse().status)
    conn.close()
  except socket.gaierror,e:
    response="Socket Error (%d): %s" % (e[0],e[1])
  except StandardError,e:
    if hasattr(e,'getcode'):
      response=str(e.getcode())
    elif hasattr(e, 'message') and len(e.message) > 0:
      response=str(e.message)
    elif hasattr(e, 'msg') and len(e.msg) > 0:
      response=str(e.msg)
    elif type('') == type(e):
      response=e
    else:
      response="Exception occurred without a good error message.  Manually check the URL to see the status.  If you believe this URL is 100% good, file an issue for a potential bug."
  return response
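
The manual splitting in the function above is the fragile part. Python 3's urllib.parse.urlsplit (the urlparse module in Python 2) already separates scheme, host, port, path and fragment. Here is a small Python 3 sketch of just that parsing step, assuming only http and https URLs are of interest. The split_url name and the DEFAULT_PORTS table are made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table for this sketch: default port per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Return (is_https, host, port, path) for an http or https URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    # parts.port is an int, or None when the URL gives no explicit port.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # urlsplit keeps the fragment separate, so it never ends up in the path.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme == "https", parts.hostname, port, path
```

The host, port and path it returns can be passed straight to an HTTPConnection or HTTPSConnection and its request() call.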
