Pitfalls in how the Python Requests library handles responses

Date: 2023-03-09 09:39:26

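Python's Requests library (http://docs.python-requests.org/en/latest/) is convenient for handling HTTP/HTTPS requests and is widely used.
However, a few details of how it handles the response need special attention. In short, it comes down to the difference between the content property and the text property of the Response object. The relevant code is: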

    @property
    def content(self):
        """Content of the response, in bytes."""

        if self._content is False:
            # Read the contents.
            try:
                if self._content_consumed:
                    raise RuntimeError(
                        'The content for this response was already consumed')

                if self.status_code == 0:
                    self._content = None
                else:
                    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

            except AttributeError:
                self._content = None

        self._content_consumed = True
        # don't need to release the connection; that's been handled by urllib3
        # since we exhausted the data.
        return self._content

    @property
    def text(self):
        """Content of the response, in unicode.

        if Response.encoding is None and chardet module is available, encoding
        will be guessed.
        """

        # Try charset from content-type
        content = None
        encoding = self.encoding

        if not self.content:
            return str('')

        # Fallback to auto-detected encoding.
        if self.encoding is None:
            encoding = self.apparent_encoding

        # Decode unicode from given encoding.
        try:
            content = str(self.content, encoding, errors='replace')
        except (LookupError, TypeError):
            # A LookupError is raised if the encoding was not found which could
            # indicate a misspelling or similar mistake.
            #
            # A TypeError can be raised if encoding is None
            #
            # So we try blindly encoding.
            content = str(self.content, errors='replace')

        return content

    @property
    def apparent_encoding(self):
        """The apparent encoding, provided by the lovely Charade library
        (Thanks, Ian!)."""
        return chardet.detect(self.content)['encoding']

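As the code shows, the text property decodes the raw bytes into a string: it uses response.encoding when that is set, and otherwise falls back to the encoding guessed by chardet (apparent_encoding).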
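The fallback chain in text can be mimicked on a plain bytes payload without Requests. The following is a minimal sketch, not the library's code: decode_like_text and its detect parameter are hypothetical names, and detect stands in for chardet.

```python
def decode_like_text(raw, encoding=None, detect=None):
    """Mimic the fallback chain of Response.text on a bytes payload.

    `decode_like_text` and `detect` are hypothetical names for this
    sketch; `detect` plays the role of apparent_encoding (chardet).
    """
    if not raw:
        return ''

    # No declared encoding: ask the detector, if one was supplied.
    if encoding is None and detect is not None:
        encoding = detect(raw)

    try:
        return str(raw, encoding, errors='replace')
    except (LookupError, TypeError):
        # LookupError: unknown codec name.
        # TypeError: encoding is still None.
        # Either way, decode blindly with the default codec (UTF-8).
        return str(raw, errors='replace')


raw = 'h\u00e9llo'.encode('utf-8')
declared = decode_like_text(raw, 'utf-8')                      # declared charset
misspelled = decode_like_text(raw, 'no-such-codec')            # LookupError path
guessed = decode_like_text(raw, detect=lambda b: 'latin-1')    # wrong guess
```

Here declared and misspelled both recover the original string, while guessed is mis-decoded because the stand-in detector returned the wrong codec. A wrong guess does not raise an error; it silently produces garbled text.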
The encoding attribute of the response is assigned in the build_response method of HTTPAdapter in adapters.py. The code is:

    def build_response(self, req, resp):
        """Builds a :class:`Response <requests.Response>` object from a urllib3
        response. This should not be called from user code, and is only exposed
        for use when subclassing the
        :class:`HTTPAdapter <requests.adapters.HTTPAdapter>`

        :param req: The :class:`PreparedRequest <PreparedRequest>` used to generate the response.
        :param resp: The urllib3 response object.
        """
        response = Response()

        # Fallback to None if there's no status_code, for whatever reason.
        response.status_code = getattr(resp, 'status', None)

        # Make headers case-insensitive.
        response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))

        # Set encoding.
        response.encoding = get_encoding_from_headers(response.headers)
        response.raw = resp
        response.reason = response.raw.reason

        if isinstance(req.url, bytes):
            response.url = req.url.decode('utf-8')
        else:
            response.url = req.url

        # Add new cookies from the server.
        extract_cookies_to_jar(response.cookies, req, resp)

        # Give the Response some context.
        response.request = req
        response.connection = self

        return response

As the line response.encoding = get_encoding_from_headers(response.headers) shows, the encoding is obtained by parsing the response headers:

    def get_encoding_from_headers(headers):
        """Returns encodings from given HTTP Header Dict.

        :param headers: dictionary to extract encoding from.
        """
        content_type = headers.get('content-type')

        if not content_type:
            return None

        content_type, params = cgi.parse_header(content_type)

        if 'charset' in params:
            return params['charset'].strip("'\"")

        if 'text' in content_type:
            return 'ISO-8859-1'
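The three outcomes of this function (declared charset, the ISO-8859-1 default for text types, and None) can be reproduced with the standard library alone. This is a rough sketch under assumptions: encoding_from_content_type is a hypothetical name, and email.message.Message is swapped in for cgi.parse_header, so edge cases may differ from Requests.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Approximate get_encoding_from_headers for one Content-Type value.

    Hypothetical helper for illustration; it parses the header with
    email.message.Message instead of cgi.parse_header.
    """
    if not content_type:
        return None

    msg = Message()
    msg['content-type'] = content_type

    # An explicit charset parameter wins.
    charset = msg.get_param('charset')
    if charset:
        return charset.strip("'\"")

    # text/* without a charset falls back to ISO-8859-1.
    main_type = content_type.split(';', 1)[0].strip().lower()
    if 'text' in main_type:
        return 'ISO-8859-1'

    # Anything else: no encoding, so text would have to guess.
    return None


for value in ('text/html; charset=utf-8', 'text/html', 'application/json', None):
    print(repr(value), '->', repr(encoding_from_content_type(value)))
```

Note the middle case: a text/html response without a charset is given ISO-8859-1, so text decodes it with that codec and never consults chardet. Only the None case triggers guessing.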

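To keep Requests from guessing the response encoding with chardet, use the text property with care. Read the content property directly and decode the bytes yourself as needed.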
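Decoding the bytes yourself might look like the sketch below. It assumes you know, or are willing to default to, the codec of the page. decode_content is a hypothetical helper, and body stands in for what r.content would return.

```python
def decode_content(raw, charset=None, default='utf-8'):
    """Decode response bytes with an explicitly chosen codec, no guessing.

    Hypothetical helper: `raw` would be r.content in a Requests program,
    and `charset` a codec you trust (for example from the page's
    documentation), falling back to `default`.
    """
    return raw.decode(charset or default, errors='replace')


# Stand-in for r.content of a UTF-8 page served without a charset.
expected = '\u65b0\u6d6a\u9996\u9875'
body = expected.encode('utf-8')

explicit = decode_content(body)                  # decoded as UTF-8
fallback = decode_content(body, 'ISO-8859-1')    # what the text default would do

print(explicit == expected)   # True
print(fallback == expected)   # False
```

The second result shows why the ISO-8859-1 default is risky for non-Latin pages: every byte decodes without error, but the result is not the original text.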
For a response whose server does not explicitly declare a charset, the difference between using text and using content is shown below.
Code:

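    import time

    import requests

    print(time.time())
    print('begin request')
    r = requests.get('http://www.sina.com.cn')
    # Erase the response encoding so that r.text has to fall back to chardet.
    r.encoding = None
    r.text
    # r.content  # use this line instead of r.text to time the content case
    print('request end')
    print(time.time())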

Elapsed time when using text:
[Image: screenshot of the console output, not preserved]
Elapsed time when using content:
[Image: screenshot of the console output, not preserved]
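The measurement above includes the network round trip in both runs. To time only the property access, a small helper can be used. This is a sketch in Python 3: timed is a hypothetical name, and the Requests calls appear only in comments because they need network access.

```python
import time


def timed(fn):
    """Call fn() and return (result, elapsed_seconds).

    Hypothetical helper for this sketch; uses a monotonic clock so the
    measurement is not affected by system clock adjustments.
    """
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Intended use with Requests (requires network access):
#
#   r = requests.get('http://www.sina.com.cn')
#   _, t_content = timed(lambda: r.content)   # bytes, no decoding
#   r.encoding = None                         # force the chardet fallback
#   _, t_text = timed(lambda: r.text)         # detection + decoding
#   print('content: %.3f s, text: %.3f s' % (t_content, t_text))

# Offline demonstration with a stand-in payload.
payload = ('x' * 100000).encode('utf-8')
decoded, elapsed = timed(lambda: payload.decode('utf-8'))
print('decoded %d characters in %.6f s' % (len(decoded), elapsed))
```

With the default stream=False, the body should already be downloaded when requests.get returns, so the two timings isolate the cost of returning the bytes versus detecting the encoding and decoding.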