Python Crawler - 11 - Fixing garbled text from response.text

Date: 2024-05-19 07:51:02

The code is as follows:

# A helper that downloads the page at a given URL


import requests

def download_page(url, user_Agent=None, referer=None):
    print("Downloading:", url)
    headers = {
        "Referer": referer,
        "User-Agent": user_Agent,
    }
    try:
        # Perform the request inside the try block so network errors are caught as well
        response = requests.get(url=url, headers=headers)
        html = response.text
    except Exception as e:
        print("Download error:", e)
        html = None
    return html

if __name__ == '__main__':
    u = "http://192.168.1.19:8080/edu/"
    u_a = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36"
    print(download_page(url=u, user_Agent=u_a))

Execution result:

The page is downloaded, but the text is garbled.


 

Diagnosis:

response.text comes out garbled because requests guessed the wrong character set: when the server does not declare a charset in the Content-Type header, requests falls back to ISO-8859-1, while this page is actually encoded as UTF-8. Decoding the raw response bytes explicitly avoids the bad guess:

response.content.decode("utf-8")  # decode the raw bytes as UTF-8
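To confirm this before changing the code, you can compare the encoding requests guessed with the encoding it detects from the body. A minimal check, assuming the same test URL as above:

import requests

response = requests.get("http://192.168.1.19:8080/edu/")
# Encoding guessed from the Content-Type header; defaults to ISO-8859-1 when no charset is declared
print(response.encoding)
# Encoding detected from the response body itself
print(response.apparent_encoding)

If the two values differ (for example ISO-8859-1 versus utf-8), the garbled output is an encoding problem, not a download problem.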

 

The modified code:

import requests

def download_page(url, user_Agent=None, referer=None):
    print("Downloading:", url)
    headers = {
        "Referer": referer,
        "User-Agent": user_Agent,
    }
    try:
        # Perform the request inside the try block so network errors are caught as well
        response = requests.get(url=url, headers=headers)
        # Decode the raw bytes as UTF-8 instead of relying on requests' encoding guess
        html = response.content.decode("utf-8")
    except Exception as e:
        print("Download error:", e)
        html = None
    return html

if __name__ == '__main__':
    u = "http://192.168.1.19:8080/edu/"
    u_a = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36"
    print(download_page(url=u, user_Agent=u_a))

 

Execution result after the fix:

The page content now displays correctly.
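An alternative sketch, not part of the code above, is to set the response encoding before reading response.text; using apparent_encoding also covers pages that are not UTF-8:

response = requests.get(url=url, headers=headers)
# Override the header-based guess with the encoding detected from the body
response.encoding = response.apparent_encoding
html = response.text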
