Python Web Scraping for Beginners (1): Fetching Page Source and Sending POST and GET Requests

Date: 2022-04-14 12:08:52

1. Importing the urllib and urllib2 packages

# urllib provides urlencode for building GET and POST data
import urllib
# urllib2 handles opening URLs and building Request objects
import urllib2
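
One caveat before going further: urllib2 only exists in Python 2. On Python 3 the same pieces live under urllib.request and urllib.parse, so a rough equivalent of the imports above (a sketch assuming Python 3) would be:

# Python 3 equivalents of the Python 2 urllib / urllib2 imports
from urllib import request, parse

# parse.urlencode replaces urllib.urlencode
# request.Request / request.urlopen replace urllib2.Request / urllib2.urlopen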

2. a) Fetching the source code of a website

# Open the URL to get a response object; read() returns the page's source code
response = urllib2.urlopen("http://www.baidu.com")
print response.read()
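
The response object returned by urlopen offers more than read(); a few commonly used methods from the standard urllib2 response API are sketched below:

# Inspect the response before (or instead of) reading the body
response = urllib2.urlopen("http://www.baidu.com")
print response.getcode()   # HTTP status code, e.g. 200
print response.geturl()    # the final URL, after any redirects
print response.info()      # the response headers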

b) The same request, written a bit more explicitly:

# Build a Request instance first, then pass it to urlopen
request = urllib2.Request("http://www.baidu.com")
response = urllib2.urlopen(request)
print response.read()
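
A practical reason to use a Request object is that it lets you attach request headers. Many sites reject clients without a browser-like User-Agent, so a sketch along these lines is common (the header value here is only an example):

# Attach a browser-like User-Agent header to the request
headers = {"User-Agent": "Mozilla/5.0"}
request = urllib2.Request("http://www.baidu.com", headers=headers)
response = urllib2.urlopen(request)
print response.read()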

3. Building a POST request

# POST request: the form fields to submit
values = {"username": "geek", "password": "**********"}
# or, equivalently
values = {}
values["username"] = "geek"
values["password"] = "**********"
# Encode the dictionary into URL-encoded form data
data = urllib.urlencode(values)
url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
# Passing data as the second argument makes urllib2 send a POST request
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()
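
In practice the urlopen call can fail, for example on a 4xx/5xx response or a network problem, so it is usually wrapped in error handling. A minimal sketch using urllib2's exception classes, reusing the request object built above:

# Catch HTTP errors and lower-level connection failures separately
try:
    response = urllib2.urlopen(request)
    print response.read()
except urllib2.HTTPError as e:
    print "HTTP error:", e.code
except urllib2.URLError as e:
    print "Failed to reach the server:", e.reason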

4. Building a GET request

# GET request: the same parameters, sent as a query string
values = {"username": "geek", "password": "**********"}
data = urllib.urlencode(values)
url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
request = url + "?" + data
response = urllib2.urlopen(request)
print response.read()
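
As a side note, urlencode also takes care of percent-encoding, so values containing spaces or other special characters are safe to put in a query string. A small standalone illustration (the search URL and parameter name are only examples):

# urlencode percent-encodes special characters (a space becomes "+")
import urllib
import urllib2

params = urllib.urlencode({"wd": "hello world"})
print params   # wd=hello+world
response = urllib2.urlopen("http://www.baidu.com/s" + "?" + params)
print response.read()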