Simulated Login to cnblogs (博客园)

Date: 2022-12-16 22:12:26

1. Inspect the data the browser sends to the cnblogs server during a normal login

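First open the cnblogs login page and enter your username and password. Press Ctrl+Shift+I (or F12) to open the browser's developer tools, then click the Login button. The request that was sent, together with its contents, now appears in the developer tools.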

The figure below shows where to find the contents of the request.

The contents of the request are listed below.

1.  General
1. Remote Address:121.199.251.55:80
2. Request URL:http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3a%2f%2fwww.cnblogs.com%2f
3. Request Method:POST
4. Status Code:302 Found
2. Response Headers
1. Cache-Control:private
2. Connection:keep-alive
3. Content-Length:140
4. Content-Type:text/html; charset=utf-8
5. Date:Sat, 28 Mar 2015 11:14:18 GMT
6. Location:http://www.cnblogs.com/
7. Set-Cookie:.DottextCookie=8D07D4D6449D629F475F84028369F871661B6C9E8F77305038D6236B5A4E3F33E1803C65D52DAD18CEDE4F4DB0B530179489D11B1F92DA7D78506AAF3570BEC0DA8C283662326F44679A88D01E09F53AA243908301C66E1617CE5B183682D93B5F7B9843AF0945B4CC825AE1A989A536F79D6C434111BF40ADE21D90A2918901BE2AC17F688B210A274DAE79; domain=.cnblogs.com; path=/; HttpOnly
8. Set-Cookie:SERVERID=9b2e527de1fc6430919cfb3051ec3e6c|1427541258|1427541244;Path=/
9. X-AspNet-Version:4.0.30319
10. X-Powered-By:ASP.NET
11. X-UA-Compatible:IE=10
3. Request Headers
1. Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
2. Accept-Encoding:gzip, deflate
3. Accept-Language:en-US,en;q=0.8
4. Cache-Control:max-age=0
5. Connection:keep-alive
6. Content-Length:503
7. Content-Type:application/x-www-form-urlencoded
8. Cookie:__gads=ID=5f799eb5ff8a0d1c:T=1426060996:S=ALNI_MY3SIyB9wH3MOArdyDiV2aA15B-5w; _gat=1; _ga=GA1.2.327332698.1426074473; SERVERID=9b2e527de1fc6430919cfb3051ec3e6c|1427541248|1427541244
9. Host:passport.cnblogs.com
10. Origin:http://passport.cnblogs.com
11. Referer:http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F
12. User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
4. Query String Parameters
1. ReturnUrl:http://www.cnblogs.com/
5. Form Data
1. __EVENTTARGET:
2. __EVENTARGUMENT:
3. __VIEWSTATE:/wEPDwUKLTM1MjEzOTU2MGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFC2Noa1JlbWVtYmVy4b/ZXiH+8FthXlmKpjSEgi7XBNU=
4. __VIEWSTATEGENERATOR:C2EE9ABB
5. __EVENTVALIDATION:/wEdAAUIqCk3Gcmu25zI9fQWqoC7hI6Xi65hwcQ8/QoQCF8JIahXufbhIqPmwKf992GTkd0wq1PKp6+/1yNGng6H71Uxop4oRunf14dz2Zt2+QKDEM3sbzJLySdZoy08+/dzW8VF2on0
6. tbUserName:golden1314521@gmail.com
7. tbPassword:(the real password)
8. btnLogin:登 录
9. txtReturnUrl:http://www.cnblogs.com/
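
The Request URL and the Form Data above are both percent-encoded. As a small illustration (a sketch using only the Python 3 standard library, not part of the original capture), the snippet below decodes the `ReturnUrl` query parameter and then serializes two of the form fields the way an `application/x-www-form-urlencoded` body is built. Only two fields are used, purely to keep the example short.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Request URL copied from the captured request above.
request_url = (
    "http://passport.cnblogs.com/login.aspx"
    "?ReturnUrl=http%3a%2f%2fwww.cnblogs.com%2f"
)

# Decode the query string; parse_qs returns a list of values per key.
query = parse_qs(urlsplit(request_url).query)
return_url = query["ReturnUrl"][0]
print(return_url)

# Encoding goes the other way: reserved characters such as ':' and '/'
# are replaced by %XX escapes.
form_body = urlencode({"txtReturnUrl": return_url, "__EVENTARGUMENT": ""})
print(form_body)
```

This is why `ReturnUrl` shows up as `http://www.cnblogs.com/` under Query String Parameters although the raw URL contains `%3a%2f%2f`.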

2. Simulate the login using the request captured in the previous step

In this step we write our own program that imitates the browser and logs in to the server.

The Python program is as follows:

import urllib.parse
import urllib.request

followeespagecontent = ""

try:
    # Set up cookie handling: install a global opener that stores the cookies
    # returned by the server and sends them back on later requests.
    cookies = urllib.request.HTTPCookieProcessor()
    opener = urllib.request.build_opener(cookies)
    urllib.request.install_opener(opener)

    # All of the values below come from the request captured in step 1.
    parms = {
        "tbUserName": "golden1314521@gmail.com",
        "tbPassword": "your real password",
        "__EVENTTARGET": "btnLogin",
        "__EVENTARGUMENT": "",
        "__VIEWSTATE": "/wEPDwUKLTM1MjEzOTU2MGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFC2Noa1JlbWVtYmVy4b/ZXiH+8FthXlmKpjSEgi7XBNU=",
        "__EVENTVALIDATION": "/wEdAAUIqCk3Gcmu25zI9fQWqoC7hI6Xi65hwcQ8/QoQCF8JIahXufbhIqPmwKf992GTkd0wq1PKp6+/1yNGng6H71Uxop4oRunf14dz2Zt2+QKDEM3sbzJLySdZoy08+/dzW8VF2on0",
        "txtReturnUrl": "http://www.cnblogs.com/",
    }
    loginUrl = "http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F"
    # Passing a data argument makes urlopen send a POST; the body must be bytes.
    login = urllib.request.urlopen(loginUrl, urllib.parse.urlencode(parms).encode("utf-8"))

    # Check whether the login succeeded: if it did, the page that lists
    # the user's friends can be read.
    followeespage = urllib.request.urlopen("http://home.cnblogs.com/followees/")
    followeespagecontent = followeespage.read().decode("utf8")

except Exception as e:
    print("Login failed:", e)

print(followeespagecontent)

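To check whether the login succeeded, the program above requests the page that holds the current user's friend information, http://home.cnblogs.com/followees/ , which contains the list of the user's friends.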
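
Printing the page leaves the check to the reader. One way to make it explicit is sketched below. It rests on two assumptions that this post does not confirm: that an unauthenticated request ends up on the `passport.cnblogs.com` login page, and that the friend list is always wrapped in an element with `class="avatar_list"` (the marker is taken from the sample output in this post). The function name `looks_logged_in` is made up for this example.

```python
from urllib.parse import urlsplit

LOGIN_HOST = "passport.cnblogs.com"          # host of the login page used in this post
FRIEND_LIST_MARKER = 'class="avatar_list"'   # marker seen in the sample output


def looks_logged_in(final_url, html):
    """Heuristic login check; both conditions are assumptions about the site."""
    redirected_to_login = urlsplit(final_url).hostname == LOGIN_HOST
    return (not redirected_to_login) and FRIEND_LIST_MARKER in html


# Illustrative inputs, not real responses.
print(looks_logged_in("http://home.cnblogs.com/followees/", '<div class="avatar_list">'))
print(looks_logged_in("http://passport.cnblogs.com/login.aspx?ReturnUrl=x", "<form>"))
```

With the program above it could be called as `looks_logged_in(followeespage.geturl(), followeespagecontent)`.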

Below is the output after a successful login, showing the user's friend list:

....
<div id="main">
<div class="avatar_list">
<ul>

<li>
<div class="avatar_pic">
<a href="/u/heaad/"><img src="http://pic.cnitblog.com/face/u63234.png" alt="" title="苍梧"></a>
</div>
<div class="avatar_name">
<a href="/u/heaad/" title="苍梧">苍梧</a>
</div>
.....

<a href="http://www.cnblogs.com/AboutUS.aspx">关于博客园</a><a href="http://www.cnblogs.com/SiteMap.aspx">站点地图</a><a href="http://www.cnblogs.com/ContactUs.aspx">联系我们</a><a href="http://www.cnblogs.com/ad.aspx">广告服务</a>&copy; 2004-2015 <a href="http://www.cnblogs.com">博客园</a><span id="profiler_footer"></span>
</div>
</div>
</body>
</html>
.......
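
Once the page has been fetched, the friend entries can be pulled out of the HTML. The sketch below uses the standard-library `html.parser` and assumes that every friend is rendered as a link inside a `<div class="avatar_name">`, as in the excerpt above. The class name `FolloweeParser` is made up for this example, and the sample string is copied from that excerpt.

```python
from html.parser import HTMLParser


class FolloweeParser(HTMLParser):
    """Collects (profile path, display name) pairs from avatar_name blocks."""

    def __init__(self):
        super().__init__()
        self._in_name_div = False
        self.followees = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "avatar_name":
            self._in_name_div = True
        elif tag == "a" and self._in_name_div:
            self.followees.append((attrs.get("href"), attrs.get("title")))

    def handle_endtag(self, tag):
        # Any closing </div> ends the current avatar_name block.
        if tag == "div":
            self._in_name_div = False


sample = """
<li>
<div class="avatar_pic">
<a href="/u/heaad/"><img src="http://pic.cnitblog.com/face/u63234.png" alt="" title="苍梧"></a>
</div>
<div class="avatar_name">
<a href="/u/heaad/" title="苍梧">苍梧</a>
</div>
"""

parser = FolloweeParser()
parser.feed(sample)
print(parser.followees)
```

The link inside `avatar_pic` is skipped on purpose, so each friend is reported once.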

3. Simulated login to cnblogs with Scrapy

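How the code runs:
On startup, Scrapy first calls start_requests, which issues a request for http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F , the cnblogs login page. The callback that parses this page is post_login. post_login builds a new request from the response it receives; its arguments are the browser information (headers) and the form data (formdata), which includes the username and password.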

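from scrapy import Spider
from scrapy.http import FormRequest, Request


# The original post shows only the methods; the class name, the `name`
# attribute and the `parse` stub are placeholders.
class CnblogsLoginSpider(Spider):
    name = "cnblogs_login"

    # Browser-like headers, modelled on the request captured in step 1.
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Encoding": "gzip",
        "Accept-Language": "en-US,en;q=0.8",
        "Connection": "keep-alive",
        # A form POST is sent as application/x-www-form-urlencoded,
        # matching the captured request.
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36",
        "Referer": "http://www.cnblogs.com/",
    }

    def start_requests(self):
        # Fetch the login page first. The 'cookiejar' meta key selects the
        # cookie session that Scrapy's cookie middleware uses for this request.
        return [Request("http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F",
                        meta={"cookiejar": 1}, callback=self.post_login)]

    # Note: the FormRequest step is where problems were encountered.
    def post_login(self, response):
        print("Preparing login")

        # FormRequest.from_response is a Scrapy helper for posting a form.
        # after_login is called with the response to the login request.
        return [FormRequest.from_response(
            response,
            meta={"cookiejar": response.meta["cookiejar"]},
            headers=self.headers,
            formdata={
                "tbUserName": "golden1314521@gmail.com",
                "tbPassword": "***",
                "__EVENTTARGET": "btnLogin",
                "__EVENTARGUMENT": "",
                "__VIEWSTATE": "/wEPDwUKLTM1MjEzOTU2MGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFC2Noa1JlbWVtYmVy4b/ZXiH+8FthXlmKpjSEgi7XBNU=",
                "__EVENTVALIDATION": "/wEdAAUIqCk3Gcmu25zI9fQWqoC7hI6Xi65hwcQ8/QoQCF8JIahXufbhIqPmwKf992GTkd0wq1PKp6+/1yNGng6H71Uxop4oRunf14dz2Zt2+QKDEM3sbzJLySdZoy08+/dzW8VF2on0",
                "txtReturnUrl": "http://www.cnblogs.com/",
            },
            callback=self.after_login,
            dont_filter=True,
        )]

    def after_login(self, response):
        print("login successfully")
        # After logging in, request http://home.cnblogs.com/u/jinliangjiuzhuang/
        # with the same cookie jar, so that the login cookie is sent along.
        yield Request("http://home.cnblogs.com/u/jinliangjiuzhuang/",
                      meta={"cookiejar": response.meta["cookiejar"]},
                      callback=self.parse)

    def parse(self, response):
        # Placeholder callback: just log which page was fetched.
        self.logger.info("Fetched %s", response.url)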
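
The spider hard-codes `__VIEWSTATE` and `__EVENTVALIDATION`, yet `FormRequest.from_response` already pre-populates the request with the field values found in the response's form, and the `formdata` argument only overrides or adds entries. The idea can be sketched with the standard library alone. This is a simplified illustration, not Scrapy's implementation: it looks only at hidden inputs. The names `HiddenFieldParser` and `build_formdata`, the sample page and its field values are all made up; only the field names come from this post.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_formdata(login_page_html, overrides):
    """Start from the page's hidden fields, then apply our own values."""
    parser = HiddenFieldParser()
    parser.feed(login_page_html)
    formdata = dict(parser.fields)
    formdata.update(overrides)
    return formdata


# Made-up, shortened login page; real pages carry much longer values.
sample_page = """
<form method="post" action="login.aspx">
<input type="hidden" name="__VIEWSTATE" value="abc123=" />
<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
<input type="text" name="tbUserName" />
<input type="password" name="tbPassword" />
</form>
"""

formdata = build_formdata(
    sample_page,
    {"tbUserName": "user@example.com", "tbPassword": "secret"},
)
print(formdata)
```

If the server issues fresh hidden-field values on each visit, reading them from the page in this way is likely to be more robust than pasting values captured from an earlier session.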

4. Reference