Python 3.x Web Crawler

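1. Python is well known, but I had never used it in a real project, so today I spent 30 minutes learning it.
Start at the official Python download page: https://www.python.org/downloads/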

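2. The 2.x and 3.x lines differ considerably; beginners should use the latest 3.x release, which is 3.6.4 at the time of writing.
3. Download and install it.
4. Install BeautifulSoup: open CMD, change to C:\Users\xxx\AppData\Local\Programs\Python\Python36-32\Scripts, and run pip install bs4.
5. Create a text file named test.py on the desktop and find an example online. Note that 3.x and 2.x syntax differ; the code below runs under 3.x.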
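To confirm that the install in step 4 worked before writing any crawler code, a quick check along these lines can help. This is only a sketch: is_installed is a hypothetical helper name, not part of Python or bs4, and it relies solely on the standard library.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter is running and whether bs4 is visible to it.
print("Python", sys.version.split()[0])
print("bs4 installed:", is_installed("bs4"))
```

If the second line prints False, pip most likely installed the package for a different Python installation than the one running the script.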

#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import urllib.request

# Remote source: download the page and decode the bytes as UTF-8.
url = r'http://douban.com'
res = urllib.request.urlopen(url)
html = res.read().decode('utf-8')

# Local source: a sample HTML document held in a string.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Build a BeautifulSoup object and print it with standard indentation.
# Pass either html (remote) or html_doc (local) as the first argument.
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())
print(soup.title)
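The script above only prints the whole document and its title. As a possible next step, the sketch below pulls the link text and href out of every <a> tag in a shortened copy of html_doc. The helper name extract_links is made up for this illustration; find_all and get_text are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Shortened copy of the html_doc sample, parsed locally (no network needed).
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(markup, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]


links = extract_links(html_doc)
for text, href in links:
    print(text, href)
```

Passing the downloaded html string instead of html_doc would list the links of the remote page in the same way.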

6. Right-click test.py and choose Edit with IDLE, then select Run > Run Module. The output should appear; getting started really is that simple.
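If the remote fetch fails with an HTTP error instead of producing output, one common cause is that some sites reject requests carrying urllib's default User-Agent. The sketch below shows one way to send a custom header and to decode with the charset the server declares. The names build_request, fetch_html and the USER_AGENT string are assumptions chosen for illustration, not values any particular site requires.

```python
import urllib.request

# Assumption: an arbitrary example string, not a value required by any site.
USER_AGENT = "Mozilla/5.0 (compatible; test-crawler/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as res:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = res.headers.get_content_charset() or "utf-8"
        return res.read().decode(charset, errors="replace")
```

In test.py, html = fetch_html(url) could then replace the urlopen and decode lines, with the result passed to BeautifulSoup as before.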
