Python crawler: error reported when parsing scraped HTML with BeautifulSoup

Date: 2023-03-10 00:40:22

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 15 of the file D:/PycharmProjects/spider/beautiful.py. To get rid of this warning, change code that looks like this:

BeautifulSoup(YOUR_MARKUP})

to this:

BeautifulSoup(YOUR_MARKUP, "html5lib")

markup_type=markup_type))

soup = BeautifulSoup(html)

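This is a warning, not an error: the code runs correctly, but it does not follow the recommended practice of naming a parser explicitly. When I ran the code again in the same session, the parser was still unspecified, yet the message did not reappear, because it had already been reported once (by default Python prints a given warning only once per code location within a single interpreter session). My IDE is PyCharm. To fix it, I installed the html5lib library and changed the statement to BeautifulSoup(content, "html5lib"), where content is the scraped HTML string.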
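The fix described above can be sketched as follows. This is only a minimal illustration, not the original beautiful.py: the helper name make_soup, the fallback to Python's built-in "html.parser", and the sample content string are all assumptions added for the example.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="html5lib", fallback="html.parser"):
    """Parse markup with an explicitly named parser (hypothetical helper).

    Naming the parser avoids the "No parser was explicitly specified"
    warning. If the preferred parser is not installed, bs4 raises
    FeatureNotFound and we fall back to the built-in "html.parser".
    """
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        return BeautifulSoup(markup, fallback)


# Stand-in for the HTML string returned by the crawler (assumed sample data).
content = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

soup = make_soup(content)
print(soup.title.string)
print(soup.p.get_text())
```

Because the parser is always named, the result no longer depends on which parser happens to be "best available" on a given machine, which is the portability concern the warning raises.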

BeautifulSoup official documentation: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

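A Python crawler mainly consists of five modules: the crawler entry point, a URL manager that stores the URLs already crawled and the list of URLs still to be crawled, an HTML downloader, an HTML parser, and an HTML outputter. Building one is also a way to learn urllib2 (urllib.request in Python 3), the bs4 (BeautifulSoup) page parser, re regular expressions, urlparse (urllib.parse in Python 3), and to review Python basics such as set operations.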
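As a rough sketch of the URL manager mentioned above, the two collections (URLs still to crawl and URLs already crawled) can be kept as Python sets. The class name UrlManager, its method names, and the example.com URLs are hypothetical choices for this illustration, not taken from any particular project.

```python
class UrlManager:
    """Tracks URLs waiting to be crawled and URLs already crawled."""

    def __init__(self):
        self.new_urls = set()  # URLs still to be crawled
        self.old_urls = set()  # URLs already handed out for crawling

    def add_new_url(self, url):
        # Ignore empty values and anything seen before in either set.
        if url and url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        for url in urls or ():
            self.add_new_url(url)

    def has_new_url(self):
        return bool(self.new_urls)

    def get_new_url(self):
        # Move one pending URL to the crawled set and return it.
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url


manager = UrlManager()
manager.add_new_urls([
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",  # duplicate, stored only once
])
first = manager.get_new_url()
manager.add_new_url(first)  # already crawled, so it is ignored
```

Sets give constant-time membership checks and drop duplicates automatically, which is why set operations come up when writing this module.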



秒客网
