Learn to Crawl an Entire Website with Python in 5 Minutes

Date: 2023-02-07 07:58:23


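Steps for crawling a website:

  1. Define the crawl target
  • Target site: my own blog, 疯狂的蚂蚁, http://www.crazyant.net
  • Target data: the link, title, and tags of every blog post
  2. Analyze the target site
  • Pages to crawl: http://www.crazyant.net/page/1 ~ http://www.crazyant.net/page/24
  • Data to extract: the title and link of the hyperlink inside each h2 element with class=entry-title, plus the tag list
  3. Download the HTML in bulk
  • Use the requests library for downloading; docs: https://2.python-requests.org//zh_CN/latest/user/quickstart.html
  4. Parse the HTML to get the target data
  • Use the BeautifulSoup library for parsing; docs: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/
  5. Store the resulting data
  • json.dumps can serialize the data for storage
  • The data can also be written to MySQL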
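As a rough illustration of the storage step, the sketch below serializes records with json.dumps, one JSON document per line. The field names `link`, `title`, and `tags` and the helper names `to_json_lines` / `from_json_lines` are assumptions made for this example, not names from the original post. The sample record is the first article that appears on the first list page.

```python
import json
from typing import Dict, List


def to_json_lines(records: List[Dict]) -> str:
    """Serialize each record to one JSON document per line (hypothetical helper)."""
    # ensure_ascii=False keeps Chinese titles readable instead of \uXXXX escapes.
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


def from_json_lines(text: str) -> List[Dict]:
    """Parse text produced by to_json_lines back into a list of records."""
    # json.dumps escapes newlines inside strings, so splitting on "\n" is safe.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Field names are assumptions for illustration only.
articles = [
    {
        "link": "http://www.crazyant.net/2469.html",
        "title": "Spark使用Java开发遇到的那些类型错误",
        "tags": ["java", "spark"],
    }
]

serialized = to_json_lines(articles)
restored = from_json_lines(serialized)
```

One record per line keeps the file easy to append to and to read back incrementally. Writing `serialized` to disk would only need a file opened with `encoding="utf-8"`.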
import requests
from bs4 import BeautifulSoup
import pprint
import json

1. Download the HTML of all pages

def download_all_htmls():
    """
    Download the HTML of every list page for later analysis.
    """
    htmls = []
    for idx in range(24):
        # Pages are numbered from 1, so shift the zero-based index.
        url = f"http://www.crazyant.net/page/{idx+1}"
        print("craw html:", url)
        r = requests.get(url)
        if r.status_code != 200:
            raise Exception(f"failed to download {url}: HTTP {r.status_code}")
        htmls.append(r.text)
    return htmls


# Run the crawl
htmls = download_all_htmls()
craw html: http://www.crazyant.net/page/1
craw html: http://www.crazyant.net/page/2
craw html: http://www.crazyant.net/page/3
craw html: http://www.crazyant.net/page/4
craw html: http://www.crazyant.net/page/5
craw html: http://www.crazyant.net/page/6
craw html: http://www.crazyant.net/page/7
craw html: http://www.crazyant.net/page/8
craw html: http://www.crazyant.net/page/9
craw html: http://www.crazyant.net/page/10
craw html: http://www.crazyant.net/page/11
craw html: http://www.crazyant.net/page/12
craw html: http://www.crazyant.net/page/13
craw html: http://www.crazyant.net/page/14
craw html: http://www.crazyant.net/page/15
craw html: http://www.crazyant.net/page/16
craw html: http://www.crazyant.net/page/17
craw html: http://www.crazyant.net/page/18
craw html: http://www.crazyant.net/page/19
craw html: http://www.crazyant.net/page/20
craw html: http://www.crazyant.net/page/21
craw html: http://www.crazyant.net/page/22
craw html: http://www.crazyant.net/page/23
craw html: http://www.crazyant.net/page/24
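The download loop above aborts on the first failed page. A slightly more defensive variant might record failures and keep going, with an optional pause between requests. This is only a sketch: `build_page_urls`, `download_pages`, and the `fetch` callable are hypothetical names introduced here, and the download itself is passed in as a function so the logic can be tried without network access.

```python
import time
from typing import Callable, Dict, List, Tuple


def build_page_urls(base_url: str, first: int, last: int) -> List[str]:
    """Return the list-page URLs base_url/page/first .. base_url/page/last."""
    return [f"{base_url}/page/{idx}" for idx in range(first, last + 1)]


def download_pages(
    urls: List[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 0.0,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch every URL and return (pages, errors), both keyed by URL."""
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for position, url in enumerate(urls):
        if position > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # record the failure and continue with the next page
            errors[url] = repr(exc)
    return pages, errors


# The 24 list pages named in the analysis step.
page_urls = build_page_urls("http://www.crazyant.net", 1, 24)
```

With requests, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`; any callable that takes a URL and returns the page text will do.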
htmls[0]
'<!DOCTYPE html><html lang="zh-CN" class="no-js"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width"><link rel="profile" href="http://gmpg.org/xfn/11"><link rel="pingback" href="http://www.crazyant.net/xmlrpc.php"> <!--[if lt IE 9]> <script src="http://www.crazyant.net/wp-content/themes/twentyfifteen/js/html5.js"></script> <![endif]--> <script>(function(html){html.className = html.className.replace(/\\bno-js\\b/,\'js\')})(document.documentElement);</script> <title>疯狂的蚂蚁 – 视频公众号:蚂蚁学Python</title><link rel=\'dns-prefetch\' href=\'//cdn.bibblio.org\' /><link rel="alternate" type="application/rss+xml" title="疯狂的蚂蚁 » Feed" href="http://www.crazyant.net/feed" /><link rel="alternate" type="application/rss+xml" title="疯狂的蚂蚁 » 评论Feed" href="http://www.crazyant.net/comments/feed" /> <!-- managing ads with Advanced Ads – https://wpadvancedads.com/ --><script>advanced_ads_ready=function(){var fns=[],listener,doc=typeof document==="object"&&document,hack=doc&&doc.documentElement.doScroll,domContentLoaded="DOMContentLoaded",loaded=doc&&(hack?/^loaded|^c/:/^loaded|^i|^c/).test(doc.readyState);if(!loaded&&doc){listener=function(){doc.removeEventListener(domContentLoaded,listener);window.removeEventListener("load",listener);loaded=1;while(listener=fns.shift())listener()};doc.addEventListener(domContentLoaded,listener);window.addEventListener("load",listener)}return function(fn){loaded?setTimeout(fn,0):fns.push(fn)}}();</script><link rel=\'stylesheet\' id=\'crayon-css\'  href=\'http://www.crazyant.net/wp-content/plugins/crayon-syntax-highlighter/css/min/crayon.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'crayon-theme-xcode-css\'  href=\'http://www.crazyant.net/wp-content/plugins/crayon-syntax-highlighter/themes/xcode/crayon-theme-xcode.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'crayon-font-monaco-css\'  
href=\'http://www.crazyant.net/wp-content/plugins/crayon-syntax-highlighter/fonts/crayon-font-monaco.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'wp-block-library-css\'  href=\'http://www.crazyant.net/wp-includes/css/dist/block-library/style.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'wp-block-library-theme-css\'  href=\'http://www.crazyant.net/wp-includes/css/dist/block-library/theme.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'bibblio_related_posts-css\'  href=\'http://www.crazyant.net/wp-content/plugins/bibblio-related-posts/public/css/bibblio_related_posts.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'bibblio-rcm-css-css\'  href=\'//cdn.bibblio.org/rcm/4.5/bib-related-content.css?ver=5.2.2\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'my-style-css\'  href=\'http://www.crazyant.net/wp-content/plugins/cardoza-3d-tag-cloud//public/css/my-style.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'wordpress-popular-posts-css-css\'  href=\'http://www.crazyant.net/wp-content/plugins/wordpress-popular-posts/public/css/wordpress-popular-posts-css.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'genericons-css\'  href=\'http://www.crazyant.net/wp-content/themes/twentyfifteen/genericons/genericons.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'twentyfifteen-style-css\'  href=\'http://www.crazyant.net/wp-content/themes/twentyfifteen/twentyfifteen-style.min.css\' type=\'text/css\' media=\'all\' /><link rel=\'stylesheet\' id=\'twentyfifteen-block-style-css\'  href=\'http://www.crazyant.net/wp-content/themes/twentyfifteen/css/twentyfifteen-block-style.min.css\' type=\'text/css\' media=\'all\' /> <!--[if lt IE 9]><link rel=\'stylesheet\' id=\'twentyfifteen-ie-css\'  href=\'http://www.crazyant.net/wp-content/themes/twentyfifteen/css/twentyfifteen-ie.min.css\' 
type=\'text/css\' media=\'all\' /> <![endif]--> <!--[if lt IE 8]><link rel=\'stylesheet\' id=\'twentyfifteen-ie7-css\'  href=\'http://www.crazyant.net/wp-content/themes/twentyfifteen/css/twentyfifteen-ie7.min.css\' type=\'text/css\' media=\'all\' /> <![endif]--><link rel=\'stylesheet\' id=\'fancybox-css\'  href=\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/css/jquery.fancybox.min.css\' type=\'text/css\' media=\'screen\' /> <script type=\'text/javascript\' src=\'http://www.crazyant.net/wp-includes/js/jquery/jquery.js\'></script> <script async type=\'text/javascript\' src=\'http://www.crazyant.net/wp-includes/js/jquery/jquery-migrate.min.js\'></script> <script type=\'text/javascript\'>/* <![CDATA[ */\nvar CrayonSyntaxSettings = {"version":"_2.7.2_beta","is_admin":"0","ajaxurl":"http:\\/\\/www.crazyant.net\\/wp-admin\\/admin-ajax.php","prefix":"crayon-","setting":"crayon-setting","selected":"crayon-setting-selected","changed":"crayon-setting-changed","special":"crayon-setting-special","orig_value":"data-orig-value","debug":""};\nvar CrayonSyntaxStrings = {"copy":"Press %s to Copy, %s to Paste","minimize":"Click To Expand Code"};\n/* ]]> */</script> <script async type=\'text/javascript\' src=\'http://www.crazyant.net/wp-content/plugins/crayon-syntax-highlighter/js/min/crayon.min.js\'></script> <script async type=\'text/javascript\' src=\'http://www.crazyant.net/wp-content/uploads/siteground-optimizer-assets/bibblio_related_posts.min.js\'></script> <script async type=\'text/javascript\' src=\'http://www.crazyant.net/wp-content/plugins/cardoza-3d-tag-cloud/jquery.tagcanvas.min.js\'></script> <script type=\'text/javascript\'>/* <![CDATA[ */\nvar wpp_params = {"sampling_active":"0","sampling_rate":"100","ajax_url":"http:\\/\\/www.crazyant.net\\/wp-json\\/wordpress-popular-posts\\/v1\\/popular-posts\\/","ID":"","token":"9585d03ca9","debug":""};\n/* ]]> */</script> <script async type=\'text/javascript\' 
src=\'http://www.crazyant.net/wp-content/plugins/wordpress-popular-posts/public/js/wpp-4.2.0.min.js\'></script> <link rel=\'https://api.w.org/\' href=\'http://www.crazyant.net/wp-json/\' /><link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://www.crazyant.net/xmlrpc.php?rsd" /><link rel="wlwmanifest" type="application/wlwmanifest+xml" href="http://www.crazyant.net/wp-includes/wlwmanifest.xml" /><meta name="generator" content="WordPress 5.2.2" /> <script type="text/javascript">$j = jQuery.noConflict();\n\t\t$j(document).ready(function() {\n\t\t\tif(!$j(\'#myCanvas\').tagcanvas({\n\t\t\t\ttextColour: \'#333333\',\n\t\t\t\toutlineColour: \'#ffffff\',\n\t\t\t\treverse: true,\n\t\t\t\tdepth: 0.8,\n\t\t\t\ttextFont: null,\n\t\t\t\tweight: true,\n\t\t\t\tmaxSpeed: 0.05\n\t\t\t},\'tags\')) {\n\t\t\t\t$j(\'#myCanvasContainer\').hide();\n\t\t\t}\n\t\t});</script> <script type=\'text/javascript\'>// <![CDATA[\n    var ajaxUrl = "http://www.crazyant.net/wp-admin/admin-ajax.php";\n    //]]></script> <style type="text/css">.recentcomments a{display:inline !important;padding:0 !important;margin:0 !important;}</style> <script>var _hmt = _hmt || [];\n(function() {\n  var hm = document.createElement("script");\n  hm.src = "https://hm.baidu.com/hm.js?4c9637db87f741d7588ff42a2a9c057d";\n  var s = document.getElementsByTagName("script")[0]; \n  s.parentNode.insertBefore(hm, s);\n})();</script> </head><body class="home blog wp-embed-responsive"><div  class="hfeed site"> <a class="skip-link screen-reader-text" href="#content">跳至内容</a><div  class="sidebar"><header  class="site-header" role="banner"><div class="site-branding"><h1 class="site-title"><a href="http://www.crazyant.net/" rel="home">疯狂的蚂蚁</a></h1><p class="site-description">视频公众号:蚂蚁学Python</p> <button class="secondary-toggle">菜单和挂件</button></div><!-- .site-branding --></header><!-- .site-header --><div  class="secondary"><nav  class="main-navigation" role="navigation"><div 
class="menu-%e5%af%bc%e8%88%aa%e6%a0%8f-container"><ul id="menu-%e5%af%bc%e8%88%aa%e6%a0%8f" class="nav-menu"><li  class="menu-item menu-item-type-custom menu-item-object-custom menu-item-862"><a href="http://crazyant.net/">首页</a></li><li  class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2482"><a href="http://www.crazyant.net/category/python-solvedoubts">Python-答疑解惑</a></li><li  class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2475"><a href="http://www.crazyant.net/category/python-basic">Python-基础知识</a></li><li  class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2476"><a href="http://www.crazyant.net/category/python-web">Python-Web开发</a></li><li  class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2478"><a href="http://www.crazyant.net/category/python-bigdata">Python-大数据</a></li><li  class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2479"><a href="http://www.crazyant.net/category/python-data-analysis">Python-数据分析</a></li><li  class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2480"><a href="http://www.crazyant.net/category/python-machinelearning">Python-机器学习</a></li><li  class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2468"><a href="http://www.crazyant.net/category/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f">推荐系统合集</a></li><li  class="menu-item menu-item-type-post_type menu-item-object-page menu-item-has-children menu-item-2328"><a href="http://www.crazyant.net/%e5%85%b3%e4%ba%8e">关于我</a><ul class="sub-menu"><li  class="menu-item menu-item-type-post_type menu-item-object-page menu-item-861"><a href="http://www.crazyant.net/%e7%95%99%e8%a8%80%e5%b0%8f%e6%9c%ac">留言小本</a></li><li  class="menu-item menu-item-type-post_type menu-item-object-page menu-item-1844"><a href="http://www.crazyant.net/meditation-resource">冥想资料</a></li><li  class="menu-item menu-item-type-post_type 
menu-item-object-page menu-item-1941"><a href="http://www.crazyant.net/my_program_notes">编程笔记</a></li><li  class="menu-item menu-item-type-post_type menu-item-object-page menu-item-697"><a href="http://www.crazyant.net/%e5%b8%b8%e7%94%a8%e8%b5%84%e6%ba%90">常用资源</a></li><li  class="menu-item menu-item-type-post_type menu-item-object-page menu-item-1875"><a href="http://www.crazyant.net/my_book_list">个人书单</a></li><li  class="menu-item menu-item-type-post_type menu-item-object-page menu-item-1866"><a href="http://www.crazyant.net/%e4%b8%aa%e4%ba%ba%e7%ae%b4%e8%a8%80">个人箴言</a></li></ul></li></ul></div></nav><!-- .main-navigation --><div  class="widget-area" role="complementary"><aside  class="widget widget_search"><form role="search" method="get" class="search-form" action="http://www.crazyant.net/"> <label> <span class="screen-reader-text">搜索:</span> <input type="search" class="search-field" placeholder="搜索…" value="" name="s" /> </label> <input type="submit" class="search-submit screen-reader-text" value="搜索" /></form></aside><aside  class="widget widget_recent_entries"><h2 class="widget-title">近期文章</h2><ul><li> <a href="http://www.crazyant.net/2469.html">Spark使用Java开发遇到的那些类型错误</a></li><li> <a href="http://www.crazyant.net/2454.html">推荐系统:实现文章相似推荐的简单实例</a></li><li> <a href="http://www.crazyant.net/2447.html">Spark使用word2vec训练item2vec实现内容相关推荐</a></li><li> <a href="http://www.crazyant.net/2434.html">Pandas中对轴axis=0和axis=1的理解</a></li><li> <a href="http://www.crazyant.net/2419.html">Flask使用Pyecharts在单个页面展示多个图表</a></li></ul></aside><aside  class="widget widget_categories"><h2 class="widget-title">分类目录</h2><form action="http://www.crazyant.net" method="get"><label class="screen-reader-text" for="cat">分类目录</label><select  name=\'cat\' id=\'cat\' class=\'postform\' ><option value=\'-1\'>选择分类目录</option><option class="level-0" value="7">c++</option><option class="level-0" value="209">flask</option><option class="level-0" value="136">hadoop</option><option class="level-0" 
value="145">hive</option><option class="level-0" value="134">java</option><option class="level-0" value="36">mysql</option><option class="level-0" value="243">pandas</option><option class="level-0" value="8">php</option><option class="level-0" value="111">python</option><option class="level-0" value="144">shell</option><option class="level-0" value="200">spark</option><option class="level-0" value="202">tensorflow</option><option class="level-0" value="151">web</option><option class="level-0" value="121">wordpress</option><option class="level-0" value="149">个人旅程</option><option class="level-0" value="150">基础知识</option><option class="level-0" value="131">工具软件</option><option class="level-0" value="203">推荐系统</option><option class="level-0" value="152">操作系统</option><option class="level-0" value="148">数据采集</option><option class="level-0" value="211">数据驱动</option><option class="level-0" value="86">未分类</option><option class="level-0" value="216">机器学习</option><option class="level-0" value="205">程序人生</option><option class="level-0" value="133">站长</option> </select></form> <script type=\'text/javascript\'>/* <![CDATA[ */\n(function() {\n\tvar dropdown = document.getElementById( "cat" );\n\tfunction onCatChange() {\n\t\tif ( dropdown.options[ dropdown.selectedIndex ].value > 0 ) {\n\t\t\tdropdown.parentNode.submit();\n\t\t}\n\t}\n\tdropdown.onchange = onCatChange;\n})();\n/* ]]> */</script> </aside><aside  class="widget widget_recent_comments"><h2 class="widget-title">近期评论</h2><ul ><li class="recentcomments"><span class="comment-author-link"><a href=\'http://crazyant.net\' rel=\'external nofollow\' class=\'url\'>crazyant</a></span>发表在《<a href="http://www.crazyant.net/2404.html#comment-28288">听樊登的《非暴力沟通》</a>》</li><li class="recentcomments"><span class="comment-author-link"><a href=\'http://blog.antior.cn\' rel=\'external nofollow\' class=\'url\'>antior</a></span>发表在《<a href="http://www.crazyant.net/2404.html#comment-28287">听樊登的《非暴力沟通》</a>》</li><li class="recentcomments"><span 
class="comment-author-link"><a href=\'http://crazyant.net\' rel=\'external nofollow\' class=\'url\'>crazyant</a></span>发表在《<a href="http://www.crazyant.net/my_book_list#comment-28278">个人书单</a>》</li><li class="recentcomments"><span class="comment-author-link">d</span>发表在《<a href="http://www.crazyant.net/my_book_list#comment-28277">个人书单</a>》</li><li class="recentcomments"><span class="comment-author-link">赖文伟</span>发表在《<a href="http://www.crazyant.net/2145.html#comment-28042">快速找到Tomcat中最耗CPU的线程</a>》</li></ul></aside><aside  class="widget widget_tag_cloud"><h2 class="widget-title">标签</h2><div class="tagcloud"><ul class=\'wp-tag-cloud\' role=\'list\'><li><a href="http://www.crazyant.net/tag/apache" class="tag-cloud-link tag-link-233 tag-link-position-1" style="font-size: 9.7872340425532pt;" aria-label="apache (2个项目)">apache</a></li><li><a href="http://www.crazyant.net/tag/c" class="tag-cloud-link tag-link-69 tag-link-position-2" style="font-size: 14.950354609929pt;" aria-label="c++ (9个项目)">c++</a></li><li><a href="http://www.crazyant.net/tag/django" class="tag-cloud-link tag-link-118 tag-link-position-3" style="font-size: 13.460992907801pt;" aria-label="django (6个项目)">django</a></li><li><a href="http://www.crazyant.net/tag/excel" class="tag-cloud-link tag-link-230 tag-link-position-4" style="font-size: 9.7872340425532pt;" aria-label="excel (2个项目)">excel</a></li><li><a href="http://www.crazyant.net/tag/flask" class="tag-cloud-link tag-link-210 tag-link-position-5" style="font-size: 9.7872340425532pt;" aria-label="flask (2个项目)">flask</a></li><li><a href="http://www.crazyant.net/tag/hadoop" class="tag-cloud-link tag-link-173 tag-link-position-6" style="font-size: 13.460992907801pt;" aria-label="hadoop (6个项目)">hadoop</a></li><li><a href="http://www.crazyant.net/tag/hive" class="tag-cloud-link tag-link-175 tag-link-position-7" style="font-size: 16.737588652482pt;" aria-label="hive (14个项目)">hive</a></li><li><a href="http://www.crazyant.net/tag/java" class="tag-cloud-link 
tag-link-20 tag-link-position-8" style="font-size: 18.921985815603pt;" aria-label="java (24个项目)">java</a></li><li><a href="http://www.crazyant.net/tag/javascript" class="tag-cloud-link tag-link-21 tag-link-position-9" style="font-size: 13.460992907801pt;" aria-label="javascript (6个项目)">javascript</a></li><li><a href="http://www.crazyant.net/tag/jquery" class="tag-cloud-link tag-link-48 tag-link-position-10" style="font-size: 9.7872340425532pt;" aria-label="jquery (2个项目)">jquery</a></li><li><a href="http://www.crazyant.net/tag/jvm" class="tag-cloud-link tag-link-166 tag-link-position-11" style="font-size: 10.978723404255pt;" aria-label="jvm (3个项目)">jvm</a></li><li><a href="http://www.crazyant.net/tag/linux" class="tag-cloud-link tag-link-59 tag-link-position-12" style="font-size: 14.45390070922pt;" aria-label="linux (8个项目)">linux</a></li><li><a href="http://www.crazyant.net/tag/mac" class="tag-cloud-link tag-link-186 tag-link-position-13" style="font-size: 9.7872340425532pt;" aria-label="mac (2个项目)">mac</a></li><li><a href="http://www.crazyant.net/tag/maven" class="tag-cloud-link tag-link-222 tag-link-position-14" style="font-size: 9.7872340425532pt;" aria-label="maven (2个项目)">maven</a></li><li><a href="http://www.crazyant.net/tag/mybatis" class="tag-cloud-link tag-link-187 tag-link-position-15" style="font-size: 9.7872340425532pt;" aria-label="mybatis (2个项目)">mybatis</a></li><li><a href="http://www.crazyant.net/tag/mysql" class="tag-cloud-link tag-link-169 tag-link-position-16" style="font-size: 19.219858156028pt;" aria-label="mysql (26个项目)">mysql</a></li><li><a href="http://www.crazyant.net/tag/pandas" class="tag-cloud-link tag-link-244 tag-link-position-17" style="font-size: 9.7872340425532pt;" aria-label="pandas (2个项目)">pandas</a></li><li><a href="http://www.crazyant.net/tag/php" class="tag-cloud-link tag-link-17 tag-link-position-18" style="font-size: 22pt;" aria-label="php (50个项目)">php</a></li><li><a href="http://www.crazyant.net/tag/phpmyadmin" 
class="tag-cloud-link tag-link-25 tag-link-position-19" style="font-size: 8pt;" aria-label="phpmyadmin (1个项目)">phpmyadmin</a></li><li><a href="http://www.crazyant.net/tag/python" class="tag-cloud-link tag-link-170 tag-link-position-20" style="font-size: 21.304964539007pt;" aria-label="python (43个项目)">python</a></li><li><a href="http://www.crazyant.net/tag/qt" class="tag-cloud-link tag-link-236 tag-link-position-21" style="font-size: 9.7872340425532pt;" aria-label="qt (2个项目)">qt</a></li><li><a href="http://www.crazyant.net/tag/redis" class="tag-cloud-link tag-link-214 tag-link-position-22" style="font-size: 10.978723404255pt;" aria-label="redis (3个项目)">redis</a></li><li><a href="http://www.crazyant.net/tag/seo" class="tag-cloud-link tag-link-110 tag-link-position-23" style="font-size: 9.7872340425532pt;" aria-label="seo (2个项目)">seo</a></li><li><a href="http://www.crazyant.net/tag/shell" class="tag-cloud-link tag-link-174 tag-link-position-24" style="font-size: 13.957446808511pt;" aria-label="shell (7个项目)">shell</a></li><li><a href="http://www.crazyant.net/tag/spark" class="tag-cloud-link tag-link-198 tag-link-position-25" style="font-size: 11.971631205674pt;" aria-label="spark (4个项目)">spark</a></li><li><a href="http://www.crazyant.net/tag/svn" class="tag-cloud-link tag-link-122 tag-link-position-26" style="font-size: 9.7872340425532pt;" aria-label="svn (2个项目)">svn</a></li><li><a href="http://www.crazyant.net/tag/tensorflow" class="tag-cloud-link tag-link-199 tag-link-position-27" style="font-size: 9.7872340425532pt;" aria-label="tensorflow (2个项目)">tensorflow</a></li><li><a href="http://www.crazyant.net/tag/tomcat" class="tag-cloud-link tag-link-213 tag-link-position-28" style="font-size: 9.7872340425532pt;" aria-label="tomcat (2个项目)">tomcat</a></li><li><a href="http://www.crazyant.net/tag/ubuntu" class="tag-cloud-link tag-link-55 tag-link-position-29" style="font-size: 13.460992907801pt;" aria-label="ubuntu (6个项目)">ubuntu</a></li><li><a 
href="http://www.crazyant.net/tag/vim" class="tag-cloud-link tag-link-14 tag-link-position-30" style="font-size: 8pt;" aria-label="vim (1个项目)">vim</a></li><li><a href="http://www.crazyant.net/tag/win7" class="tag-cloud-link tag-link-226 tag-link-position-31" style="font-size: 12.765957446809pt;" aria-label="win7 (5个项目)">win7</a></li><li><a href="http://www.crazyant.net/tag/word" class="tag-cloud-link tag-link-229 tag-link-position-32" style="font-size: 9.7872340425532pt;" aria-label="word (2个项目)">word</a></li><li><a href="http://www.crazyant.net/tag/wordpress" class="tag-cloud-link tag-link-171 tag-link-position-33" style="font-size: 10.978723404255pt;" aria-label="wordpress (3个项目)">wordpress</a></li><li><a href="http://www.crazyant.net/tag/%e5%a4%a7%e6%95%b0%e6%8d%ae" class="tag-cloud-link tag-link-207 tag-link-position-34" style="font-size: 10.978723404255pt;" aria-label="大数据 (3个项目)">大数据</a></li><li><a href="http://www.crazyant.net/tag/%e5%ae%89%e5%85%a8" class="tag-cloud-link tag-link-16 tag-link-position-35" style="font-size: 8pt;" aria-label="安全 (1个项目)">安全</a></li><li><a href="http://www.crazyant.net/tag/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f" class="tag-cloud-link tag-link-204 tag-link-position-36" style="font-size: 11.971631205674pt;" aria-label="推荐系统 (4个项目)">推荐系统</a></li><li><a href="http://www.crazyant.net/tag/%e6%93%8d%e4%bd%9c%e7%b3%bb%e7%bb%9f" class="tag-cloud-link tag-link-238 tag-link-position-37" style="font-size: 9.7872340425532pt;" aria-label="操作系统 (2个项目)">操作系统</a></li><li><a href="http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%ba%93" class="tag-cloud-link tag-link-23 tag-link-position-38" style="font-size: 11.971631205674pt;" aria-label="数据库 (4个项目)">数据库</a></li><li><a href="http://www.crazyant.net/tag/%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0" class="tag-cloud-link tag-link-215 tag-link-position-39" style="font-size: 9.7872340425532pt;" aria-label="机器学习 (2个项目)">机器学习</a></li><li><a href="http://www.crazyant.net/tag/%e7%88%ac%e8%99%ab" 
class="tag-cloud-link tag-link-189 tag-link-position-40" style="font-size: 15.347517730496pt;" aria-label="爬虫 (10个项目)">爬虫</a></li><li><a href="http://www.crazyant.net/tag/%e7%a8%8b%e5%ba%8f%e4%ba%ba%e7%94%9f" class="tag-cloud-link tag-link-206 tag-link-position-41" style="font-size: 18.127659574468pt;" aria-label="程序人生 (20个项目)">程序人生</a></li><li><a href="http://www.crazyant.net/tag/website" class="tag-cloud-link tag-link-172 tag-link-position-42" style="font-size: 10.978723404255pt;" aria-label="站长 (3个项目)">站长</a></li><li><a href="http://www.crazyant.net/tag/%e7%ae%97%e6%b3%95" class="tag-cloud-link tag-link-208 tag-link-position-43" style="font-size: 10.978723404255pt;" aria-label="算法 (3个项目)">算法</a></li><li><a href="http://www.crazyant.net/tag/%e7%bb%87%e6%a2%a6" class="tag-cloud-link tag-link-130 tag-link-position-44" style="font-size: 11.971631205674pt;" aria-label="织梦 (4个项目)">织梦</a></li><li><a href="http://www.crazyant.net/tag/%e8%ae%be%e8%ae%a1" class="tag-cloud-link tag-link-159 tag-link-position-45" style="font-size: 9.7872340425532pt;" aria-label="设计 (2个项目)">设计</a></li></ul></div></aside><aside  class="widget popular-posts"><h2 class="widget-title">热门文章</h2><!-- cached --> <!-- WordPress Popular Posts --><ul class="wpp-list"><li> <a href="http://www.crazyant.net/2447.html" title="Spark使用word2vec训练item2vec实现内容相关推荐" class="wpp-post-title" target="_self">Spark使用word2vec训练item2vec实现内容相关推荐</a> <span class="wpp-meta post-stats"><span class="wpp-views">60 views</span></span></li><li> <a href="http://www.crazyant.net/2454.html" title="推荐系统:实现文章相似推荐的简单实例" class="wpp-post-title" target="_self">推荐系统:实现文章相似推荐的简单实例</a> <span class="wpp-meta post-stats"><span class="wpp-views">29 views</span></span></li><li> <a href="http://www.crazyant.net/2434.html" title="Pandas中对轴axis=0和axis=1的理解" class="wpp-post-title" target="_self">Pandas中对轴axis=0和axis=1的理解</a> <span class="wpp-meta post-stats"><span class="wpp-views">26 views</span></span></li><li> <a 
href="http://www.crazyant.net/2469.html" title="Spark使用Java开发遇到的那些类型错误" class="wpp-post-title" target="_self">Spark使用Java开发遇到的那些类型错误</a> <span class="wpp-meta post-stats"><span class="wpp-views">6 views</span></span></li></ul></aside><aside class="widget crazy-widget"><h2 class="widget-title">Python技术视频分享公众号:蚂蚁学Python</h2><a href="http://zhishi.iqiyi.com/shop/#/home/P812d55d7c6d344c7a17365a452bb9e52.html"><img width="344" height="344" src=\'http://www.crazyant.net/wp-content/uploads/2019/08/小图.jpg\' alt=\'\'  /></a></aside><aside  class="widget widget_text"><h2 class="widget-title">分享文章</h2><div class="textwidget"><div class="bdsharebuttonbox"><a href="#" class="bds_more" ></a><a href="#" class="bds_qzone"  title="分享到QQ空间"></a><a href="#" class="bds_tsina"  title="分享到新浪微博"></a><a href="#" class="bds_tqq"  title="分享到腾讯微博"></a><a href="#" class="bds_renren"  title="分享到人人网"></a><a href="#" class="bds_weixin"  title="分享到微信"></a></div> <script>window._bd_share_config={"common":{"bdSnsKey":{},"bdText":"","bdMini":"2","bdPic":"","bdStyle":"0","bdSize":"16"},"share":{}};with(document)0[(getElementsByTagName(\'head\')[0]||body).appendChild(createElement(\'script\')).src=\'http://bdimg.share.baidu.com/static/api/js/share.js?v=89860593.js?cdnversion=\'+~(-new Date()/36e5)];</script></div></aside></div><!-- .widget-area --></div><!-- .secondary --></div><!-- .sidebar --><div  class="site-content"><div  class="content-area"><main  class="site-main" role="main"><article  class="post-2469 post type-post status-publish format-standard hentry category-java-development category-spark tag-java tag-spark"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2469.html" rel="bookmark">Spark使用Java开发遇到的那些类型错误</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>Spark使用Java开发其实比较方便的,JAVA8的lambda表达式使得编写体验并不比Scala差很多,但是因为Spark本身使用Scala实现,导致使用Java开发的时候,也遇到不少的类型匹配问题。 本文列举出自己在工作开发中遇到的一些问题,供大家参考: WrappedArray和Vector 报错信息为:Caused by: 
java.lang.ClassCastException: sc … <a href="http://www.crazyant.net/2469.html" class="more-link">继续阅读<span class="screen-reader-text">Spark使用Java开发遇到的那些类型错误</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2469.html" rel="bookmark"><time class="entry-date published" datetime="2019-08-28T08:04:02+00:00">2019-08-28</time><time class="updated" datetime="2019-08-28T09:17:12+00:00">2019-08-28</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/java-development" rel="category tag">java</a>、<a href="http://www.crazyant.net/category/spark" rel="category tag">spark</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/java" rel="tag">java</a>、<a href="http://www.crazyant.net/tag/spark" rel="tag">spark</a></span><span class="comments-link"><a href="http://www.crazyant.net/2469.html#respond"><span class="screen-reader-text">于Spark使用Java开发遇到的那些类型错误</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2469 --><article  class="post-2454 post type-post status-publish format-standard hentry category-pandas category-python category-203 tag-pandas tag-python tag-sklearn tag-204"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2454.html" rel="bookmark">推荐系统:实现文章相似推荐的简单实例</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>看了一篇文章实现了文章的内容相似度计算实现相似推荐,算法比较简单,非常适合我这种初学入门的人。 来自一篇英文文章:地址 文章标题为:How to build a content-based movie recommender system with Natural Language Processing 文章的代码在:地址 该文章实现相似推荐的步骤: 1、将CSV加载到pandas.pd 2、提取 … <a href="http://www.crazyant.net/2454.html" 
class="more-link">继续阅读<span class="screen-reader-text">推荐系统:实现文章相似推荐的简单实例</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2454.html" rel="bookmark"><time class="entry-date published" datetime="2019-08-25T12:13:46+00:00">2019-08-25</time><time class="updated" datetime="2019-08-25T13:05:07+00:00">2019-08-25</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/pandas" rel="category tag">pandas</a>、<a href="http://www.crazyant.net/category/python" rel="category tag">python</a>、<a href="http://www.crazyant.net/category/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f" rel="category tag">推荐系统</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/pandas" rel="tag">pandas</a>、<a href="http://www.crazyant.net/tag/python" rel="tag">python</a>、<a href="http://www.crazyant.net/tag/sklearn" rel="tag">sklearn</a>、<a href="http://www.crazyant.net/tag/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f" rel="tag">推荐系统</a></span><span class="comments-link"><a href="http://www.crazyant.net/2454.html#respond"><span class="screen-reader-text">于推荐系统:实现文章相似推荐的简单实例</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2454 --><article  class="post-2447 post type-post status-publish format-standard hentry category-java-development category-spark category-203 tag-item2vec tag-java tag-204"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2447.html" rel="bookmark">Spark使用word2vec训练item2vec实现内容相关推荐</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>之前使用spark 
als训练协同过滤,然后导出itemvectors做相似度计算,后来学到了可以用word2vec实现item2vec的训练效果貌似更好,试了一下果然不错; spark版本:2.3.1,开发语言为JAVA 几大步骤 读取查看、点击、播放等行为数据,我用的是播放数据; 数据整理成(userid, itemid, playcnt)的形式,这个数据可能是聚合N天得到的; 过滤掉play … <a href="http://www.crazyant.net/2447.html" class="more-link">继续阅读<span class="screen-reader-text">Spark使用word2vec训练item2vec实现内容相关推荐</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2447.html" rel="bookmark"><time class="entry-date published" datetime="2019-08-23T13:33:46+00:00">2019-08-23</time><time class="updated" datetime="2019-08-23T13:37:09+00:00">2019-08-23</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/java-development" rel="category tag">java</a>、<a href="http://www.crazyant.net/category/spark" rel="category tag">spark</a>、<a href="http://www.crazyant.net/category/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f" rel="category tag">推荐系统</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/item2vec" rel="tag">item2vec</a>、<a href="http://www.crazyant.net/tag/java" rel="tag">java</a>、<a href="http://www.crazyant.net/tag/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f" rel="tag">推荐系统</a></span><span class="comments-link"><a href="http://www.crazyant.net/2447.html#respond"><span class="screen-reader-text">于Spark使用word2vec训练item2vec实现内容相关推荐</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2447 --><article  class="post-2434 post type-post status-publish format-standard hentry category-pandas category-python category-211 tag-pandas tag-python"><header class="entry-header"><h2 class="entry-title"><a 
href="http://www.crazyant.net/2434.html" rel="bookmark">Pandas中对轴axis=0和axis=1的理解</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>刚学习numpy和Pandas,被axis、axis=0或者axis=’index’,axis=1或者axis=’columns’给搞蒙了,甚至经常觉得书是不是写错了,有点反直觉。 来自简书的一篇文章地址有张图解释的挺好的,见文章底部   引用一下这篇文章的话,理解的很好: 实际上axis = 1,指的是沿着行求所有列的平均值,代表了横轴, … <a href="http://www.crazyant.net/2434.html" class="more-link">继续阅读<span class="screen-reader-text">Pandas中对轴axis=0和axis=1的理解</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2434.html" rel="bookmark"><time class="entry-date published" datetime="2019-08-20T00:11:12+00:00">2019-08-20</time><time class="updated" datetime="2019-08-20T00:15:11+00:00">2019-08-20</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/pandas" rel="category tag">pandas</a>、<a href="http://www.crazyant.net/category/python" rel="category tag">python</a>、<a href="http://www.crazyant.net/category/%e6%95%b0%e6%8d%ae%e9%a9%b1%e5%8a%a8" rel="category tag">数据驱动</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/pandas" rel="tag">pandas</a>、<a href="http://www.crazyant.net/tag/python" rel="tag">python</a></span><span class="comments-link"><a href="http://www.crazyant.net/2434.html#respond"><span class="screen-reader-text">于Pandas中对轴axis=0和axis=1的理解</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2434 --><article  class="post-2419 post type-post status-publish format-standard hentry category-flask tag-echarts tag-flask tag-pyecharts tag-python"><header class="entry-header"><h2 class="entry-title"><a 
href="http://www.crazyant.net/2419.html" rel="bookmark">Flask使用Pyecharts在单个页面展示多个图表</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>在Flask页面展示echarts,主要有两种方法: 方法1、原生echarts方法 自己在前端引入echarts.js文件、自己创建div、自己初始化echarts对象、自己从官网复制并且配置图表、自己给echarts对象设置配置项实现绘制,这种方法的缺点是配置项都是js的形式比较繁琐,对于后端开发人员来说有点过于参与前端js部分的配置开发; 这种方式参照echarts官网的方式,其实跟flas … <a href="http://www.crazyant.net/2419.html" class="more-link">继续阅读<span class="screen-reader-text">Flask使用Pyecharts在单个页面展示多个图表</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2419.html" rel="bookmark"><time class="entry-date published" datetime="2019-08-04T14:24:29+00:00">2019-08-04</time><time class="updated" datetime="2019-08-04T14:39:06+00:00">2019-08-04</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/flask" rel="category tag">flask</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/echarts" rel="tag">echarts</a>、<a href="http://www.crazyant.net/tag/flask" rel="tag">flask</a>、<a href="http://www.crazyant.net/tag/pyecharts" rel="tag">pyecharts</a>、<a href="http://www.crazyant.net/tag/python" rel="tag">python</a></span><span class="comments-link"><a href="http://www.crazyant.net/2419.html#respond"><span class="screen-reader-text">于Flask使用Pyecharts在单个页面展示多个图表</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2419 --><article  class="post-2404 post type-post status-publish format-standard hentry category-205 tag-206"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2404.html" 
rel="bookmark">听樊登的《非暴力沟通》</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>最近在爱奇艺知识买了课程《樊登教你快乐地事业有成》,第一节是《樊登:非暴力沟通》,看完后觉得自己有一些感悟。 自己的感悟: 我们总是太喜欢评价、评判别人 不论是别人主动寻求你的帮助,或者你觉得自己是前辈想要指导一下对方,我们总是喜欢评价评判别人或者给别人建议,然而每个人的世界观不同,有什么资格评价别人呢,这都是不对的,如果对方寻求安慰向你倾诉,真正应该做的应该是体会对方的感受。 就像有一句话:如果 … <a href="http://www.crazyant.net/2404.html" class="more-link">继续阅读<span class="screen-reader-text">听樊登的《非暴力沟通》</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2404.html" rel="bookmark"><time class="entry-date published updated" datetime="2019-08-04T01:00:24+00:00">2019-08-04</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/%e7%a8%8b%e5%ba%8f%e4%ba%ba%e7%94%9f" rel="category tag">程序人生</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/%e7%a8%8b%e5%ba%8f%e4%ba%ba%e7%94%9f" rel="tag">程序人生</a></span><span class="comments-link"><a href="http://www.crazyant.net/2404.html#comments"><span class="screen-reader-text">听樊登的《非暴力沟通》</span>有2条评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2404 --><article  class="post-2367 post type-post status-publish format-standard hentry category-java-development category-python category-tensorflow tag-java tag-python tag-tensorflow"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2367.html" rel="bookmark">Java和Python使用Grpc访问Tensorflow的Serving代码</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>发现网上大量的代码都是mnist,我自己反正不是搞图像处理的,所以这个例子我怎么都不想搞; wide&deep这种,包含各种特征的模型,才是我的需要,iris也是从文本训练模型,所以非常简单; 本文给出Python和Java访问Tensorflow的Serving代码。 
Java版本使用Grpc访问Tensorflow的Serving代码 [crayon-5d71218766da55308 … <a href="http://www.crazyant.net/2367.html" class="more-link">继续阅读<span class="screen-reader-text">Java和Python使用Grpc访问Tensorflow的Serving代码</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2367.html" rel="bookmark"><time class="entry-date published" datetime="2019-07-30T02:07:29+00:00">2019-07-30</time><time class="updated" datetime="2019-07-30T02:08:48+00:00">2019-07-30</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/java-development" rel="category tag">java</a>、<a href="http://www.crazyant.net/category/python" rel="category tag">python</a>、<a href="http://www.crazyant.net/category/tensorflow" rel="category tag">tensorflow</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/java" rel="tag">java</a>、<a href="http://www.crazyant.net/tag/python" rel="tag">python</a>、<a href="http://www.crazyant.net/tag/tensorflow" rel="tag">tensorflow</a></span><span class="comments-link"><a href="http://www.crazyant.net/2367.html#respond"><span class="screen-reader-text">于Java和Python使用Grpc访问Tensorflow的Serving代码</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2367 --><article  class="post-2351 post type-post status-publish format-standard hentry category-203 tag-204"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2351.html" rel="bookmark">推荐系统:怎样实现内容相似推荐</a></h2></header><!-- .entry-header --><div 
class="entry-summary"><p>很多产品想要加入推荐系统模块,最简单的就是做内容相似推荐,虽然技术简单但是效果却很好,对于增加用户粘性、提升用户留存有较多的效果,甚至很多产品后来加入了很多推荐模块之后,还是发现导流效果最好的依然是内容的相似推荐。 比如看完了一片《Python怎样读取MySQL》之后,在相似推荐中看到了一片题目为《Python操作MySQL的效果优化》的文章,很自然的就像多深入了解一下,于是就点进去看一看,那么对 … <a href="http://www.crazyant.net/2351.html" class="more-link">继续阅读<span class="screen-reader-text">推荐系统:怎样实现内容相似推荐</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2351.html" rel="bookmark"><time class="entry-date published" datetime="2019-07-28T13:35:25+00:00">2019-07-28</time><time class="updated" datetime="2019-07-28T13:40:54+00:00">2019-07-28</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f" rel="category tag">推荐系统</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f" rel="tag">推荐系统</a></span><span class="comments-link"><a href="http://www.crazyant.net/2351.html#respond"><span class="screen-reader-text">于推荐系统:怎样实现内容相似推荐</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2351 --><article  class="post-2343 post type-post status-publish format-standard hentry category-flask category-python tag-flask tag-python"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2343.html" rel="bookmark">Flask怎样从其他Python文件导入app.route视图函数</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>用Blueprint这个东西实现; 主文件: flask_main.py</p><!-- Crayon Syntax Highlighter v_2.7.2_beta --><div  class="crayon-syntax crayon-theme-xcode crayon-font-monaco crayon-os-pc print-yes 
notranslate" data-settings=" minimize scroll-mouseover" style=" float: left; font-size: 14px !important; line-height: 16px !important;"><div class="crayon-plain-wrap"></div><div class="crayon-main" style=""><table class="crayon-table"><tr class="crayon-row"><td class="crayon-nums " ><div class="crayon-nums-content" style="font-size: 14px !important; line-height: 16px !important;"><div class="crayon-num" >1</div><div class="crayon-num" >2</div><div class="crayon-num" >3</div><div class="crayon-num" >4</div><div class="crayon-num" >5</div><div class="crayon-num" >6</div><div class="crayon-num" >7</div><div class="crayon-num" >8</div><div class="crayon-num" >9</div><div class="crayon-num" >10</div><div class="crayon-num" >11</div><div class="crayon-num" >12</div><div class="crayon-num" >13</div></div></td><td class="crayon-code"><div class="crayon-pre" style="font-size: 14px !important; line-height: 16px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;"><div class="crayon-line" ><span class="crayon-e">import </span><span class="crayon-e">flask</span></div><div class="crayon-line" > </div><div class="crayon-line" ><span class="crayon-e">from </span><span class="crayon-v">flask_pyecharts</span><span class="crayon-sy">.</span><span class="crayon-e">flask_moudle2 </span><span class="crayon-e">import </span><span class="crayon-e">account_api</span></div><div class="crayon-line" > </div><div class="crayon-line" ><span class="crayon-v">app</span><span class="crayon-h"> </span><span class="crayon-o">=</span><span class="crayon-h"> </span><span class="crayon-v">flask</span><span class="crayon-sy">.</span><span class="crayon-e">Flask</span><span class="crayon-sy">(</span><span class="crayon-v">__name__</span><span class="crayon-sy">)</span></div><div class="crayon-line" > </div><div class="crayon-line" ><span class="crayon-v">app</span><span class="crayon-sy">.</span><span class="crayon-e">register_blueprint</span><span class="crayon-sy">(</span><span 
class="crayon-v">account_api</span><span class="crayon-sy">)</span></div><div class="crayon-line" > </div><div class="crayon-line" ><span class="crayon-sy">@</span><span class="crayon-v">app</span><span class="crayon-sy">.</span><span class="crayon-e">route</span><span class="crayon-sy">(</span><span class="crayon-s">"/hello"</span><span class="crayon-sy">)</span></div><div class="crayon-line" ><span class="crayon-e">def </span><span class="crayon-e">hello</span><span class="crayon-sy">(</span><span class="crayon-sy">)</span><span class="crayon-o">:</span></div><div class="crayon-line" ><span class="crayon-h">    </span><span class="crayon-st">return</span><span class="crayon-h"> </span><span class="crayon-s">"hello"</span></div><div class="crayon-line" > </div><div class="crayon-line" ><span class="crayon-v">app</span><span class="crayon-sy">.</span><span class="crayon-e">run</span><span class="crayon-sy">(</span><span class="crayon-sy">)</span></div></div></td></tr></table></div></div> <!-- [Format Time: 0.0003 seconds] --><p> 引入的一个Module的文件,这个文件中写了视图函数 flask_moudle2.py</p><!-- Crayon Syntax Highlighter v_2.7.2_beta --><div  class="crayon-syntax crayon-theme-xcode crayon-font-monaco crayon-os-pc print-yes notranslate" data-settings=" minimize scroll-mouseover" style=" float: left; font-size: 14px !important; line-height: 16px !important;"><div class="crayon-plain-wrap"></div><div class="crayon-main" style=""><table class="crayon-table"><tr class="crayon-row"><td class="crayon-nums " ><div class="crayon-nums-content" style="font-size: 14px !important; line-height: 16px !important;"><div class="crayon-num" >1</div><div class="crayon-num" >2</div><div class="crayon-num" >3</div><div class="crayon-num" >4</div><div class="crayon-num" >5</div><div class="crayon-num" >6</div><div class="crayon-num" >7</div></div></td><td class="crayon-code"><div class="crayon-pre" style="font-size: 14px !important; line-height: 16px !important; -moz-tab-size:4; -o-tab-size:4; 
-webkit-tab-size:4; tab-size:4;"><div class="crayon-line" ><span class="crayon-e">from </span><span class="crayon-e">flask </span><span class="crayon-e">import </span><span class="crayon-e">Blueprint</span></div><div class="crayon-line" > </div><div class="crayon-line" ><span class="crayon-v">account_api</span><span class="crayon-h"> </span><span class="crayon-o">=</span><span class="crayon-h"> </span><span class="crayon-e">Blueprint</span><span class="crayon-sy">(</span><span class="crayon-s">\'account_api\'</span><span class="crayon-sy">,</span><span class="crayon-h"> </span><span class="crayon-v">__name__</span><span class="crayon-sy">)</span></div><div class="crayon-line" > </div><div class="crayon-line" ><span class="crayon-sy">@</span><span class="crayon-v">account_api</span><span class="crayon-sy">.</span><span class="crayon-e">route</span><span class="crayon-sy">(</span><span class="crayon-s">"/account"</span><span class="crayon-sy">)</span></div><div class="crayon-line" ><span class="crayon-e">def </span><span class="crayon-e">accountList</span><span class="crayon-sy">(</span><span class="crayon-sy">)</span><span class="crayon-o">:</span></div><div class="crayon-line" ><span class="crayon-h">    </span><span class="crayon-st">return</span><span class="crayon-h"> </span><span class="crayon-s">"list of accounts"</span></div></div></td></tr></table></div></div> <!-- [Format Time: 0.0002 seconds] --><p> 界面*问第一个函数和第二个函数都返回正常 贴一下官网蓝图的解释: Flask 用 蓝图(blueprin … <a href="http://www.crazyant.net/2343.html" class="more-link">继续阅读<span class="screen-reader-text">Flask怎样从其他Python文件导入app.route视图函数</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2343.html" rel="bookmark"><time class="entry-date published" datetime="2019-07-28T12:00:48+00:00">2019-07-28</time><time class="updated" 
datetime="2019-07-28T12:04:47+00:00">2019-07-28</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 </span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/flask" rel="category tag">flask</a>、<a href="http://www.crazyant.net/category/python" rel="category tag">python</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/flask" rel="tag">flask</a>、<a href="http://www.crazyant.net/tag/python" rel="tag">python</a></span><span class="comments-link"><a href="http://www.crazyant.net/2343.html#respond"><span class="screen-reader-text">于Flask怎样从其他Python文件导入app.route视图函数</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2343 --><article  class="post-2336 post type-post status-publish format-standard hentry category-205 tag-207 tag-206 tag-208"><header class="entry-header"><h2 class="entry-title"><a href="http://www.crazyant.net/2336.html" rel="bookmark">我为什么从工程转了算法?</a></h2></header><!-- .entry-header --><div class="entry-summary"><p>一句话总结下:年龄大了,总想让自己做的事情有意义点,所以想让自己写的代码对产品有更多的影响、可衡量的影响。 1、我发现自己的JAVA开发和大数据业务处理对产品影响甚微 我自己工作快7年,工作主要有两个方向: A – JAVA后台业务开发 来什么需求做什么开发,增删改查,接消息发消息,因为之前做的是公司商业运营部门的需求,面向公司运营市场人员,不直接面向普通用户,系统访问量特别低,往往一 … <a href="http://www.crazyant.net/2336.html" class="more-link">继续阅读<span class="screen-reader-text">我为什么从工程转了算法?</span></a></p></div><!-- .entry-summary --><footer class="entry-footer"> <span class="posted-on"><span class="screen-reader-text">发布于 </span><a href="http://www.crazyant.net/2336.html" rel="bookmark"><time class="entry-date published" datetime="2019-07-28T09:38:47+00:00">2019-07-28</time><time class="updated" datetime="2019-07-28T09:47:00+00:00">2019-07-28</time></a></span><span class="byline"><span class="author vcard"><span class="screen-reader-text">作者 
</span><a class="url fn n" href="http://www.crazyant.net/author/peishuai1987">crazyant</a></span></span><span class="cat-links"><span class="screen-reader-text">分类 </span><a href="http://www.crazyant.net/category/%e7%a8%8b%e5%ba%8f%e4%ba%ba%e7%94%9f" rel="category tag">程序人生</a></span><span class="tags-links"><span class="screen-reader-text">标签 </span><a href="http://www.crazyant.net/tag/%e5%a4%a7%e6%95%b0%e6%8d%ae" rel="tag">大数据</a>、<a href="http://www.crazyant.net/tag/%e7%a8%8b%e5%ba%8f%e4%ba%ba%e7%94%9f" rel="tag">程序人生</a>、<a href="http://www.crazyant.net/tag/%e7%ae%97%e6%b3%95" rel="tag">算法</a></span><span class="comments-link"><a href="http://www.crazyant.net/2336.html#respond"><span class="screen-reader-text">于我为什么从工程转了算法?</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2336 --><nav class="navigation pagination" role="navigation"><h2 class="screen-reader-text">文章导航</h2><div class="nav-links"><span aria-current=\'page\' class=\'page-numbers current\'><span class="meta-nav screen-reader-text">页 </span>1</span> <a class=\'page-numbers\' href=\'http://www.crazyant.net/page/2\'><span class="meta-nav screen-reader-text">页 </span>2</a> <span class="page-numbers dots">…</span> <a class=\'page-numbers\' href=\'http://www.crazyant.net/page/24\'><span class="meta-nav screen-reader-text">页 </span>24</a> <a class="next page-numbers" href="http://www.crazyant.net/page/2">下一页</a></div></nav></main><!-- .site-main --></div><!-- .content-area --></div><!-- .site-content --><footer  class="site-footer" role="contentinfo"><div class="site-info"> <a href="https://cn.wordpress.org/" class="imprint"> 自豪地采用WordPress </a></div><!-- .site-info --></footer><!-- .site-footer --></div><!-- .site --> <script type=\'text/javascript\' src=\'//cdn.bibblio.org/rcm/4.5/bib-related-content.js?ver=5.2.2\'></script> <script type=\'text/javascript\' 
src=\'http://www.crazyant.net/wp-content/uploads/siteground-optimizer-assets/twentyfifteen-skip-link-focus-fix.min.js\'></script> <script type=\'text/javascript\'>/* <![CDATA[ */\nvar screenReaderText = {"expand":"<span class=\\"screen-reader-text\\">\\u5c55\\u5f00\\u5b50\\u83dc\\u5355<\\/span>","collapse":"<span class=\\"screen-reader-text\\">\\u6298\\u53e0\\u5b50\\u83dc\\u5355<\\/span>"};\n/* ]]> */</script> <script type=\'text/javascript\' src=\'http://www.crazyant.net/wp-content/uploads/siteground-optimizer-assets/twentyfifteen-script.min.js\'></script> <script type=\'text/javascript\' src=\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/js/jquery.fancybox.min.js\'></script> <script type=\'text/javascript\'>var fb_timeout, fb_opts={\'overlayShow\':true,\'hideOnOverlayClick\':true,\'showCloseButton\':true,\'margin\':20,\'centerOnScroll\':false,\'enableEscapeButton\':true,\'autoScale\':true };\nif(typeof easy_fancybox_handler===\'undefined\'){\nvar easy_fancybox_handler=function(){\njQuery(\'.nofancybox,a.wp-block-file__button,a.pin-it-button,a[href*="pinterest.com/pin/create"],a[href*="facebook.com/share"],a[href*="twitter.com/share"]\').addClass(\'nolightbox\');\n/* IMG */\nvar fb_IMG_select=\'a[href*=".jpg"]:not(.nolightbox,li.nolightbox>a),area[href*=".jpg"]:not(.nolightbox),a[href*=".jpeg"]:not(.nolightbox,li.nolightbox>a),area[href*=".jpeg"]:not(.nolightbox),a[href*=".png"]:not(.nolightbox,li.nolightbox>a),area[href*=".png"]:not(.nolightbox),a[href*=".webp"]:not(.nolightbox,li.nolightbox>a),area[href*=".webp"]:not(.nolightbox)\';\njQuery(fb_IMG_select).addClass(\'fancybox image\');\nvar fb_IMG_sections=jQuery(\'.gallery,.wp-block-gallery,.tiled-gallery\');\nfb_IMG_sections.each(function(){jQuery(this).find(fb_IMG_select).attr(\'rel\',\'gallery-\'+fb_IMG_sections.index(this));});\njQuery(\'a.fancybox,area.fancybox,li.fancybox 
a\').each(function(){jQuery(this).fancybox(jQuery.extend({},fb_opts,{\'transitionIn\':\'elastic\',\'easingIn\':\'easeOutBack\',\'transitionOut\':\'elastic\',\'easingOut\':\'easeInBack\',\'opacity\':false,\'hideOnContentClick\':false,\'titleShow\':true,\'titlePosition\':\'over\',\'titleFromAlt\':true,\'showNavArrows\':true,\'enableKeyboardNav\':true,\'cyclic\':false}))});};\njQuery(\'a.fancybox-close\').on(\'click\',function(e){e.preventDefault();jQuery.fancybox.close()});\n};\nvar easy_fancybox_auto=function(){setTimeout(function(){jQuery(\'#fancybox-auto\').trigger(\'click\')},1000);};\njQuery(easy_fancybox_handler);jQuery(document).on(\'post-load\',easy_fancybox_handler);\njQuery(easy_fancybox_auto);</script> <script type=\'text/javascript\' src=\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/js/jquery.easing.min.js\'></script> <script type=\'text/javascript\' src=\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/js/jquery.mousewheel.min.js\'></script> <script type=\'text/javascript\' src=\'http://www.crazyant.net/wp-includes/js/wp-embed.min.js\'></script> </body></html>'

2. Parse the HTML to extract the data

def parse_single_html(html):
    """
    Parse a single list page and extract the data for each article.
    @return list of {"title", "link", "tags"}
    """
    soup = BeautifulSoup(html, 'html.parser')
    articles = soup.find_all("article")
    datas = []
    for article in articles:
        # Find the title hyperlink inside <h2 class="entry-title">
        title_node = (
            article
            .find("h2", class_="entry-title")
            .find("a")
        )
        title = title_node.get_text()
        link = title_node["href"]

        # Find the list of tags inside the article footer
        tag_nodes = (
            article
            .find("footer", class_="entry-footer")
            .find("span", class_="tags-links")
            .find_all("a")
        )
        tags = [tag_node.get_text() for tag_node in tag_nodes]
        datas.append(
            {"title":title, "link":link, "tags":tags}
        )
    return datas
pprint.pprint(parse_single_html(htmls[0]))
[{'link': 'http://www.crazyant.net/2469.html',
'tags': ['java', 'spark'],
'title': 'Spark使用Java开发遇到的那些类型错误'},
{'link': 'http://www.crazyant.net/2454.html',
'tags': ['pandas', 'python', 'sklearn', '推荐系统'],
'title': '推荐系统:实现文章相似推荐的简单实例'},
{'link': 'http://www.crazyant.net/2447.html',
'tags': ['item2vec', 'java', '推荐系统'],
'title': 'Spark使用word2vec训练item2vec实现内容相关推荐'},
{'link': 'http://www.crazyant.net/2434.html',
'tags': ['pandas', 'python'],
'title': 'Pandas中对轴axis=0和axis=1的理解'},
{'link': 'http://www.crazyant.net/2419.html',
'tags': ['echarts', 'flask', 'pyecharts', 'python'],
'title': 'Flask使用Pyecharts在单个页面展示多个图表'},
{'link': 'http://www.crazyant.net/2404.html',
'tags': ['程序人生'],
'title': '听樊登的《非暴力沟通》'},
{'link': 'http://www.crazyant.net/2367.html',
'tags': ['java', 'python', 'tensorflow'],
'title': 'Java和Python使用Grpc访问Tensorflow的Serving代码'},
{'link': 'http://www.crazyant.net/2351.html',
'tags': ['推荐系统'],
'title': '推荐系统:怎样实现内容相似推荐'},
{'link': 'http://www.crazyant.net/2343.html',
'tags': ['flask', 'python'],
'title': 'Flask怎样从其他Python文件导入app.route视图函数'},
{'link': 'http://www.crazyant.net/2336.html',
'tags': ['大数据', '程序人生', '算法'],
'title': '我为什么从工程转了算法?'}]
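The chained find() calls in parse_single_html raise an AttributeError as soon as one article lacks the expected h2 or tags span. As a minimal sketch, assuming some articles may have no tags, the same extraction can be written with CSS selectors so that missing nodes are tolerated. The function name parse_articles_defensively and the sample markup are hypothetical, made up for illustration; they are not part of the original code.

```python
from bs4 import BeautifulSoup


def parse_articles_defensively(html):
    """Return [{"title", "link", "tags"}], skipping articles without a title link."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for article in soup.find_all("article"):
        # Selector counterpart of find("h2", class_="entry-title").find("a")
        title_node = article.select_one("h2.entry-title a")
        if title_node is None:
            continue  # no title link, nothing useful to record
        # select() returns an empty list when the tags span is absent
        tag_nodes = article.select("footer.entry-footer span.tags-links a")
        results.append({
            "title": title_node.get_text(strip=True),
            "link": title_node.get("href"),
            "tags": [node.get_text(strip=True) for node in tag_nodes],
        })
    return results


# Hypothetical sample markup, modelled on the structure of the list page
SAMPLE_HTML = """
<article>
  <header><h2 class="entry-title"><a href="http://example.com/1.html">First post</a></h2></header>
  <footer class="entry-footer">
    <span class="tags-links"><a rel="tag">python</a><a rel="tag">flask</a></span>
  </footer>
</article>
<article>
  <header><h2 class="entry-title"><a href="http://example.com/2.html">Untagged post</a></h2></header>
  <footer class="entry-footer"></footer>
</article>
"""

parsed = parse_articles_defensively(SAMPLE_HTML)
print(parsed)
```

The second sample article has no tags span, so it comes back with an empty tags list instead of stopping the whole run.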
# Parse every downloaded HTML page
all_datas = []
for html in htmls:
    all_datas.extend(parse_single_html(html))
all_datas
[{'title': 'Spark使用Java开发遇到的那些类型错误',
'link': 'http://www.crazyant.net/2469.html',
'tags': ['java', 'spark']},
{'title': '推荐系统:实现文章相似推荐的简单实例',
'link': 'http://www.crazyant.net/2454.html',
'tags': ['pandas', 'python', 'sklearn', '推荐系统']},
{'title': 'Spark使用word2vec训练item2vec实现内容相关推荐',
'link': 'http://www.crazyant.net/2447.html',
'tags': ['item2vec', 'java', '推荐系统']},
{'title': 'python执行shell的两种方法',
'link': 'http://www.crazyant.net/1319.html',
'tags': ['python', 'shell']},
{'title': 'Python封装的常用日期函数',
'link': 'http://www.crazyant.net/1309.html',
'tags': ['python']},
{'title': 'python子类调用父类的方法',
'link': 'http://www.crazyant.net/1303.html',
'tags': ['python']},
{'title': 'wordpress按层级方式显示分类链接的方法',
'link': 'http://www.crazyant.net/1297.html',
'tags': ['wordpress']},
{'title': 'Firefox数据采集插件大全',
'link': 'http://www.crazyant.net/1292.html',
'tags': ['数据采集', '爬虫']},
{'title': 'Python生成文件md5校验值函数',
'link': 'http://www.crazyant.net/1216.html',
'tags': ['python']},
{'title': '网站从织梦DEDECMS迁移到WordPress过程以及URL重定向方法',
'link': 'http://www.crazyant.net/1214.html',
'tags': ['织梦']},
{'title': 'shell/hadoop/hive一些有用命令收集',
'link': 'http://www.crazyant.net/1209.html',
'tags': ['hadoop', 'hive', 'mysql', 'shell']},
{'title': 'Hive开发中使用变量的两种方法',
'link': 'http://www.crazyant.net/1203.html',
'tags': ['hive']},
{'title': 'hive从查询中获取数据插入到表或动态分区',
'link': 'http://www.crazyant.net/1197.html',
'tags': ['hive']},
{'title': 'Hive元数据存于mysql中文乱码解决',
'link': 'http://www.crazyant.net/1193.html',
'tags': ['hive', 'mysql']},
{'title': '为eclipse安装python、shell开发环境和SVN插件',
'link': 'http://www.crazyant.net/1185.html',
'tags': ['python', 'shell']},
{'title': 'hadoop第一个程序WordCount.java的编译运行过程',
'link': 'http://www.crazyant.net/1144.html',
'tags': ['hadoop']},
{'title': 'MYSQL向数据表插入默认字段值的方法',
'link': 'http://www.crazyant.net/1129.html',
'tags': ['mysql']},
{'title': 'Hadoop-Streaming实战经验及问题解决方法总结',
'link': 'http://www.crazyant.net/1122.html',
'tags': ['hadoop']},
{'title': 'Hadoop之使用python实现数据集合间join操作',
'link': 'http://www.crazyant.net/1112.html',
'tags': ['hadoop', 'python']},
{'title': 'Rational Rose根据Java代码自动生成类图(教程和错误解决)',
'link': 'http://www.crazyant.net/1094.html',
'tags': ['java']},
{'title': 'MathType(数学公式编辑器) 汉化绿色版V6.7下载',
'link': 'http://www.crazyant.net/1088.html',
'tags': ['mathtype']},
{'title': 'JSP使用JNA调用DLL函数遇到的几个问题',
'link': 'http://www.crazyant.net/1072.html',
'tags': ['java']},
{'title': '读《疯狂的站长》- 回顾反思我的个人站长路',
'link': 'http://www.crazyant.net/1066.html',
'tags': ['站长']},
{'title': '给计算机专业求职的同学推荐几本书',
'link': 'http://www.crazyant.net/1064.html',
'tags': ['程序人生']},
{'title': 'MySQL数据库存储过程教程',
'link': 'http://www.crazyant.net/1061.html',
'tags': ['mysql']},
{'title': 'Magento获取指定分类下的所有子分类信息',
'link': 'http://www.crazyant.net/1057.html',
'tags': ['magento', 'php']},
{'title': 'WIN7使用VisualSVN建立SVN服务器',
'link': 'http://www.crazyant.net/1055.html',
'tags': ['svn', 'win7']},
{'title': '织梦DEDECMS简洁蓝色模板免费下载(资讯文章类)',
'link': 'http://www.crazyant.net/1044.html',
'tags': ['织梦']},
{'title': 'Django基本命令最全收集',
'link': 'http://www.crazyant.net/1036.html',
'tags': ['django', 'python']},
{'title': '2012年百度、腾讯、微软、奇虎360、人人、去哪网找工作经历总结',
'link': 'http://www.crazyant.net/1030.html',
'tags': ['程序人生']},
{'title': 'PHP对数组的高级遍历和操作处理方法',
'link': 'http://www.crazyant.net/1022.html',
'tags': ['php']},
{'title': '使用PHP连接、操纵Memcached的原理和教程',
'link': 'http://www.crazyant.net/1014.html',
'tags': ['memcached', 'php', '数据库']},
{'title': 'Django关于站点管理Admin Site的常见问题解决方法',
'link': 'http://www.crazyant.net/1005.html',
'tags': ['django', 'python']},
{'title': '对Django框架架构和Request/Response处理流程的分析',
'link': 'http://www.crazyant.net/1001.html',
'tags': ['django', 'python']},
{'title': 'PHP开发者最好的学习资源收集',
'link': 'http://www.crazyant.net/970.html',
'tags': ['php']},
{'title': 'Ubuntu10.10 Server+Nginx+Django+Postgresql安装步骤',
'link': 'http://www.crazyant.net/955.html',
'tags': ['django', 'ngnix', 'ubuntu']},
{'title': 'PHP和MySQL处理树状、分级、无限分类、分层数据的方法',
'link': 'http://www.crazyant.net/930.html',
'tags': ['mysql', 'php']},
{'title': 'PHP创建和解析JSON数据的方法',
'link': 'http://www.crazyant.net/920.html',
'tags': ['json', 'php']},
len(all_datas)
239
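The record count alone does not show whether any article was collected twice. A rough sketch of a sanity check, assuming each record has the "link" and "tags" keys shown above: count the distinct links and tally how often each tag occurs. The helper name summarize is hypothetical, and the sample holds three records copied from the output above; with the real data, pass all_datas instead.

```python
from collections import Counter


def summarize(records):
    """Return (number of distinct links, Counter of tag frequencies)."""
    unique_links = {record["link"] for record in records}
    tag_counts = Counter(tag for record in records for tag in record["tags"])
    return len(unique_links), tag_counts


# Three records copied from the output above
sample = [
    {"title": "Hive开发中使用变量的两种方法", "link": "http://www.crazyant.net/1203.html", "tags": ["hive"]},
    {"title": "Hive元数据存于mysql中文乱码解决", "link": "http://www.crazyant.net/1193.html", "tags": ["hive", "mysql"]},
    {"title": "MYSQL向数据表插入默认字段值的方法", "link": "http://www.crazyant.net/1129.html", "tags": ["mysql"]},
]

unique_count, tag_counts = summarize(sample)
print(unique_count, tag_counts.most_common(2))
```

If the number of distinct links equals len(all_datas), no article was parsed more than once.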

3. Save the results

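# Write one JSON object per line. ensure_ascii=False keeps the Chinese text
# readable, so the file is opened explicitly as UTF-8.
with open("all_article_links.json", "w", encoding="utf-8") as fout:
    for data in all_datas:
        fout.write(json.dumps(data, ensure_ascii=False)+"\n")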
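Because the file holds one JSON object per line rather than a single JSON array, json.load on the whole file would fail; it has to be read back line by line. A minimal sketch, assuming the file was written as UTF-8; the helper name load_json_lines and the temporary file name are hypothetical, and the two sample records are copied from the output above.

```python
import json
import os
import tempfile


def load_json_lines(path):
    """Read a file holding one JSON object per line into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


# Two records in the same shape as all_datas
sample = [
    {"title": "推荐系统:怎样实现内容相似推荐", "link": "http://www.crazyant.net/2351.html", "tags": ["推荐系统"]},
    {"title": "Python封装的常用日期函数", "link": "http://www.crazyant.net/1309.html", "tags": ["python"]},
]

# Round trip through a temporary file, written the same way as above
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_links.json")
    with open(path, "w", encoding="utf-8") as fout:
        for record in sample:
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
    loaded = load_json_lines(path)

print(loaded == sample)
```

For the real output, load_json_lines("all_article_links.json") would return the same list of dicts that was stored in all_datas.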

A video tutorial was published together with this article. To download the code and the video, follow the WeChat official account "蚂蚁学Python".