Building a Local Package Source with an OpenWRT Mirror Crawler

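The crawler scripts floating around online no longer work, but thanks to their authors all the same. I'm lazy and don't like rewriting tutorials that other people have already written, so I'm only posting my modifications; for how to use the script, read the existing tutorials.

Here is my modified version, which crawls correctly: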

#!/usr/bin/env python
# coding=utf-8
#
# Openwrt Package Grabber
#
# Copyright (C) 2016 sohobloo.me
#
# Note: this is a Python 2 script (urllib2, print statements).
import urllib2
import re
import os
import time

# the url of package list page, end with "/"
baseurl = 'https://downloads.openwrt.org/snapshots/trunk/ramips/mt7620/packages/'

# which directory to save all the packages, end with "/"
# (named after the current time; "timestamp" avoids shadowing the time module)
timestamp = time.strftime("%Y%m%d%H%M%S", time.localtime())
savedir = './' + timestamp + '/'

# match every link on the index page except those starting with "?"
pattern = r'<a href="([^\?].*?)">'

cnt = 0

def fetch(url, path = ''):
    # cnt is module-level; without this, "cnt += 1" raises UnboundLocalError
    global cnt
    if not os.path.exists(savedir + path):
        os.makedirs(savedir + path)
    print 'fetching package list from ' + url + path
    content = urllib2.urlopen(url + path, timeout=15).read()
    items = re.findall(pattern, content)
    for item in items:
        if item == '../':
            # skip the parent-directory link
            continue
        elif item.endswith('/'):
            # subdirectory: recurse into it
            fetch(url, path + item)
        else:
            cnt += 1
            print 'downloading item %d: ' % cnt + path + item
            if os.path.isfile(savedir + path + item):
                print 'file exists, ignored.'
            else:
                rfile = urllib2.urlopen(url + path + item)
                with open(savedir + path + item, "wb") as code:
                    code.write(rfile.read())

fetch(baseurl)
print 'done!'
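The script above targets Python 2. For readers on Python 3, the download step might look roughly like the sketch below. This is not the original code: `download_file` and its `opener` parameter are hypothetical names introduced for illustration, and `opener` exists only so the function can be tried without touching the network.

```python
import os
import shutil
from urllib.request import urlopen


def download_file(url, dest, opener=urlopen, timeout=15):
    """Save url to dest unless dest already exists; return True if downloaded.

    download_file and opener are hypothetical names, not part of the
    original script.
    """
    if os.path.isfile(dest):
        return False  # same "file exists, ignored" rule as the script
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Stream the response to disk instead of reading it all into memory.
    with opener(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return True
```

Streaming with `shutil.copyfileobj` is a design choice of this sketch; the script reads each file fully into memory, which is also fine for small packages.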

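Changes:

1. Added a top-level directory named after the current timestamp

2. Changed the regex to filter out invalid links (those starting with a question mark)

3. Switched to crawling the directory structure recursively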
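To make changes 2 and 3 concrete, here is a small offline sketch in Python 3. The regex is the one from the script; everything else is an assumption for illustration: the sample HTML, the in-memory `SITE` dictionary standing in for the real index pages, and the helper name `list_files`.

```python
import re

# Same pattern as the script: capture href values that do not start with "?".
PATTERN = r'<a href="([^\?].*?)">'

# Made-up index pages keyed by relative path (an assumption, standing in
# for what the server would return; real pages will differ).
SITE = {
    "": '<a href="?C=N;O=D">Name</a> <a href="../">../</a> '
        '<a href="base/">base/</a> <a href="Packages.gz">Packages.gz</a>',
    "base/": '<a href="../">../</a> <a href="foo_1.0.ipk">foo_1.0.ipk</a>',
}


def list_files(path=""):
    """Return relative paths of all files under path, recursing into subdirectories."""
    found = []
    for item in re.findall(PATTERN, SITE[path]):
        if item == "../":
            continue  # never walk back up to the parent
        elif item.endswith("/"):
            found.extend(list_files(path + item))  # subdirectory: recurse
        else:
            found.append(path + item)
    return found


print(list_files())  # prints ['base/foo_1.0.ipk', 'Packages.gz']
```

The link beginning with `?` never reaches the loop because `[^\?]` rejects it at the first character, while `../` has to be skipped explicitly, otherwise the recursion would climb out of the starting directory.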

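Also, I'm glad my Python knowledge has finally come in handy. Hooray!

I tried to update the screenshots, but the upload failed; cnblogs (博客园) looks like it is on its last legs.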
