Reading text data line by line directly from a gz-compressed file in Python

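I previously wrote a script that extracts specific entries from log files (plain txt files) and writes them into a MySQL database. Because the logs grew too large, the maintainers started packing and compressing them in tar.gz format.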

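A single txt file used to exceed 2 GB. Now each txt file is packed and compressed into its own tar.gz file, so my Python script had to change as well. (The server runs CentOS 6.3.)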

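One option I considered was to extract the tar.gz, read the extracted file, and delete it once reading was finished. That approach works, but it is not ideal: decompressing a large file is slow, and the extracted file takes up a lot of disk space on the server.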

Later I found a better approach: read the text data line by line directly from the gz-compressed file, without extracting it first.
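A minimal sketch of this idea, assuming Python 3 (where gzip.open accepts the text mode "rt") and UTF-8 log contents. The file object returned by gzip.open decompresses incrementally as you iterate over it, so only a small buffer is held in memory however large the file is. The helper name iter_gz_lines and the demo file are hypothetical.

```python
import gzip
import os
import tempfile


def iter_gz_lines(path, encoding="utf-8"):
    """Yield the lines of a gzip-compressed text file, without the trailing newline.

    Hypothetical helper; the default encoding is an assumption about the logs.
    """
    # "rt" = read in text mode; decompression happens chunk by chunk as lines are requested
    with gzip.open(path, "rt", encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Build a small demo .gz file so the example runs on its own
    demo = os.path.join(tempfile.mkdtemp(), "demo.log.gz")
    with gzip.open(demo, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    for line in iter_gz_lines(demo):
        print(line)  # prints "first line", then "second line"
```

Because iter_gz_lines is a generator, code that used to loop over the lines of the uncompressed txt file could presumably consume it unchanged.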

Here is my test script on Windows:

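import os
import gzip
import shutil


def read_gz_file(path):
    """Yield the lines of a .gz file one at a time, decompressing on the fly."""
    if os.path.exists(path):
        # 'rt' = text mode, so lines come back as str instead of bytes (Python 3);
        # UTF-8 is an assumption about the log contents
        with gzip.open(path, 'rt', encoding='utf-8') as pf:
            for line in pf:
                yield line
    else:
        print('the path [{}] does not exist!'.format(path))

con = read_gz_file('c:\\1.gz')
# read_gz_file is a generator, so it is always iterable; no __iter__ check is needed
for line in con:
    # each line keeps its trailing '\n', so print() leaves a blank line after it
    print(line)

# For comparison: decompress the whole file to disk (the approach rejected above)
strZipFile = 'c:\\1.gz'
strDstFile = 'c:\\2'
with gzip.open(strZipFile, 'rb') as zf, open(strDstFile, 'wb') as outFile:
    # copy in chunks instead of read()-ing the whole file into memory at once
    shutil.copyfileobj(zf, outFile)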
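One caveat: the test script above uses a bare .gz file, but the server logs are tar.gz archives. gzip.open only removes the gzip layer; what remains is a tar stream, so the first line read would be prefixed with the tar header block (file name, permissions and so on), and the end of the stream would contain NUL padding. A sketch that reads tar.gz archives line by line with the standard tarfile module instead, assuming Python 3 and UTF-8 logs; iter_tar_gz_lines and the demo archive are hypothetical.

```python
import io
import os
import tarfile
import tempfile


def iter_tar_gz_lines(path, encoding="utf-8"):
    """Yield (member_name, line) for each line of each regular file in a .tar.gz.

    Hypothetical helper; the default encoding is an assumption about the logs.
    """
    # "r:gz" = open a gzip-compressed tar archive for reading
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks and other non-file entries
            # extractfile() returns a binary file-like object read lazily from the archive
            raw = tar.extractfile(member)
            # Wrap the byte stream so it can be iterated as decoded text lines
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                for line in text:
                    yield member.name, line.rstrip("\n")


if __name__ == "__main__":
    # Build a small demo archive containing one log file
    demo = os.path.join(tempfile.mkdtemp(), "demo.tar.gz")
    data = "first line\nsecond line\n".encode("utf-8")
    with tarfile.open(demo, "w:gz") as tar:
        info = tarfile.TarInfo("app.log")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    for name, line in iter_tar_gz_lines(demo):
        print(name, line)  # prints "app.log first line", then "app.log second line"
```

This keeps the same streaming behavior as the plain .gz version: nothing is extracted to disk, and the 2 GB log is never held in memory all at once.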

The attached file is 1.gz.

Output:

sdfasfda

asdfasdf

asdfasdf

adsfadf
