Tracking file loading progress in Python

Date: 2022-06-27 00:11:42

A lot of modules I use either import entire files into memory or trickle a file's contents in while they process it. Is there any way to track this sort of loading progress? Possibly a wrapper class that takes a callback?

2 solutions

#1


I would do this by determining the size of the file up front and then dividing the number of bytes read so far by that total. Like this:

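import os

def show_progress(file_name, chunk_size=1024):
    """Yield the file's contents chunk by chunk, printing progress as it goes."""
    total_size = os.path.getsize(file_name)
    total_read = 0
    # Binary mode, so len(chunk) counts bytes and matches os.path.getsize().
    # The with-block closes the file even if the caller stops iterating early.
    with open(file_name, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total_read += len(chunk)
            # Multiply by 100 and use true division to get an actual percentage.
            print("Progress: %.1f percent" % (100.0 * total_read / total_size))
            yield chunk

for chunk in show_progress("my_file.txt"):
    # Process the chunk
    pass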

Edit: I know it isn't the best code, but I just wanted to show the concept.
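The same idea with the callback the question asks for, as a minimal Python 3 sketch. The names read_with_progress and callback are hypothetical; the callback is assumed to take the percentage as a float. iter(callable, sentinel) stops at the first empty read, so an empty file never reaches the division.

```python
import os

def read_with_progress(file_name, callback, chunk_size=1024):
    """Yield the file in chunks, calling callback(percent) after each chunk.

    `callback` is a hypothetical hook: any function that accepts a float.
    """
    total_size = os.path.getsize(file_name)
    total_read = 0
    with open(file_name, "rb") as fh:
        # iter() keeps calling fh.read(chunk_size) until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            total_read += len(chunk)
            callback(100.0 * total_read / total_size)
            yield chunk
```

For example, `for chunk in read_with_progress("my_file.txt", lambda pct: print(f"{pct:.0f}%")): ...` prints the running percentage while the loop body processes each chunk.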

#2


If you actually mean "import" (not "read"), then you can override Python's import machinery and add timing capabilities to it.

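See the imp module. (Note: imp was deprecated in Python 3.4 and removed in Python 3.12; use importlib on current versions.)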

If you mean "read", then you can easily wrap Python files with your own file-like wrapper. File objects expose only a handful of methods, so you can override the interesting ones to collect timing data.

>>> import io
>>> class MyFile(io.FileIO):  # Python 3 has no built-in "file" type to subclass
...     def read(self, *args, **kw):
...         # start timing
...         result = super().read(*args, **kw)
...         # finish timing
...         return result
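Combining the wrapper idea with the callback from the question gives a sketch like the one below, assuming a binary file and Python 3. ProgressFile and open_with_progress are hypothetical names; the wrapper counts bytes returned by read() and readline() (and line iteration), calls callback(bytes_read, total) after each call, and forwards every other attribute to the real file.

```python
import io
import os

class ProgressFile:
    """Hypothetical file-like wrapper that reports read progress to a callback."""

    def __init__(self, fh, total, callback):
        self._fh = fh
        self.total = total
        self.bytes_read = 0
        self._callback = callback

    def _report(self, data):
        # Count what was actually returned, then notify the callback.
        self.bytes_read += len(data)
        self._callback(self.bytes_read, self.total)
        return data

    def read(self, size=-1):
        return self._report(self._fh.read(size))

    def readline(self, size=-1):
        return self._report(self._fh.readline(size))

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def __getattr__(self, name):
        # Methods not overridden above (close, seek, readinto, ...) pass
        # straight through and are NOT counted.
        return getattr(self._fh, name)

    # Dunder methods bypass __getattr__, so the context manager is explicit.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._fh.close()

def open_with_progress(path, callback):
    return ProgressFile(open(path, "rb"), os.path.getsize(path), callback)
```

For example, `with open_with_progress("data.json", lambda done, total: print(f"{100 * done / total:.0f}%")) as f: json.load(f)` reports progress while json.load reads the file. Note that json.load reads everything in a single read() call, so it reports one 100% step; modules that read in pieces report more often.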