How to run a script on all *.txt files in the current directory? [duplicate]

Date: 2021-06-11 07:08:48



I am trying to run the script below on all *.txt files in the current directory. Currently it processes only test.txt and prints the block of text matched by a regular expression. What is the quickest way to scan the current directory for *.txt files and run the script on every file found? Also, how can I include the lines containing 'word1' and 'word3'? At the moment the script prints only the content between those two lines, and I would like to print the whole block.


#!/usr/bin/env python
import os, re
file = 'test.txt'
with open(file) as fp:
    for result in re.findall('word1(.*?)word3', fp.read(), re.S):
        print result

I would appreciate any advice on how to improve the code above, e.g. its speed when run on a large set of text files. Thank you.


2 Answers



Use glob.glob:

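import glob
import re

# Compile once; re.S lets '.' match newlines, so a block can span several lines.
pattern = re.compile('word1(.*?)word3', flags=re.S)
for filename in glob.glob('*.txt'):  # every .txt file in the current directory
    with open(filename) as fp:
        for result in pattern.findall(fp.read()):
            print(result)  # text between the markers; runs on Python 2 and 3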
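
The loop above prints only the text captured between the two markers. As a minimal Python 3 sketch of one way to also print the lines that contain the markers, assuming the markers are the literal strings word1 and word3 from the question: drop the capture group and anchor the pattern at line boundaries. The helper name find_blocks and the sample file are hypothetical, introduced only for illustration.

```python
import glob
import os
import re
import tempfile

# No capture group, so findall() returns whole matches. With re.M, '^' and '$'
# anchor at line boundaries; with re.S, '.' also matches newlines.
BLOCK = re.compile(r'^[^\n]*word1.*?word3[^\n]*$', re.S | re.M)


def find_blocks(directory='.'):
    """Yield (filename, block) for every match in every .txt file of directory."""
    for filename in sorted(glob.glob(os.path.join(directory, '*.txt'))):
        with open(filename) as fp:
            for block in BLOCK.findall(fp.read()):
                yield filename, block


if __name__ == '__main__':
    # Hypothetical sample data, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'sample.txt'), 'w') as fp:
            fp.write('intro\nstart word1 here\nmiddle\nend word3 there\noutro\n')
        for filename, block in find_blocks(tmp):
            print(os.path.basename(filename))
            print(block)
```

Each block starts at the beginning of the line containing word1 and ends at the end of the line containing word3. Reading whole files with fp.read() is kept from the answer above; for very large files a line-by-line scan may be preferable.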



Inspired by falsetru's answer, I rewrote my code to make it more generic.


Now the files to explore:


  • can be described either by a string passed as the second argument, which is handed to glob(),
    or by a function written specifically for this purpose, in case the set of desired files can't be described with a glob pattern


  • and may be in the current directory if no third argument is passed,
    or in a specified directory if its path is passed as the third argument



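import re, glob
try:
    from itertools import ifilter   # Python 2
except ImportError:
    ifilter = filter                # Python 3: the built-in filter is already lazy
from os import getcwd, listdir, path
from inspect import isfunction

# Match from the start of the line containing word1
# to the end of the line containing word3.
regx = re.compile('^[^\n]*word1.*?word3.*?$', re.S|re.M)

# Report template: directory, file name, matched blocks.
# The third %s is an assumption; the source template is cut off after its second line.
G = ('\n\n'
     'MWMWMW  %s\n'
     'MWMWMW  %s\n'
     '%s')

def search(REGX, how_to_find_files, dirpath='',
           G=G, sepm='\n======================\n'):
    if dirpath == '':
        dirpath = getcwd()

    # The second argument of each call below is cut off in the source;
    # the completions are assumptions based on the names imported above.
    if isfunction(how_to_find_files):
        # Join with dirpath so that open() also works outside the current directory.
        gen = (path.join(dirpath, fn)
               for fn in ifilter(how_to_find_files, listdir(dirpath)))
    elif isinstance(how_to_find_files, str):
        gen = glob.glob(path.join(dirpath, how_to_find_files))
    else:
        raise TypeError('how_to_find_files must be a function or a glob string')

    for fn in gen:
        with open(fn) as fp:
            found = REGX.findall(fp.read())
            if found:
                # Third element assumed: the matches joined with sepm.
                yield G % (dirpath, path.basename(fn), sepm.join(found))

# Example of searching in .txt files

#============ one use ===================
def select(fn):
    return fn.endswith('.txt')
print(''.join(search(regx, select)))

#============= another use ==============
print(''.join(search(regx, '*.txt')))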

The advantage of chaining the processing of several files through a succession of generators is that the final ''.join() builds a single string that is written in one go,
whereas printing several individual strings one after the other takes longer, because each print is a separate write to the display (I hope this is clear).
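
One way to make the point above concrete is to count the calls that reach the output stream. The Python 3 sketch below uses a hypothetical CountingWriter class (not part of any library) as a stand-in for the terminal, and a hypothetical reports() generator as a stand-in for search(). It only counts write calls; it does not measure actual display time, which depends on the terminal and on buffering.

```python
class CountingWriter:
    """Hypothetical stand-in for an output stream that counts write() calls."""

    def __init__(self):
        self.calls = 0
        self.chunks = []

    def write(self, text):
        self.calls += 1
        self.chunks.append(text)
        return len(text)

    def getvalue(self):
        return ''.join(self.chunks)


def reports():
    """Generator standing in for search(): yields one report string per file."""
    for name in ('a.txt', 'b.txt', 'c.txt'):
        yield 'report for %s\n' % name


def write_joined(out):
    out.write(''.join(reports()))      # one string, one write call


def write_piecewise(out):
    for text in reports():             # one write call per yielded string
        out.write(text)


if __name__ == '__main__':
    joined, piecewise = CountingWriter(), CountingWriter()
    write_joined(joined)
    write_piecewise(piecewise)
    print(joined.calls, piecewise.calls)
```

Both functions produce the same text; they differ only in how many times the stream is asked to write. Whether that difference is noticeable in practice is an assumption that would need timing on the actual terminal.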




