How can I use glob() to find files recursively in Python?

This is what I have:

glob(os.path.join('src','*.c'))

but I want to search the subfolders of src. Something like this would work:

glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))

But this is obviously limited and clunky.

21 solutions

#1

907

Python 3.5+

Starting with Python 3.5, the glob module supports the "**" directive, which is expanded recursively only if you pass recursive=True:

import glob

for filename in glob.iglob('src/**/*.c', recursive=True):
    print(filename)

If you need a list, just use glob.glob instead of glob.iglob.

To match files whose names begin with a dot (.), such as hidden files on Unix-based systems, use the os.walk solution below, because glob wildcards skip such names by default.

Python 2.2 to 3.4

For older Python versions, starting with Python 2.2, use os.walk to recursively walk a directory and fnmatch.filter to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))
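
To illustrate why the os.walk approach above is the one to use for dot-files, here is a minimal sketch comparing the two techniques. The helper names `glob_c_files` and `walk_c_files` are made up for this example. A glob wildcard does not match a leading dot by default, while fnmatch applies no such rule:

```python
import fnmatch
import glob
import os


def glob_c_files(top):
    """Recursive glob; wildcards skip names that start with a dot."""
    return sorted(glob.glob(os.path.join(top, '**', '*.c'), recursive=True))


def walk_c_files(top):
    """os.walk + fnmatch; dot-prefixed names are matched like any other."""
    matches = []
    for root, dirnames, filenames in os.walk(top):
        for filename in fnmatch.filter(filenames, '*.c'):
            matches.append(os.path.join(root, filename))
    return sorted(matches)
```

For a tree containing both `a.c` and `.hidden.c`, only `walk_c_files` would be expected to return the hidden file.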

Python 2.1 and earlier

For even older Python versions, use glob.glob against each filename instead of fnmatch.filter.

#2

Similar to other solutions, but using fnmatch.fnmatch instead of glob, since os.walk already listed the filenames:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print('Found C source:', filename)

Also, using a generator allows you to process each file as it is found, instead of finding all the files and then processing them.
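
As a rough illustration of that point, the sketch below repeats the generator and then consumes only part of it. The use of `itertools.islice` and `next` is my own addition, not part of the answer. Because the walk is lazy, it stops as soon as enough matches have been produced:

```python
import fnmatch
import itertools
import os


def find_files(directory, pattern):
    """Yield matching paths lazily while walking the tree."""
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                yield os.path.join(root, basename)


# Take at most three matches; the walk is not continued past the third one.
first_three = list(itertools.islice(find_files('src', '*.c'), 3))

# Or fetch a single match, with None as the fallback if nothing matches.
first_match = next(find_files('src', '*.c'), None)
```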

#3

I've modified the glob module to support ** for recursive globbing, e.g.:

>>> import glob2
>>> all_header_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

Useful when you want to provide your users with the ability to use the ** syntax, and thus os.walk() alone is not good enough.

#4

Starting with Python 3.4, one can use the glob() method of one of the Path classes in the new pathlib module, which supports ** wildcards. For example:

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path) # do whatever you need with these files

Update: Starting with Python 3.5, the same syntax is also supported by glob.glob().
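
As a small addition, pathlib also offers `Path.rglob()`, which behaves like calling `glob()` with `**/` prepended to the pattern. The sketch below wraps it in a helper whose name, `c_sources`, is made up for this example:

```python
from pathlib import Path


def c_sources(top):
    """Return all *.c files below top, sorted for stable output."""
    # rglob('*.c') behaves like glob('**/*.c')
    return sorted(Path(top).rglob('*.c'))


for file_path in c_sources('src'):
    print(file_path)
```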

#5

import os
import fnmatch


def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results

fnmatch gives you exactly the same patterns as glob, so this is really an excellent replacement for glob.glob with very close semantics. An iterative version (e.g. a generator), in other words a replacement for glob.iglob, is a trivial adaptation: just yield the intermediate results as you go, instead of extending a single results list to return at the end.
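
The generator adaptation described above could look roughly like this. The name `recursive_iglob` is hypothetical, chosen here to mirror glob.iglob:

```python
import fnmatch
import os


def recursive_iglob(treeroot, pattern):
    """Generator counterpart of recursive_glob: yield paths one at a time."""
    for base, dirs, files in os.walk(treeroot):
        for f in fnmatch.filter(files, pattern):
            yield os.path.join(base, f)
```

Calling `list(recursive_iglob('src', '*.c'))` should then give the same paths as `recursive_glob('src', '*.c')`.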

#6

You'll want to use os.walk to collect filenames that match your criteria. For example:

import os
cfiles = []
for root, dirs, files in os.walk('src'):
  for file in files:
    if file.endswith('.c'):
      cfiles.append(os.path.join(root, file))

#7

Here's a solution with nested list comprehensions, os.walk and simple suffix matching instead of glob:

import os
cfiles = [os.path.join(root, filename)
          for root, dirnames, filenames in os.walk('src')
          for filename in filenames if filename.endswith('.c')]

It can be compressed to a one-liner:

import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]

or generalized as a function:

import os

def recursive_glob(rootdir='.', suffix=''):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames if filename.endswith(suffix)]

cfiles = recursive_glob('src', '.c')

If you do need full glob style patterns, you can follow Alex's and Bruno's example and use fnmatch:

import fnmatch
import os

def recursive_glob(rootdir='.', pattern='*'):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if fnmatch.fnmatch(filename, pattern)]

cfiles = recursive_glob('src', '*.c')

#8

Johan and Bruno provide excellent solutions on the minimal requirement as stated. I have just released Formic which implements Ant FileSet and Globs which can handle this and more complicated scenarios. An implementation of your requirement is:

import formic
fileset = formic.FileSet(include="/src/**/*.c")
for file_name in fileset.qualified_files():
    print(file_name)

#9

Based on other answers, this is my current working implementation, which retrieves nested XML files in a root directory:

import glob
import os

files = []
for root, dirnames, filenames in os.walk(myDir):  # myDir: the directory to search
    files.extend(glob.glob(os.path.join(root, "*.xml")))

I'm really having fun with python :)

#10

Recently I had to recover my pictures with the extension .jpg. I ran photorec and recovered 4579 directories containing 2.2 million files with a tremendous variety of extensions. With the script below I was able to select the 50133 files with a .jpg extension within minutes:

#!/usr/bin/env python2.7

import glob
import shutil
import os

src_dir = "/home/mustafa/Masaüstü/yedek"
dst_dir = "/home/mustafa/Genel/media"
for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")): #"*" is for subdirectory
    shutil.copy(mediafile, dst_dir)

#11

Another way to do it using just the glob module. Just seed the rglob method with a starting base directory and a pattern to match and it will return a list of matching file names.

import glob
import os

def _getDirs(base):
    # glob already returns each entry with base prepended
    return [x for x in glob.iglob(os.path.join(base, '*')) if os.path.isdir(x)]

def rglob(base, pattern):
    matches = []
    matches.extend(glob.glob(os.path.join(base, pattern)))
    for d in _getDirs(base):
        # d already contains base, so recurse on it directly
        matches.extend(rglob(d, pattern))
    return matches

#12

I just made this. It prints files and directories in a hierarchical way.

But I didn't use fnmatch or walk.

#!/usr/bin/python

import os, glob, sys

def dirlist(path, c=1):
    # Print each entry, indented with '----' once per nesting level
    for i in glob.glob(os.path.join(path, "*")):
        if os.path.isfile(i):
            filepath, filename = os.path.split(i)
            print('----' * c + filename)
        elif os.path.isdir(i):
            dirname = os.path.basename(i)
            print('----' * c + dirname)
            dirlist(i, c + 1)


path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)

#13

In addition to the suggested answers, you can do this with some lazy generation and list comprehension magic:

import os, glob, itertools

results = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.c'))
                                               for root, dirs, files in os.walk('src'))

for f in results: print(f)

Besides fitting in one line and avoiding unnecessary lists in memory, this also has the nice side effect, that you can use it in a way similar to the ** operator, e.g., you could use os.path.join(root, 'some/path/*.c') in order to get all .c files in all sub directories of src that have this structure.
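
A minimal sketch of that variation follows. The helper name `glob_under_each_dir` and its `subpattern` parameter are assumptions made for this example. It applies a relative glob pattern beneath every directory that os.walk visits, and stays lazy throughout:

```python
import glob
import itertools
import os


def glob_under_each_dir(top, subpattern):
    """Lazily apply subpattern relative to every directory below top."""
    return itertools.chain.from_iterable(
        glob.iglob(os.path.join(root, subpattern))
        for root, dirs, files in os.walk(top))


for f in glob_under_each_dir('src', os.path.join('some', 'path', '*.c')):
    print(f)
```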

#14

Simplified version of Johan Dahlin's answer, without fnmatch.

import os

matches = []
for root, dirnames, filenames in os.walk('src'):
  matches += [os.path.join(root, f) for f in filenames if f[-2:] == '.c']

#15

Or with a list comprehension:

>>> import os
>>> root = r"c:\User\xtofl"
>>> binfiles = [os.path.join(base, f)
...             for base, _, files in os.walk(root)
...             for f in files if f.endswith(".jpg")]

#16

This one accepts either an fnmatch pattern or a compiled regular expression:

import fnmatch, os

def filepaths(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            try:
                matched = pattern.match(basename)
            except AttributeError:
                matched = fnmatch.fnmatch(basename, pattern)
            if matched:
                yield os.path.join(root, basename)

# usage
if __name__ == '__main__':
    from pprint import pprint as pp
    import re
    path = r'/Users/hipertracker/app/myapp'
    pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
    pp([x for x in filepaths(path, '*.py')])

#17

Here is my solution using list comprehension to search for multiple file extensions recursively in a directory and all subdirectories:

import os, glob

def _globrec(path, *exts):
    """ Glob recursively a directory and all subdirectories for multiple file extensions
    Note: on Windows, glob is case-insensitive, i.e. for '*.jpg' you will get files ending
    with .jpg and .JPG

    Parameters
    ----------
    path : str
        A directory name
    exts : tuple
        File extension patterns to glob for, e.g. '*.jpg'

    Returns
    -------
    files : list
        list of files matching extensions in exts in path and subfolders

    """
    dirs = [a[0] for a in os.walk(path)]
    # Join with os.path.join so the pattern works on any platform
    f_filter = [os.path.join(d, e) for d in dirs for e in exts]
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_pictures = _globrec(r'C:\Temp', '*.jpg', '*.bmp', '*.png', '*.gif')
for f in my_pictures:
    print(f)
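
If plain suffixes are enough, a possible alternative is to skip glob entirely and rely on `str.endswith`, which accepts a tuple of suffixes. This is only a sketch, not the approach used above. The function name `find_by_extensions` is made up, and lowercasing the names is my own choice so that matching is case-insensitive on every platform:

```python
import os


def find_by_extensions(top, *exts):
    """Return files below top whose lowercased name ends with one of exts."""
    suffixes = tuple(e.lower() for e in exts)
    return [os.path.join(root, name)
            for root, dirs, names in os.walk(top)
            for name in names
            if name.lower().endswith(suffixes)]


my_pictures = find_by_extensions('.', '.jpg', '.bmp', '.png', '.gif')
```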

#18

import sys, os, glob

dir_list = ["c:\\books\\heap"]

while len(dir_list) > 0:
    cur_dir = dir_list[0]
    del dir_list[0]
    list_of_files = glob.glob(cur_dir+'\\*')
    for book in list_of_files:
        if os.path.isfile(book):
            print(book)
        else:
            dir_list.append(book)

#19

I modified the top answer in this posting and recently created this script, which loops through all files in a given directory (searchdir) and the subdirectories under it, and prints the filename, rootdir, modified/creation date, and size.

Hope this helps someone... and they can walk the directory and get fileinfo.

import time
import fnmatch
import os

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print("%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation, filesize))

searchdir = r'D:\Your\Directory\Root'
matches = []

for root, dirnames, filenames in os.walk(searchdir):
    ##  for filename in fnmatch.filter(filenames, '*.c'):
    for filename in filenames:
        ##      matches.append(os.path.join(root, filename))
        ##print matches
        fileinfo(os.path.join(root, filename))

#20

Here is a solution that will match the pattern against the full path and not just the base filename.

It uses fnmatch.translate to convert a glob-style pattern into a regular expression, which is then matched against the full path of each file found while walking the directory.

re.IGNORECASE is optional, but desirable on Windows since the file system itself is not case-sensitive. (I didn't bother compiling the regex because docs indicate it should be cached internally.)

import fnmatch
import os
import re

def findfiles(dir, pattern):
    patternregex = fnmatch.translate(pattern)
    for root, dirs, files in os.walk(dir):
        for basename in files:
            filename = os.path.join(root, basename)
            if re.search(patternregex, filename, re.IGNORECASE):
                yield filename
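
For comparison, here is a sketch of a variant that compiles the translated pattern once and anchors it with `re.match`. The name `findfiles_compiled` is hypothetical. Because fnmatch's `*` also matches path separators, a pattern can constrain directory names as well as the base filename:

```python
import fnmatch
import os
import re


def findfiles_compiled(top, pattern):
    """Variant of the idea above that compiles the translated pattern once."""
    regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    for root, dirs, files in os.walk(top):
        for basename in files:
            filename = os.path.join(root, basename)
            # match() anchors at the start; translate() anchors the end
            if regex.match(filename):
                yield filename


# Example: only .c files that live somewhere below a "tests" directory
tests_pattern = os.path.join('*', 'tests', '*.c')
test_sources = list(findfiles_compiled('src', tests_pattern))
```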

#21

I needed a solution for Python 2.x that works fast on large directories.
I ended up with this:

import subprocess
foundfiles= subprocess.check_output("ls src/*.c src/**/*.c", shell=True)
for foundfile in foundfiles.splitlines():
    print foundfile

Note that you might need some exception handling in case ls doesn't find any matching file.
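
One way that exception handling might look is sketched below, in Python 3 syntax and assuming a POSIX system where `ls` is available. The function name `ls_c_files` is made up. Note that a plain `sh` generally treats `**` like `*`, so this sketch spells out the one-level pattern `*/*.c` explicitly and therefore only looks one directory deep:

```python
import shlex
import subprocess


def ls_c_files(top):
    """List *.c files in top and its immediate subdirectories via ls."""
    quoted = shlex.quote(top)
    command = "ls {0}/*.c {0}/*/*.c".format(quoted)
    try:
        output = subprocess.check_output(
            command, shell=True, stderr=subprocess.DEVNULL,
            universal_newlines=True)
    except subprocess.CalledProcessError as error:
        # ls exits non-zero when a pattern matches nothing; keep whatever
        # it still managed to print for the patterns that did match.
        output = error.output or ""
    return output.splitlines()


for foundfile in ls_c_files('src'):
    print(foundfile)
```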
