Get a filtered list of files in a directory

Time: 2022-09-01 23:13:44

I am trying to get a list of files in a directory using Python, but I do not want a list of ALL the files.

What I essentially want is the ability to do something like the following but using Python and not executing ls.

ls 145592*.jpg

If there is no built-in method for this, I am currently thinking of writing a for loop to iterate through the results of an os.listdir() and to append all the matching files to a new list.

However, there are a lot of files in that directory and therefore I am hoping there is a more efficient method (or a built-in method).

10 solutions

#1


249  

import glob
glob.glob('145592*.jpg')
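glob.glob('145592*.jpg') resolves the pattern against the current working directory. A minimal sketch for searching some other directory, assuming a hypothetical helper named list_matching: it joins the directory onto the pattern and passes the directory through glob.escape() (Python 3.4+), so that any wildcard characters in the directory name itself are taken literally. The demo file names are invented.

```python
import glob
import os
import tempfile


def list_matching(directory, pattern="145592*.jpg"):
    """Hypothetical helper: sorted paths in `directory` whose names match `pattern`."""
    # glob.escape() makes any *, ? or [ in the directory name literal;
    # only `pattern` is treated as a wildcard expression.
    return sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))


# Demo in a throwaway directory; the file names are invented.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["145592_a.jpg", "145592_b.jpg", "145592_c.png", "999_d.jpg"]:
        open(os.path.join(tmp, name), "w").close()
    matches = [os.path.basename(p) for p in list_matching(tmp)]

print(matches)  # ['145592_a.jpg', '145592_b.jpg']
```

glob.glob() does not promise any particular order (it depends on the file system), hence the sorted() call.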

#2


91  

glob.glob() is definitely the way to do it (as per Ignacio). However, if you do need more complicated matching, you can do it with a list comprehension and re.match(), something like so:

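import os
import re

# The trailing $ anchors the match at the end of the name, so "1.jpg.bak" is not matched
files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg$', f)]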

More flexible, but as you note, less efficient.
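If the same regex is applied to many file names, a sketch like the following compiles it once and uses Pattern.fullmatch() (Python 3.4+), which must consume the whole name. The helper name filter_names and the sample names are hypothetical; with a real directory you would pass os.listdir(path). The re module already caches recently used patterns, so compiling up front is mostly a readability gain.

```python
import re

# Compiled once. fullmatch() must consume the whole name, so "145592_a.jpg.bak"
# is rejected; re.match() only anchors at the start and would accept it.
NUMERIC_JPG = re.compile(r"[0-9]+.*\.jpg")


def filter_names(names):
    """Hypothetical helper: names that start with digits and end in '.jpg'."""
    return [n for n in names if NUMERIC_JPG.fullmatch(n)]


names = ["145592_a.jpg", "145592_a.jpg.bak", "notes.jpg", "7.jpg"]
print(filter_names(names))  # ['145592_a.jpg', '7.jpg']

# For comparison: an unanchored re.match() also lets the .bak file through.
loose = [n for n in names if re.match(r"[0-9]+.*\.jpg", n)]
print(loose)  # ['145592_a.jpg', '145592_a.jpg.bak', '7.jpg']
```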

#3


27  

Keep it simple:

import os
relevant_path = "[path to folder]"
included_extensions = ['jpg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]

I prefer this form of list comprehensions because it reads well in English.

I read the fourth line as: For each fn in os.listdir for my path, give me only the ones that match any one of my included extensions.

Novice Python programmers may find it hard to get used to filtering with list comprehensions, and because a comprehension builds the whole list in memory, it adds some overhead for very large data sets. But for listing a directory and other simple string-filtering tasks, list comprehensions lead to cleaner, more self-documenting code.

The only drawback of this design is that it doesn't protect you from passing a string instead of a list. For example, if included_extensions accidentally ends up as the string 'jpg' rather than ['jpg'], the generator iterates over its characters and checks fn.endswith('j'), fn.endswith('p') and fn.endswith('g'), which can produce a slew of false positives.

But it's better to have a problem that's easy to fix than a solution that's hard to understand.
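One way to guard against the string-instead-of-list mistake, sketched with a hypothetical helper filter_by_extension: reject a bare str explicitly, and pass a tuple of suffixes to str.endswith(), which accepts a tuple. Adding the leading dot also stops a name like 'notjpg' from matching, and lower-casing makes the test case-insensitive; both are design choices that are not in the original answer.

```python
def filter_by_extension(names, extensions):
    """Hypothetical helper: keep names ending in one of `extensions`."""
    if isinstance(extensions, str):
        # A bare string such as 'jpg' would be iterated character by character
        # ('j', 'p', 'g'), producing false positives.
        raise TypeError("extensions must be a list or tuple, not a str")
    # str.endswith() accepts a tuple of suffixes; the leading dot stops
    # 'notjpg' from matching, and lower() makes the test case-insensitive.
    suffixes = tuple("." + ext.lower() for ext in extensions)
    return [n for n in names if n.lower().endswith(suffixes)]


names = ["cat.JPG", "dog.png", "notjpg", "readme.txt"]
print(filter_by_extension(names, ["jpg", "png"]))  # ['cat.JPG', 'dog.png']
```

With a real folder you would call filter_by_extension(os.listdir(relevant_path), included_extensions).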

#4


24  

Another option:

>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']

https://docs.python.org/3/library/fnmatch.html
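fnmatch.filter() takes a single pattern. To accept several patterns, a sketch along these lines (the helper filter_any is hypothetical) tests each name with fnmatch.fnmatchcase(). fnmatchcase() is case-sensitive on every platform, whereas fnmatch.filter() and fnmatch.fnmatch() normalise case with os.path.normcase and are therefore case-insensitive on Windows.

```python
import fnmatch


def filter_any(names, patterns):
    """Hypothetical helper: names (in original order) matching any shell-style pattern."""
    # fnmatchcase() never folds case, so results are the same on every OS.
    return [n for n in names if any(fnmatch.fnmatchcase(n, p) for p in patterns)]


names = ["145592_a.jpg", "145592_b.png", "145592_c.gif", "other.jpg", "145592_d.JPG"]
print(filter_any(names, ["145592*.jpg", "145592*.png"]))  # ['145592_a.jpg', '145592_b.png']
```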

#5


8  

Use os.walk to list your files recursively:

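import os

root = "/home"
pattern = "145592"
alist_filter = ['jpg', 'bmp', 'png', 'gif']
path = os.path.join(root, "mydir_to_scan")
for r, d, f in os.walk(path):
    for file in f:
        # Compare the real extension (case-insensitively) rather than the last
        # three characters, which would also accept a name like "145592jpg"
        ext = os.path.splitext(file)[1][1:].lower()
        if ext in alist_filter and pattern in file:
            # Join with r, the directory currently being walked, not with root
            print(os.path.join(r, file))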
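A generator variant of the os.walk approach, sketched under a few assumptions: the hypothetical walk_images helper treats the number as a prefix (mirroring ls 145592*.jpg) rather than a substring, and it skips hidden directories by pruning dirnames in place, which os.walk honours when walking top-down (the default). The demo tree is invented.

```python
import os
import tempfile


def walk_images(root, prefix="145592", extensions=("jpg", "bmp", "png", "gif")):
    """Hypothetical helper: yield full paths of matching files under `root`."""
    suffixes = tuple("." + ext for ext in extensions)
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place (allowed with the default topdown=True)
        # stops os.walk from descending into hidden directories.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith(prefix) and name.lower().endswith(suffixes):
                yield os.path.join(dirpath, name)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    os.makedirs(os.path.join(tmp, ".cache"))
    for rel in ["145592_a.jpg", "sub/145592_b.png", ".cache/145592_c.jpg", "sub/1_d.jpg"]:
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    # Relative paths with "/" separators so the output is the same on every OS.
    found = sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in walk_images(tmp))

print(found)  # ['145592_a.jpg', 'sub/145592_b.png']
```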

#6


4  

Preliminary code

import glob
import fnmatch
import pathlib
import os

pattern = '*.py'
path = '.'

Solution 1 - use "glob"

# lookup in current dir
glob.glob(pattern)

In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']

Solution 2 - use "os" + "fnmatch"

Variant 2.1 - Lookup in current dir

# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)

In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']

Variant 2.2 - Recursive lookup

# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):
    # fnmatch.filter returns an empty list when nothing matches,
    # so no separate emptiness checks are needed
    for file in fnmatch.filter(filenames, pattern):
        print(os.path.join(dirpath, file))

Result

./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py

Solution 3 - use "pathlib"

# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))

# lookup recursive
tuple(path_.rglob(pattern))
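A self-contained run of Solution 3 in a throwaway directory (the file names are invented for the demo). Path.glob() and Path.rglob() return generators, which is why the answer wraps them in tuple(); rglob(pattern) behaves like glob('**/' + pattern). The is_file() check is an optional extra that excludes directories whose names happen to match.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "pkg").mkdir()
    for rel in ["manage.py", "pkg/views.py", "pkg/data.json"]:
        (base / rel).touch()
    # Current directory only
    top_level = sorted(p.name for p in base.glob("*.py"))
    # Recursive; as_posix() gives "/" separators on every OS
    recursive = sorted(p.relative_to(base).as_posix()
                       for p in base.rglob("*.py") if p.is_file())

print(top_level)  # ['manage.py']
print(recursive)  # ['manage.py', 'pkg/views.py']
```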

Notes:

  1. Tested on Python 3.4.
  2. The pathlib module was added in Python 3.4.
  3. Python 3.5 added recursive lookup to glob.glob (pass recursive=True and use "**" in the pattern): https://docs.python.org/3.5/library/glob.html#glob.glob. Since my machine has Python 3.4 installed, I have not tested that.
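A sketch of the recursive glob.glob form (Python 3.5+) mentioned in the notes, run against a temporary directory with invented file names. With recursive=True, "**" matches zero or more directories, so top-level files are found as well as nested ones.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "apps", "polls"))
    for parts in [("wsgi.py",), ("apps", "polls", "views.py"), ("apps", "readme.txt")]:
        open(os.path.join(tmp, *parts), "w").close()
    # "**" only recurses when recursive=True; escape() keeps tmp's own characters literal.
    hits = glob.glob(os.path.join(glob.escape(tmp), "**", "*.py"), recursive=True)
    found = sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in hits)

print(found)  # ['apps/polls/views.py', 'wsgi.py']
```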

#7


1  

You might also like a more high-level approach, which I have implemented and packaged as findtools:

from findtools.find_files import (find_files, Match)


# Recursively find all *.txt files in /home
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)

for found_file in found_files:
    print(found_file)

It can be installed with:

pip install findtools

#8


1  

import os

dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]

This gives you a list of .jpg files with their full paths. Replace x[0]+"/"+f with just f to get bare filenames, and replace f.endswith(".jpg") with whatever string condition you like.
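The comprehension above builds the full list in memory. If you only need to iterate, or the tree is very large, a generator expression yields paths lazily. A sketch with a hypothetical iter_jpgs helper; it also uses os.path.join instead of concatenating "/", so the separator is correct on every platform. Note that a generator can only be consumed once.

```python
import itertools
import os
import tempfile


def iter_jpgs(directory):
    """Hypothetical helper: lazily yield full paths of .jpg files under `directory`."""
    return (os.path.join(dirpath, f)
            for dirpath, _dirnames, filenames in os.walk(directory)
            for f in filenames if f.endswith(".jpg"))


with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, "img%d.jpg" % i), "w").close()
    open(os.path.join(tmp, "notes.txt"), "w").close()
    # islice() stops after two paths; in a deep tree, directories not yet
    # reached are never walked.
    first_two = list(itertools.islice(iter_jpgs(tmp), 2))
    all_jpgs = sorted(os.path.basename(p) for p in iter_jpgs(tmp))

print(len(first_two))  # 2
print(all_jpgs)  # ['img0.jpg', 'img1.jpg', 'img2.jpg', 'img3.jpg', 'img4.jpg']
```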

#9


0  

You can use subprocess.check_output():

import subprocess

# check_output returns bytes; decode and split to get a list of file names
list_files = subprocess.check_output("ls 145592*.jpg", shell=True).decode().splitlines()

The quoted string can be any shell command whose output you want to capture. Note that this still spawns ls, which the question wanted to avoid.
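If you do shell out, the no-match case needs handling: when the glob matches nothing, the shell passes it through literally, ls exits with a non-zero status, and check_output() raises CalledProcessError. A POSIX-only sketch with a hypothetical ls_matches helper; because the pattern is pasted into a shell command, it must come from trusted input.

```python
import os
import subprocess
import tempfile


def ls_matches(directory, pattern="145592*.jpg"):
    """Hypothetical helper: run `ls <pattern>` via the shell inside `directory`."""
    try:
        out = subprocess.check_output("ls " + pattern, shell=True, cwd=directory,
                                      stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        # No match: ls receives the literal pattern, fails, and exits non-zero.
        return []
    # Bytes in, one name per line out (ls prints one per line when not on a terminal).
    return out.decode().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    for name in ["145592_a.jpg", "145592_b.jpg", "other.jpg"]:
        open(os.path.join(tmp, name), "w").close()
    hits = sorted(ls_matches(tmp))
    misses = ls_matches(tmp, "nomatch*.jpg")

print(hits)    # ['145592_a.jpg', '145592_b.jpg']
print(misses)  # []
```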

#10


0  

Filenames with "jpg" and "png" extensions in "path/to/images":

import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]
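On Python 3.5+, os.scandir() suits a large directory: its DirEntry objects let you skip subdirectories with is_file(), often without an extra stat() call. A sketch with a hypothetical files_with_extensions helper. It uses os.path.splitext() instead of split("."), so a file literally named "jpg" is not mistaken for a .jpg file, and it matches extensions case-insensitively (a choice not in the original answer).

```python
import os
import tempfile


def files_with_extensions(directory, extensions=("jpg", "png")):
    """Hypothetical helper: sorted names of regular files with a wanted extension."""
    wanted = {"." + ext.lower() for ext in extensions}
    # Using scandir() as a context manager requires Python 3.6+.
    with os.scandir(directory) as entries:
        return sorted(
            entry.name
            for entry in entries
            # splitext("jpg") gives ("jpg", ""), so that name has no extension.
            if entry.is_file() and os.path.splitext(entry.name)[1].lower() in wanted
        )


with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.jpg", "b.PNG", "jpg", "c.gif"]:
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "d.jpg"))  # a directory, skipped by is_file()
    result = files_with_extensions(tmp)

print(result)  # ['a.jpg', 'b.PNG']
```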
