How do I find all files in a directory using grep and a regex?

Posted: 2023-01-14 23:30:28

I have a directory (Linux/Unix) on an Apache server with many subdirectories, each containing a lot of files, like this:

- Dir  
  - 2010_01/
    - 142_78596_101_322.pdf
    - 12_10.pdf
    - ...
  - 2010_02/   
    - ...

How can I find all files whose names look like *_*_*_*.pdf, where each * stands for a number (digits only)?

I tried to solve it like this:

ls -1Rl 2010-01 | grep -i '\(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$' | wc -l

But the regular expression \(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$ doesn't work with grep.

Edit 1: Trying, for example, ls -l 2010-03 | grep -E '(\d+_){3}\d+\.pdf' | wc -l matches nothing, so that does not work either.

3 Answers

#1


3  

Try using find.

The command that satisfies your specification __*_*.pdf, where * is always a number, is shown below. It assumes GNU find: -regex has to match the whole path (hence the leading .*/), and find's regex dialects do not understand \d, so [0-9] is used instead:

find 2010_10/ -regextype posix-extended -regex '.*/__[0-9]+_[0-9]+\.pdf'

However, based on the regex that you tried, you seem to want a sequence of 4 numbers separated by underscores:

.*/([0-9]+_){3}[0-9]+\.pdf

Or do you want to match all names consisting solely of digits and underscores?

.*/[0-9_]+\.pdf
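
To make the find approach concrete, here is a minimal sketch that builds a throwaway directory tree and runs the four-number pattern against it. The temporary directory and the sample file names other than those from the question are made up for illustration, and -regextype is assumed to be available (it is a GNU find extension).

```shell
# Hypothetical sandbox mirroring the layout from the question.
tmp=$(mktemp -d)
mkdir -p "$tmp/2010_01" "$tmp/2010_02"
touch "$tmp/2010_01/142_78596_101_322.pdf" \
      "$tmp/2010_01/12_10.pdf" \
      "$tmp/2010_02/1_2_3_4.pdf" \
      "$tmp/2010_02/a_2_3_4.pdf"

# -regex is matched against the whole path, so the pattern starts with '.*/'.
# -regextype posix-extended (GNU find) enables ( ), { } and + as operators.
matches=$(find "$tmp" -type f -regextype posix-extended \
               -regex '.*/([0-9]+_){3}[0-9]+\.pdf' | sort)
count=$(printf '%s\n' "$matches" | wc -l)

echo "$matches"
echo "count: $count"

rm -rf "$tmp"
```

Only the two names made of four underscore-separated numbers should be listed; 12_10.pdf and a_2_3_4.pdf are skipped.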

#2


1  

First, you should use egrep rather than plain grep, or call grep with -E, to enable extended patterns.

So this works for me:

$ cat test2.txt
- Dir  
  - 2010_01/
    - 142_78596_101_322.pdf
    - 12_10.pdf
    - ...
  - 2010_02/   
    - ...

Now run egrep on that file. The pattern below uses [0-9] instead of \d and drops the (?:...) groups, because both are Perl syntax that POSIX extended regular expressions do not define:

$ cat test2.txt | egrep '([0-9]+_){3}[0-9]+\.pdf$'
    - 142_78596_101_322.pdf

grep prints the entire matching line, so the whole file name is shown; the trailing $ anchors the match to the end of the line.

Note that the pattern does NOT work with grep in traditional (basic) mode, where ( ) { } and + are literal characters:

$ cat test2.txt | grep '([0-9]+_){3}[0-9]+\.pdf$'
... no output

But it DOES work if you use the extended pattern switch (the same as calling egrep):

$ cat test2.txt | grep -E '([0-9]+_){3}[0-9]+\.pdf$'
    - 142_78596_101_322.pdf
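
The difference between basic and extended mode can be checked with a short sketch like the one below. The sample lines come from the question; the variable names are only illustrative. The third command shows one way to write the same pattern in basic syntax, using the escaped forms \( \) and \{ \}.

```shell
# Sample names taken from the question's directory listing.
listing=$(printf '%s\n' '142_78596_101_322.pdf' '12_10.pdf' '2010_01')

ere='([0-9]+_){3}[0-9]+\.pdf$'
bre='\([0-9]\{1,\}_\)\{3\}[0-9]\{1,\}\.pdf$'

# Basic mode with the extended pattern: ( ) { } + are literal, so no line matches.
# '|| true' keeps grep's "no match" exit status from aborting a 'set -e' script.
basic_count=$(printf '%s\n' "$listing" | grep -c "$ere" || true)

# Extended mode: the same characters now act as grouping and repetition operators.
extended_count=$(printf '%s\n' "$listing" | grep -c -E "$ere" || true)

# Basic mode again, this time with the operators written in escaped form.
escaped_count=$(printf '%s\n' "$listing" | grep -c "$bre" || true)

echo "basic: $basic_count, extended: $extended_count, escaped basic: $escaped_count"
```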

#3


0  

Thanks to gbchaosmaster and the wolf, I found a way that works for me. It uses grep -P (Perl-compatible regular expressions in GNU grep), which understands \d:

Inside a single directory:

find . | grep -P "(\d+_){3}\d+\.pdf" | wc -l

From the root directory:

find 20*/ | grep -P "(\d+_){3}\d+\.pdf" | wc -l
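
For systems where grep -P is not available, a rough equivalent can be written with grep -E alone. The sketch below is one possible variant, not the method from the answer above: it anchors the pattern to the file name with a leading / and a trailing $, restricts find to regular files ending in .pdf, and lets grep -c do the counting instead of wc -l. The sandbox directory and the extra sample names are hypothetical.

```shell
# Hypothetical sandbox; only the first two file names come from the question.
tmp=$(mktemp -d)
mkdir -p "$tmp/2010_01" "$tmp/2010_02"
touch "$tmp/2010_01/142_78596_101_322.pdf" \
      "$tmp/2010_01/12_10.pdf" \
      "$tmp/2010_02/7_8_9_10.pdf" \
      "$tmp/2010_02/7_8_9_10.pdf.bak"

# '/' before the digits and '$' after '.pdf' pin the pattern to the whole
# file name, so names such as x1_2_3_4.pdf or 1_2_3_4.pdf.bak are not counted.
# '|| true' keeps a zero-match result from aborting a 'set -e' script.
count=$(find "$tmp" -type f -name '*.pdf' \
          | grep -c -E '/([0-9]+_){3}[0-9]+\.pdf$' || true)

echo "$count"

rm -rf "$tmp"
```

Because the counting is line based, this assumes no file name contains a newline.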
