How do I find files that end with ".html" but do not have ".bin" in the file name?

Time: 2022-06-19 15:06:14

I have the following types of file names:

  1. One ends with .html:

    l_scheduling_suite.temp.html
    
  2. Another type ends with .html but has .bin in its name:

    l_scheduling_suite.temp.bin.html
    
  3. And a third ends with .bin:

    l_scheduling_suite.temp.bin
    

The filename is arbitrary. It won't necessarily always have a temp before .html or .bin. I need to find all the files that comply with only the first format. I am piping to grep using the following regex to find the files, but I am not able to make it work:

"(?=(\.html)$) (?=(?!\.bin))"

How should I use grep or find to get the right list of files?

2 Solutions

#1


Try this:

find . -type f | grep -P '^.*(?<!\.bin)\.html$'

This uses a negative lookbehind: it matches every name that ends with .html, except where .bin comes immediately before the .html. The -P option needs a grep built with PCRE support, such as GNU grep.
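
To see what the lookbehind accepts and rejects without touching the filesystem, here is a minimal sketch that feeds names to the same pattern on standard input. The first three names come from the question; report.bin.old.html is a hypothetical name added only to show that the lookbehind inspects the four characters directly before .html and nothing earlier. It assumes a grep with PCRE support.

```shell
# Pipe candidate names through the lookbehind pattern (needs grep -P).
# The last name is hypothetical: it contains .bin, but not right before
# .html, so the lookbehind does not reject it.
matches=$(printf '%s\n' \
    'l_scheduling_suite.temp.html' \
    'l_scheduling_suite.temp.bin.html' \
    'l_scheduling_suite.temp.bin' \
    'report.bin.old.html' \
  | grep -P '(?<!\.bin)\.html$')

printf '%s\n' "$matches"
# prints:
#   l_scheduling_suite.temp.html
#   report.bin.old.html
```

If names such as the last one must also be excluded, the pattern has to reject .bin anywhere in the name rather than only next to the extension.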

#2


Use a Simple Glob Pattern

You're vastly overcomplicating the problem. All you need (based on your posted corpus) is:

find . -name \*.temp.html

This will find all files that end with .temp.html. Your other examples wouldn't match because *.bin.html and *.temp.bin have no overlap with this glob pattern.
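
One way to preview how the glob treats each sample name is a shell case statement, which applies the same style of pattern matching to a string. This is only a sketch: matches_glob is a hypothetical helper written for illustration, not part of find.

```shell
# Hypothetical helper: succeeds when the name matches the *.temp.html glob.
matches_glob() {
    case $1 in
        *.temp.html) return 0 ;;
        *)           return 1 ;;
    esac
}

# Check the three sample names from the question.
for name in \
    'l_scheduling_suite.temp.html' \
    'l_scheduling_suite.temp.bin.html' \
    'l_scheduling_suite.temp.bin'
do
    if matches_glob "$name"; then
        printf '%s: match\n' "$name"
    else
        printf '%s: no match\n' "$name"
    fi
done
```

Only the first name is reported as a match, since the other two do not end in .temp.html.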

Use Negated Globs

If your corpus was poorly chosen, and you're actually trying to match all files that end in .html but that don't include .bin anywhere in the name, then you can just use the find utility with a negated glob without resorting to regular expressions, pipes, extended shell globs, or other contortions. For example:

find . -name '*.html' -not -name '*.bin*'
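
The negated glob can be tried safely in a throwaway directory. The sketch below creates the three sample files from the question under a temporary directory (demo_dir is just an illustrative variable name) and runs the same test. It writes the negation as !, the POSIX spelling of -not, and adds -type f to skip directories; both are choices made for this example.

```shell
# Create a scratch directory holding the three sample names.
demo_dir=$(mktemp -d)
touch "$demo_dir/l_scheduling_suite.temp.html" \
      "$demo_dir/l_scheduling_suite.temp.bin.html" \
      "$demo_dir/l_scheduling_suite.temp.bin"

# Keep names ending in .html, then drop any whose name contains .bin.
found=$(cd "$demo_dir" && find . -type f -name '*.html' ! -name '*.bin*')
printf '%s\n' "$found"
# prints: ./l_scheduling_suite.temp.html

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Note that '*.bin*' also excludes names that merely contain .bin as part of a longer component, for example a name containing .binary.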
