Diff directories recursively, ignoring all binary files

Time: 2022-01-05 10:06:09

I'm working on a Fedora Constantine box and want to diff two directories recursively to check for source changes. Because of how the project was set up (before my own involvement with it, sigh), the directories contain source and binaries as well as large binary datasets. Diffing these directories does eventually finish, but it would take perhaps twenty seconds if I could ignore the binary files.

As far as I understand, diff has no 'ignore binary files' mode. It does have an ignore option (-I), but that ignores changes in lines matching a regular expression within a file. I don't know what to write there to ignore binary files regardless of extension.

I'm using the following command, but it does not ignore binary files. Does anyone know how to modify this command to do this?

diff -rq dir1 dir2

6 solutions

#1


31  

Maybe use grep -I (equivalent to grep --binary-files=without-match) as a filter to weed out binary files.

dir1='folder-1'
dir2='folder-2'
IFS=$'\n'
for file in $(grep -Ilsr -m 1 '.' "$dir1"); do
   diff -q "$file" "${file/${dir1}/${dir2}}"
done
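
A slightly sturdier variant of the loop above, as a sketch only: it assumes GNU grep (for -Z, which ends each file name with a NUL byte) and directory arguments given without a trailing slash. The function name diff_text_files is made up for illustration.

```shell
#!/bin/bash
# Sketch: diff only the files that grep considers text. Names are passed
# NUL-delimited so that spaces and other odd characters survive.
# diff_text_files is a hypothetical name, not a standard tool.
diff_text_files() {
    local dir1=$1 dir2=$2 file
    # -I skips binary files, -l lists matching file names, -r recurses,
    # -Z terminates each name with NUL (GNU grep), -m 1 stops at the first match.
    grep -IlrZ -m 1 '.' "$dir1" |
    while IFS= read -r -d '' file; do
        # Strip the dir1 prefix and look up the same relative path in dir2.
        diff -q "$file" "$dir2${file#"$dir1"}"
    done
}

# Usage: diff_text_files folder-1 folder-2
```

Like the original loop, this skips empty files (the pattern '.' needs at least one character), and diff reports an error for files that exist only in the first directory.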

#2


54  

Kind of cheating but here's what I used:

diff -r dir1/ dir2/ | sed '/Binary\ files\ /d' >outputfile

This recursively compares dir1 to dir2; sed removes the lines that report binary files (those beginning with "Binary files "), and the result is redirected to outputfile.

#3


11  

I came to this (old) question looking for something similar (config files on a legacy production server compared to a default Apache installation). Following @fearlesstost's suggestion in the comments, git is sufficiently lightweight and fast that it's probably more straightforward than any of the above suggestions. Copy version 1 to a new directory. Then do:

git init
git add .
git commit -m 'Version 1'

Now delete all the files from version 1 in this directory and copy version 2 into the directory. Now do:

git add .
git commit -m 'Version 2'
git show

This will show you Git's version of all the differences between the first commit and the second. For binary files it will just say that they differ. Alternatively, you could create a branch for each version and try to merge them using git's merge tools.

#4


1  

If the names of the binary files in your project follow a specific pattern (*.o, *.so, ...), as they usually do, you can put those patterns in a file and specify it using -X (hyphen X).

Contents of my "exclude file":

*.o
*.so
*.git

diff -X exclude_file -r . other_tree > my_diff_file
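
The same idea can be wrapped in a small helper that writes the patterns to a temporary exclude file, one per line, and passes it to diff -X. This is only a sketch; diff_excluding is a made-up name, and the patterns are whatever fits your project.

```shell
#!/bin/bash
# Sketch: build a temporary exclude file from the given patterns and
# run a recursive diff with it. diff_excluding is a hypothetical helper.
diff_excluding() {
    local dir1=$1 dir2=$2 exclude_file status
    shift 2
    exclude_file=$(mktemp) || return 2
    # Remaining arguments are shell patterns matched against file base names.
    printf '%s\n' "$@" > "$exclude_file"
    diff -r -X "$exclude_file" "$dir1" "$dir2"
    status=$?
    rm -f "$exclude_file"
    # Pass on diff's exit status: 0 = same, 1 = differences, 2 = trouble.
    return "$status"
}

# Usage: diff_excluding . other_tree '*.o' '*.so' '*.git' > my_diff_file
```

Quote the patterns on the command line so the shell does not expand them before diff sees them.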

#5


0  

Use a combination of find and the file command. This requires you to do some research on the output of the file command in your directory; below I'm assuming that the files you want to diff are reported as ASCII. Alternatively, use grep -v to filter out the binary files.

#!/bin/bash

dir1=/path/to/first/folder
dir2=/path/to/second/folder

cd "$dir1" || exit 1

# Keep only the files that file(1) reports as ASCII, then diff each one.
find . -type f -exec file {} + | grep ASCII | cut -d: -f1 |
while IFS= read -r i
do
    echo "diffing $i ---- $dir2/$i"
    diff -q "$i" "$dir2/$i"
done

Since you probably know the names of the huge binaries, place them in an associative array (a hash) and only do the diff when a file is not in the hash, something like this:

#!/bin/bash

dir1=/path/to/first/directory
dir2=/path/to/second/directory

content_dir1=$(mktemp)
content_dir2=$(mktemp)

# List the files in each tree, sorted so that the two lists line up.
(cd "$dir1" && find . -type f -print | sort > "$content_dir1")
(cd "$dir2" && find . -type f -print | sort > "$content_dir2")

echo Files that only exist in one of the paths
echo -----------------------------------------
diff "$content_dir1" "$content_dir2"

# Files to ignore
declare -A F2I
F2I=( [sqlite3]=1 [binfile2]=1 )

while IFS= read -r f
do
    b=$(basename "$f")
    # Diff only the files whose base name is not in the ignore list.
    if ! [[ ${F2I[$b]} ]]; then
        diff "$dir1/$f" "$dir2/$f"
    fi
done < "$content_dir1"

rm -f "$content_dir1" "$content_dir2"

#6


0  

Well, as a crude sort of check, you could ignore files that match /\0/.
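
One way to sketch that check in shell, under the assumption that "binary" simply means "contains at least one NUL byte". The names has_nul and diff_if_text are made up for illustration.

```shell
#!/bin/bash
# Sketch: treat a file as binary if it contains at least one NUL byte.
# has_nul and diff_if_text are hypothetical helper names.
has_nul() {
    # Deleting NUL bytes shrinks the byte count only if some were present.
    [ "$(LC_ALL=C tr -d '\000' < "$1" | wc -c)" -ne "$(wc -c < "$1")" ]
}

diff_if_text() {
    # Compare the two files only when neither contains a NUL byte.
    if has_nul "$1" || has_nul "$2"; then
        return 0
    fi
    diff -q "$1" "$2"
}

# Usage: diff_if_text dir1/some/file dir2/some/file
```

Note that has_nul reads each file in full, so on large datasets it is a correctness check rather than a speed-up.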
