Percentage difference between two text files

Date: 2022-08-29 22:57:33

I know that I can use cmp, diff, etc. to compare two files, but what I am looking for is a utility that gives me the percentage difference between two files.

If there is no such utility, any algorithm would be fine too. I have read about fuzzy programming, but I haven't quite understood it.

3 solutions

#1 (27 votes)

You can use the ratio() method of difflib.SequenceMatcher.

From the documentation:

Return a measure of the sequences’ similarity as a float in the range [0, 1].
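The same documentation defines this value as 2.0*M/T, where T is the total number of elements in both sequences and M is the number of matches. As a sanity check, the sketch below recomputes it by hand from get_matching_blocks(); manual_ratio is a hypothetical helper name, not part of difflib.

```python
from difflib import SequenceMatcher


def manual_ratio(a, b):
    """Recompute SequenceMatcher(None, a, b).ratio() as 2.0*M / T."""
    sm = SequenceMatcher(None, a, b)
    # M: matched elements, summed over all matching blocks
    # (the final dummy block returned by get_matching_blocks() has size 0)
    m = sum(block.size for block in sm.get_matching_blocks())
    # T: total number of elements in both sequences
    t = len(a) + len(b)
    # difflib treats two empty sequences as identical (ratio 1.0)
    return 2.0 * m / t if t else 1.0


print(manual_ratio("abcd", "bcde"))                   # prints 0.75
print(SequenceMatcher(None, "abcd", "bcde").ratio())  # prints 0.75
```

Here the only match is "bcd" (M = 3) out of T = 8 characters, so the ratio is 6/8.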

For example:

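```python
from difflib import SequenceMatcher

file1, file2 = "old.txt", "new.txt"  # placeholder paths; use your own files

with open(file1) as f1, open(file2) as f2:
    text1, text2 = f1.read(), f2.read()

m = SequenceMatcher(None, text1, text2)
print(m.ratio())  # similarity as a float in [0, 1]
```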
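ratio() measures similarity, while the question asks for a percentage difference. Since the documentation defines ratio() as 2.0*M/T (M matched elements, T elements in both sequences), 100 * (1 - ratio()) is the percentage of all elements that are not part of any match. The helper percent_difference below is a hypothetical sketch, not a difflib function; its by_line option compares lists of lines instead of characters, which is usually faster on large files because there are far fewer elements to align.

```python
from difflib import SequenceMatcher


def percent_difference(text1, text2, by_line=False):
    """Return 100 * (1 - ratio()): the share of unmatched elements, in percent."""
    # Compare lists of lines instead of characters when by_line is True
    a = text1.splitlines() if by_line else text1
    b = text2.splitlines() if by_line else text2
    return 100.0 * (1.0 - SequenceMatcher(None, a, b).ratio())


old = "alpha\nbeta\ngamma\n"
new = "alpha\nBETA\ngamma\n"
print(f"{percent_difference(old, new, by_line=True):.1f}% different")  # prints "33.3% different"
```

Here two of the six lines across both texts ("beta" and "BETA") are unmatched, hence 33.3%. Character-level and line-level percentages can differ a lot for the same pair of files, so it is worth stating which one you report.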

#2 (2 votes)

It looks like Linux has a utility called dwdiff (a word-level diff) that can report percentage differences when run with the "-s" flag:

http://www.softpanorama.org/Utilities/diff_tools.shtml
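dwdiff compares words rather than lines or characters. If it is not available, a rough word-level breakdown can be sketched with difflib: the snippet below splits both texts on whitespace, classifies the SequenceMatcher opcodes as common, deleted, inserted or changed words, and reports each count as a percentage of the old or new word count. word_stats is a hypothetical helper loosely modelled on that idea; its alignment and numbers will not necessarily match dwdiff's own statistics.

```python
from difflib import SequenceMatcher


def word_stats(old, new):
    """Word-level diff percentages (not dwdiff's algorithm or output format)."""
    a, b = old.split(), new.split()
    counts = {"common": 0, "deleted": 0, "inserted": 0, "changed_old": 0, "changed_new": 0}
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            counts["common"] += i2 - i1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
        elif tag == "insert":
            counts["inserted"] += j2 - j1
        else:  # "replace": words a[i1:i2] were changed into b[j1:j2]
            counts["changed_old"] += i2 - i1
            counts["changed_new"] += j2 - j1

    def pct(n, total):
        # Guard against empty input
        return 100.0 * n / total if total else 0.0

    return {
        "old_common": pct(counts["common"], len(a)),
        "old_deleted": pct(counts["deleted"], len(a)),
        "old_changed": pct(counts["changed_old"], len(a)),
        "new_common": pct(counts["common"], len(b)),
        "new_inserted": pct(counts["inserted"], len(b)),
        "new_changed": pct(counts["changed_new"], len(b)),
    }


print(word_stats("the quick brown fox", "the quick red fox jumps"))
```

For this pair, 75% of the old words are common and 25% changed ("brown" became "red"), while in the new text 60% are common, 20% changed and 20% inserted ("jumps").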

#3 (0 votes)

Beyond Compare can export very nice file-difference statistics to CSV. Differences are reported at the line level, which makes it well suited to comparing source code files.
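A comparable line-level report can be approximated with the Python standard library. This is a sketch under my own assumptions, not Beyond Compare's format: it counts unchanged, changed, deleted and inserted lines from difflib opcodes and writes them as metric,value CSV rows. line_diff_stats and write_stats_csv are hypothetical names; for real files, pass the lists returned by readlines().

```python
import csv
import io
from difflib import SequenceMatcher


def line_diff_stats(old_lines, new_lines):
    """Classify lines as unchanged/changed/deleted/inserted via difflib opcodes."""
    stats = {"unchanged": 0, "changed": 0, "deleted": 0, "inserted": 0}
    # autojunk=False stops difflib from ignoring very frequent lines
    # (e.g. blank lines) when the second input has 200+ lines.
    sm = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            stats["unchanged"] += i2 - i1
        elif tag == "replace":
            # Arbitrary choice: a replaced block counts as its larger side
            stats["changed"] += max(i2 - i1, j2 - j1)
        elif tag == "delete":
            stats["deleted"] += i2 - i1
        else:  # "insert"
            stats["inserted"] += j2 - j1
    total = sum(stats.values())
    differing = total - stats["unchanged"]
    stats["percent_different"] = round(100.0 * differing / total, 2) if total else 0.0
    return stats


def write_stats_csv(stats, out):
    """Write the statistics to a file-like object as metric,value CSV rows."""
    writer = csv.writer(out)
    writer.writerow(["metric", "value"])
    for name, value in stats.items():
        writer.writerow([name, value])


old = ["def f(x):\n", "    return x + 1\n", "\n", "print(f(2))\n"]
new = ["def f(x):\n", "    return x + 2\n", "\n", "print(f(2))\n", "print(f(3))\n"]
buf = io.StringIO()
write_stats_csv(line_diff_stats(old, new), buf)
print(buf.getvalue())
```

Counting a replaced block by its larger side keeps each edited line from being counted twice; counting the old and new sides separately is an equally defensible choice.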
