Q: What information describes the quantitative difference between two large files of the same size?

Stack Overflow user

Asked 2011-10-20 17:51:29

Answers: 2, Views: 203, Followers: 0, Votes: 2

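Usually, to find out how two binary files differ, I use the diff and hexdump tools. In some cases, though, given two large binary files of the same size, I only want to see their quantitative differences, such as the number of differing regions and the cumulative difference.

Example: two files, A and B. They have two differing regions, and their cumulative difference is (6c-a3) + (6c-11) + (6f-6e) + (20-22).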

File A = 48 65 6c 6c 6f 2c 20 57
File B = 48 65 a3 11 6e 2c 22 57
              |--------|  |--|
                 reg 1   reg 2

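How can I get this information using standard GNU tools and Bash, or would I be better off with a simple Python script? Other statistics about how the two files differ would also be useful, but I don't know what else there is or how to measure it. Entropy difference? Variance difference?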

Tags: bash, statistics, hexdump, python, linux

2 Answers

Stack Overflow user

Accepted answer

Answered 2011-10-20 20:16:54

You can use numpy for everything except the regions. Something like the following (untested):

import numpy as np
a = np.fromfile("file A", dtype="uint8")
b = np.fromfile("file B", dtype="uint8")

# Compute the number of bytes that are different
different_bytes = np.sum(a != b)

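# Cast to a signed type first: subtracting uint8 arrays wraps around modulo 256
signed_diff = a.astype(np.int16) - b.astype(np.int16)

# Compute the sum of the differences
difference = np.sum(signed_diff)

# Compute the sum of the absolute value of the differences
absolute_difference = np.sum(np.abs(signed_diff))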

# In some cases, the number of bits that have changed is a better
# measurement of change. To compute it we make a lookup array where 
# bitcount_lookup[byte] == number_of_1_bits_in_byte (so
# bitcount_lookup[0:16] == [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4])
bitcount_lookup = np.array(
    [bin(i).count("1") for i in range(256)], dtype="uint8")

# Numpy allows using an array as an index. ^ computes the XOR of
# each pair of bytes. The result is a byte with a 1 bit where the
# bits of the input differed, and a 0 bit otherwise.
bit_diff_count = np.sum(bitcount_lookup[a ^ b])
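
As a sanity check, the same metrics can be applied to the eight sample bytes from the question instead of reading files. This is only a sketch: the arrays are built in memory, and the names `sample_a`, `sample_b`, and `signed_diff` are made up for illustration. The bytes are cast to a signed type before subtracting, because uint8 arithmetic wraps around modulo 256.

```python
import numpy as np

# The sample bytes of File A and File B from the question.
sample_a = np.array([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x57], dtype=np.uint8)
sample_b = np.array([0x48, 0x65, 0xa3, 0x11, 0x6e, 0x2c, 0x22, 0x57], dtype=np.uint8)

# Number of byte positions that differ.
different_bytes = int(np.count_nonzero(sample_a != sample_b))

# Signed per-byte differences (int16 is wide enough for -255..255).
signed_diff = sample_a.astype(np.int16) - sample_b.astype(np.int16)
difference = int(signed_diff.sum())
absolute_difference = int(np.abs(signed_diff).sum())

# Number of differing bits, via a per-byte popcount lookup table.
bitcount_lookup = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
bit_diff_count = int(bitcount_lookup[sample_a ^ sample_b].sum())

print(different_bytes)      # 4
print(difference)           # 35
print(absolute_difference)  # 149
print(bit_diff_count)       # 14
```

The value 35 matches the hand calculation (6c-a3) + (6c-11) + (6f-6e) + (20-22) = -55 + 91 + 1 - 2.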

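I couldn't find a numpy function for counting the regions, but you can write your own using a != b as input, which shouldn't be hard. See this question for inspiration.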
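
One possible way to write such a function, as a minimal sketch rather than an established numpy idiom: treat a region as a maximal run of consecutive differing bytes and count the positions where a run starts. The helper name `count_regions` is hypothetical.

```python
import numpy as np

def count_regions(mask):
    """Count maximal runs of True in a 1-D boolean array (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    if mask.size == 0:
        return 0
    # A run starts wherever a True value directly follows a False value...
    starts = np.count_nonzero(mask[1:] & ~mask[:-1])
    # ...plus one more run if the array itself begins with True.
    return int(starts) + int(mask[0])

# The sample bytes of File A and File B from the question.
a = np.array([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x57], dtype=np.uint8)
b = np.array([0x48, 0x65, 0xa3, 0x11, 0x6e, 0x2c, 0x22, 0x57], dtype=np.uint8)

region_count = count_regions(a != b)
print(region_count)  # 2
```

If two regions separated by only a byte or two should count as one, the mask would need to be smoothed first; this sketch does not do that.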

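Votes: 1

Stack Overflow user

Answered 2011-10-20 21:20:01

One approach that comes to mind is a small modification of a binary diff algorithm, for example a python implementation of the rsync algorithm. Starting from that, it should be relatively easy to get a list of the block ranges in which the files differ, and then run whatever statistics you want on those blocks.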
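
Because the two files have the same size, a much simpler stand-in for the rsync approach may be enough: compare the files block by block at matching offsets and collect the byte ranges of the blocks that differ. The sketch below is not the rsync algorithm (no rolling checksums, no shifted matches); the function name `differing_block_ranges` and the `block_size` parameter are assumptions made for illustration.

```python
import os
import tempfile

def differing_block_ranges(path_a, path_b, block_size=4096):
    """Return merged (start, end) byte ranges of fixed-size blocks that differ.

    Hypothetical helper: assumes both files have the same size and only
    compares blocks at identical offsets.
    """
    ranges = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if not block_a and not block_b:
                break
            length = max(len(block_a), len(block_b))
            if block_a != block_b:
                if ranges and ranges[-1][1] == offset:
                    # Extend the previous range when blocks are adjacent.
                    ranges[-1] = (ranges[-1][0], offset + length)
                else:
                    ranges.append((offset, offset + length))
            offset += length
    return ranges

# Demo on the sample bytes from the question, written to temporary files.
sample_a = bytes.fromhex("48656c6c6f2c2057")
sample_b = bytes.fromhex("4865a3116e2c2257")
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.bin")
    path_b = os.path.join(tmp, "b.bin")
    with open(path_a, "wb") as f:
        f.write(sample_a)
    with open(path_b, "wb") as f:
        f.write(sample_b)
    byte_ranges = differing_block_ranges(path_a, path_b, block_size=1)
    coarse_ranges = differing_block_ranges(path_a, path_b, block_size=4)

print(byte_ranges)    # [(2, 5), (6, 7)]
print(coarse_ranges)  # [(0, 8)]
```

Reading in blocks keeps memory use bounded for large files; the block size trades precision of the reported ranges against the number of comparisons.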

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/7834123
