文章/答案/技术大牛

发布

社区首页 >问答首页 >求整数阵hamming距离的最快方法

问求整数阵hamming距离的最快方法
EN

Stack Overflow用户

提问于 2016-11-29 20:34:20

回答 4查看 8.5K关注 0票数 10

设a和b是具有8位整数(0-255)的相同大小的向量.我想要计算那些向量不同的位数，也就是那些数字的二进制表示形式的连接所形成的向量之间的Hamming距离。例如：

a = [127,255]
b= [127,240]

使用numpy库

np.bitwise_xor(a,b)
# Output: array([ 0, 15])

我现在需要的是二进制表示上述数组的每个元素，并在数组的所有元素中计数1s。上面的例子将给出0+4 = 4的hamming距离。在Python中，有快速而优雅的解决方案吗？

python

numpy

binary

xor

hamming-distance

回答 4

Stack Overflow用户

回答已采纳

发布于 2016-11-29 21:01:27

方法1 :我们可以将它们广播成二进制比特&计数不同位数，如-

def hamming_distance(a, b):
    r = (1 << np.arange(8))[:,None]
    return np.count_nonzero( (a & r) != (b & r) )

样本运行-

In [144]: a = [127,255]
     ...: b = [127,240]
     ...: 

In [145]: hamming_distance(a, b)
Out[145]: 4

方法2 :使用bitwise-xor操作，我们可以找到a和b之间的不同二进制位数-

def hamming_distance_v2(a, b):
    r = (1 << np.arange(8))[:,None]
    return np.count_nonzero((np.bitwise_xor(a,b) & r) != 0)

票数 12

Stack Overflow用户

发布于 2016-11-29 22:16:22

如果要在执行程序时多次调用距离函数，则可以使用预先计算的位计数表获得一定的速度。以下是Hamming距离函数的另一个版本：

# _nbits[k] is the number of 1s in the binary representation of k for 0 <= k < 256.
_nbits = np.array(
      [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3,
       4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4,
       4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2,
       3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5,
       4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4,
       5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3,
       3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2,
       3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6,
       4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5,
       6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5,
       5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6,
       7, 7, 8], dtype=np.uint8)


def hamming_distance1(a, b):
    c = np.bitwise_xor(a, b)
    n = _nbits[c].sum()
    return n

在下面，a和b是在对这个问题的注释中给出的32长度的Python。divakar_hamming_distance()和divakar_hamming_distance_v2()来自@Divakar的答案。

以下是@Divakar功能的时间安排：

In [116]: %timeit divakar_hamming_distance(a, b)
The slowest run took 5.57 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.3 µs per loop

In [117]: %timeit divakar_hamming_distance_v2(a, b)
The slowest run took 5.35 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop

hamming_distance1(a, b)的速度要快一点：

In [118]: %timeit hamming_distance1(a, b)
The slowest run took 6.04 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.42 µs per loop

在我的计算机上，初始化_nbits需要大约11个hamming_distance1，所以如果只调用函数一次，那么使用hamming_distance1就没有什么好处了。如果你把它称为三次或三次以上，那么它的性能就会有净增长。

如果输入已经是numpy数组，那么所有函数的速度都要快得多：

In [119]: aa = np.array(a)

In [120]: bb = np.array(b)

In [121]: %timeit divakar_hamming_distance_v2(aa, bb)
The slowest run took 8.22 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.72 µs per loop

In [122]: %timeit hamming_distance1(aa, bb)
The slowest run took 12.67 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.77 µs per loop

当然，如果你总是在计算汉明距离之前就这样做，那么转换的时间必须包括在整个计时中。但是，如果您编写了生成a和b的代码，以便更早地利用numpy，那么在计算Hamming距离时，它们可能已经成为numpy数组。

(我还试验了8位值之间的二维预计算汉明距离-形状为(256,256)的数组--但初始化成本较高，性能增益较小。)

票数 7

Stack Overflow用户

发布于 2016-11-29 20:50:00

也许不是最有效的方法，但最简单的imo是将您的ouptut数组转换为二进制形式的字符串，然后将转换回ints的所有字符的和.

import numpy as np

output = np.random.randint(0,63,10)
hamming = ['{:b}'.format(x).count('1') for x in output]

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/40875282

复制

相似问题

问求整数阵hamming距离的最快方法
EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问求整数阵hamming距离的最快方法EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问求整数阵hamming距离的最快方法
EN