MySQL Hamming distance between hexadecimal values

Time: 2022-05-05 19:14:41

I have some hashes stored in MySQL, and I want to fetch rows by comparing their Hamming distance to a given hash.

The stored hashes are:

qw 1 ffe71b001820a1fd 
qw 2 ffffb81c1c3838a0 
qw 3 fff8381c1c3e3828 
qw 4 fffa181c3c2e3920 
qw 5 fffa981c1c3e2820 
qw 6 ff5f1c38387c1c04 
qw 7 fff1e0c1c38387ef 
qw 8 fffa181c1c3e3820 
qw 9 fffa381c1c3e3828

I would normally fetch them like this:

SELECT product_id, HAMMING_DISTANCE(phash, 'phashfromuserinput') ;

But in MySQL, Hamming distance has to be computed with bitwise operators, which I could do if the strings contained only digits:

SELECT pagedata, BIT_COUNT(pagecontent ^ '$encrypted') FROM searchengine WHERE pagecontent > 2;

This only works on integers, but I need it to work on values containing both digits and letters, for example:

74898fababfbef46 and 95efabfeba752545

From a little research, I know that I first have to convert the field to binary and then apply BIT_COUNT, using CAST or CONVERT, like:

SELECT BIT_COUNT( CONV( hash, 2, 10 ) ^ 
0b0000000101100111111100011110000011100000111100011011111110011011 )

or

SELECT BIT_COUNT(CAST(hash AS BINARY)) FROM data;

Converting the data to binary and then applying BIT_COUNT is fine in principle. The problem is that the hashes stored in MySQL are alphanumeric strings, so if I cast the field to VARBINARY and apply BIT_COUNT, it will not work, because the stored hashes are not binary strings.

What should I do?

I was referring to this PHP Hamming distance example:

function HammingDistance($bin1, $bin2) {
    $a1 = str_split($bin1);
    $a2 = str_split($bin2);
    $dh = 0;
    for ($i = 0; $i < count($a1); $i++) 
        if($a1[$i] != $a2[$i]) $dh++;
    return $dh;
}

echo HammingDistance('10101010','01010101'); //returns 8

But I don't understand how to do the same matching and fetching in MySQL, because I can't implement this function there.

1 Answer

#1


6  

Using the last two hashes as an example:

SELECT BIT_COUNT( CAST(CONV('fffa181c1c3e3820', 16, 10) AS UNSIGNED) ^
                  CAST(CONV('fffa381c1c3e3828', 16, 10) AS UNSIGNED) ) ;
--> 2
  • The hashes are hex.
  • The conversion needs to end up with BIGINT UNSIGNED.
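To see what the conversion does, it can help to look at the intermediate values. The following is a minimal sketch using the same two literals; the column aliases are made up for illustration. CONV() returns a string and works with 64-bit precision, so a 16-digit hex hash is the longest value that fits in a single call.

```sql
SELECT
  -- CONV() yields the decimal value as a string, not as a number
  CONV('fffa181c1c3e3820', 16, 10) AS as_decimal_string,
  -- XOR of the two 64-bit values, shown in hex so the differing bits are visible
  HEX( CAST(CONV('fffa181c1c3e3820', 16, 10) AS UNSIGNED) ^
       CAST(CONV('fffa381c1c3e3828', 16, 10) AS UNSIGNED) ) AS xor_hex;
-- xor_hex is '200000000008': two set bits, matching the BIT_COUNT result of 2
```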

(If you had MD5 (128-bit) or SHA1 (160-bit) hashes, we would have to split them via SUBSTR(), XOR each pair of pieces, apply BIT_COUNT to each, then add the results.)
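The splitting approach for longer hashes could look roughly like this. It is only a sketch for the 128-bit case: the user variables @h1 and @h2 and their values are placeholders chosen for illustration, and each hash is cut into two 16-hex-digit (64-bit) halves.

```sql
-- Placeholder 32-hex-digit values that differ only in the last digit (e vs f)
SET @h1 = 'd41d8cd98f00b204e9800998ecf8427e';
SET @h2 = 'd41d8cd98f00b204e9800998ecf8427f';

SELECT
    -- high 64 bits: hex digits 1..16
    BIT_COUNT( CAST(CONV(SUBSTR(@h1,  1, 16), 16, 10) AS UNSIGNED) ^
               CAST(CONV(SUBSTR(@h2,  1, 16), 16, 10) AS UNSIGNED) )
    -- low 64 bits: hex digits 17..32
  + BIT_COUNT( CAST(CONV(SUBSTR(@h1, 17, 16), 16, 10) AS UNSIGNED) ^
               CAST(CONV(SUBSTR(@h2, 17, 16), 16, 10) AS UNSIGNED) )
    AS hamming_distance;
-- e (1110) and f (1111) differ in one bit, so hamming_distance is 1
```

A 160-bit SHA1 hash (40 hex digits) would need a third term covering digits 33..40.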

Edit to use column names:

SELECT BIT_COUNT( CAST(CONV( a.pagecontent , 16, 10) AS UNSIGNED) ^
                  CAST(CONV( b.pagecontent , 16, 10) AS UNSIGNED) ) ;
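Put together as a complete query against a single table, this might look like the sketch below. The table name searchengine and the columns pagedata and pagecontent are borrowed from the question and are assumptions about the actual schema; the literal stands in for the user-supplied hash, and the threshold of 5 is arbitrary.

```sql
SELECT pagedata,
       BIT_COUNT( CAST(CONV(pagecontent, 16, 10) AS UNSIGNED) ^
                  CAST(CONV('fffa181c1c3e3820', 16, 10) AS UNSIGNED) ) AS distance
FROM searchengine          -- assumed table name
HAVING distance <= 5       -- arbitrary cutoff; MySQL allows the alias in HAVING
ORDER BY distance;         -- closest matches first
```

Because the distance is computed from an expression on every row, a query like this cannot use an ordinary index on pagecontent and will scan the whole table.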
