UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 1: invalid continuation byte

Time: 2023-01-05 09:24:00

I want to convert a bytes value to a string. There are earlier questions related to mine, but I get an error when I hash the content of a file with md5() this way:

import hashlib
with open("C:\\boot.ini","r") as f:
    r=f.read()
a=hashlib.md5()
a.update(r.encode('utf8'))
bytes_data=a.digest()
print(bytes_data)
r=type(bytes_data)
print(r) # <-- Just to be sure, it is in bytes 
myString=bytes_data.decode(encoding='UTF-8')

I got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 1: invalid continuation byte

I understand the cause of my problem thanks to this question. However, I am calculating hashes for many different files and have no control over their bytes, so how can I resolve this problem?

1 solution

#1


The hash.digest() return value is not a UTF-8-encoded string. Don't try to decode it; it is a sequence of bytes in the range 0-255 and these bytes do not represent text.

Not every bytes value encodes text; this is one such value.
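
To illustrate the point, here is a minimal sketch that inspects a digest as raw bytes. The empty input is only a small reproducible sample, and is_valid_utf8 is a hypothetical helper written for this example; it is not part of hashlib.

```python
import hashlib

# MD5 of empty input, used only as a small reproducible sample.
digest = hashlib.md5(b"").digest()

print(type(digest))  # <class 'bytes'>
print(len(digest))   # 16: an MD5 digest is always 16 bytes long


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` happens to decode as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(digest))  # False for this digest
```

Whether a given digest happens to decode is a matter of chance, which is why the original code fails on some files and might appear to work on others.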

Use hash.hexdigest() if you want something printable. This method returns the digest bytes expressed as hexadecimal digits (two hex characters per digest byte). This is the commonly used form when sharing an MD5 digest with others.
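
As a sketch of how this could look for arbitrary files, the function below returns the hex form of the digest. The name md5_hex and the chunk size are assumptions made for this example. It also opens the file in binary mode ("rb"), which goes beyond the advice above: it avoids decoding the file contents at all, so files that are not valid text can still be hashed.

```python
import hashlib


def md5_hex(path, chunk_size=65536):
    """Return the MD5 digest of the file at `path` as a hex string.

    `md5_hex` and `chunk_size` are illustrative names, not part of hashlib.
    """
    md5 = hashlib.md5()
    # Binary mode: the raw bytes are hashed, no text decoding takes place.
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# hexdigest() returns a str, so no decode() call is needed.
print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
```

Calling md5_hex("C:\\boot.ini") would then give a 32-character string that can be printed or compared directly.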
