'utf-8' codec can't decode byte 0x80

I'm trying to download a BVLC-trained model and I'm stuck on this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte

I think it's caused by the following function (complete code):

  # Closure-d function for checking SHA1.
  def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
      with open(filename, 'r') as f:
          return hashlib.sha1(f.read()).hexdigest() == sha1

Any idea how to fix this?

3 Solutions

#1

You are opening a file that is not UTF-8 encoded, while the default encoding for your system is set to UTF-8.

Since you are calculating a SHA1 hash, you should read the data as binary instead. The hashlib functions require that you pass in bytes:

with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1

Note the addition of b in the file mode.
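
As a minimal sketch of the whole fix, the check can be written as a standalone function and tried against a small file that is deliberately not valid UTF-8. The original closure takes its defaults from model_filename and frontmatter['sha1']; here they are plain parameters so the example is self-contained, and payload, demo_path and expected_sha1 are hypothetical names used only for illustration.

```python
import hashlib
import os
import tempfile


def model_checks_out(filename, sha1):
    """Return True if the SHA1 hex digest of the file equals `sha1`."""
    # 'rb' returns raw bytes, so no text decoding is attempted.
    with open(filename, 'rb') as f:
        return hashlib.sha1(f.read()).hexdigest() == sha1


# Hypothetical stand-in for the downloaded model: bytes that are not valid UTF-8.
payload = b'\x80\x03binary model data'
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

expected_sha1 = hashlib.sha1(payload).hexdigest()
result = model_checks_out(demo_path, expected_sha1)
os.remove(demo_path)
print(result)
```

Opening the same file with mode 'r' instead would attempt to decode the 0x80 byte as text, which is where the UnicodeDecodeError comes from.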

See the open() documentation:

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)
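
To see which encoding text mode would fall back to on a given machine, the call named in the quote can be run directly. The snippet below is only an illustrative sketch: it prints the locale's preferred encoding and then decodes a hypothetical two-byte sample starting with 0x80, which is not a valid UTF-8 start byte.

```python
import locale

# The encoding open() uses in text mode when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical sample: 0x80 cannot begin a UTF-8 sequence.
sample = b'\x80A'
message = None
try:
    sample.decode('utf-8')
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```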

and from the hashlib module documentation:

You can now feed this object with bytes-like objects (normally bytes) using the update() method.
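
Because update() accepts bytes incrementally, a large model file does not have to be read into memory in one call. The following is a sketch under assumptions: sha1_of_stream and the 64 KiB chunk size are illustrative choices, not part of the original script, and an in-memory io.BytesIO stands in for a file opened with 'rb'.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Hash a binary stream chunk by chunk and return the hex digest."""
    digest = hashlib.sha1()
    # read() returns b'' at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()


# Hypothetical data standing in for a binary model file.
data = b'\x80' * 200000
chunked = sha1_of_stream(io.BytesIO(data))
one_shot = hashlib.sha1(data).hexdigest()
print(chunked == one_shot)
```

Feeding the data in pieces produces the same digest as hashing it all at once, so the comparison against the expected SHA1 is unaffected.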

#2

You didn't specify to open the file in binary mode, so f.read() is trying to read the file as a UTF-8-encoded text file, which doesn't seem to be working. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it, and then read it, as a binary file.

>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte

but

>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325
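
The point that the hash is taken over bytes, not strings, can also be checked in isolation. In this small sketch (the sample text is hypothetical), passing a str to hashlib.sha1 raises TypeError, and the same text encoded two different ways gives two different digests, which is why decoding the file first would be both unnecessary and risky.

```python
import hashlib

text = 'caf\u00e9'  # hypothetical sample string

# hashlib refuses str input; it only hashes bytes-like objects.
str_rejected = False
try:
    hashlib.sha1(text)
except TypeError:
    str_rejected = True

# The digest depends on the exact bytes, so the encoding matters.
utf8_digest = hashlib.sha1(text.encode('utf-8')).hexdigest()
latin1_digest = hashlib.sha1(text.encode('latin-1')).hexdigest()
print(str_rejected, utf8_digest != latin1_digest)
```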

#3

Neither the documentation nor the source code gives any hint, so I have no clue why, but adding the b character to the mode (I guess it stands for binary) works (TensorFlow version 1.1.0):

image_data = tf.gfile.FastGFile(filename, 'rb').read()

For more information, check out: gfile
