Modifying non-text files with Python 3

I'm working on an encryption/decryption program, and I got it working on text files; however, I cannot open files in any other format. For example, if I do:

a_file = open('C:\Images\image.png', 'r', encoding='utf-8')
for a_line in a_file:
    print(a_line)

I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\WinPython-64bit-3.4.3.4\python-3.4.3.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "C:\WinPython-64bit-3.4.3.4\python-3.4.3.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Comp_Sci/Coding/line_read_test.py", line 2, in <module>
for a_line in a_file:
File "C:\WinPython-64bit-3.4.3.4\python-3.4.3.amd64\lib\codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

What am I doing terribly wrong?

2 solutions

#1

Short version: You're opening binary files in text mode. Use 'rb' instead of 'r' (and drop the encoding parameter) and you'll be doing it right.

Long version: Python 3 makes a very strict distinction between bytestrings and Unicode strings. The str type contains only Unicode strings; each character of a str is a single Unicode codepoint. The bytes type, on the other hand, represents a series of 8-bit values that do not necessarily correspond to text. E.g., a .PNG file should be loaded as a bytes object, not as a str object. By passing the encoding="utf-8" parameter to open(), you're telling Python that your file contains only valid UTF-8 text, which a .PNG obviously does not. Instead, you should be opening the file as a binary file with 'rb' and not using any encoding. Then you'll get bytes objects rather than str objects when you read the file, and you'll need to treat them differently.
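To make the str/bytes distinction concrete, here is a minimal sketch. It does not touch the disk; it uses the standard eight-byte PNG signature as stand-in data, and the variable names are purely illustrative. It shows that the first byte is 0x89 (the byte named in the traceback), that decoding it as UTF-8 fails, and that real text round-trips through an encoding without trouble.

```python
# The eight-byte signature that starts every PNG file, used here as sample data.
png_signature = b'\x89PNG\r\n\x1a\n'

# A bytes object is a sequence of integers in range(256); indexing yields an int.
first_byte = png_signature[0]  # 0x89

# Decoding as UTF-8 fails because 0x89 cannot start a UTF-8 sequence.
try:
    png_signature.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Text, by contrast, converts between str and bytes through an encoding.
text = 'h\u00e9llo'                    # a str containing one non-ASCII character
encoded = text.encode('utf-8')         # bytes
round_trip = encoded.decode('utf-8')   # back to the same str
```

Opening a file with 'r' and encoding='utf-8' performs that decode step implicitly on everything read, which is why the loop in the question fails on the very first byte.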

I see that @ignacio-vazquez-abrams has already posted good sample code while I've been typing this answer, so I won't duplicate his efforts. His code is correct: use it and you'll be fine.

#2

You're opening it as a text file and assuming that you can read lines from it and print them meaningfully. Open it in binary mode and read it in fixed-size chunks instead:

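# Open in binary mode ('rb'); no encoding applies to raw bytes.
with open(r'C:\Images\image.png', 'rb') as a_file:
    while True:
        data = a_file.read(32)  # read up to 32 bytes at a time
        if not data:            # an empty bytes object means end of file
            break
        print(data)             # shows the chunk as a bytes literal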
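Since the goal in the question is to modify the file rather than just print it, the same chunked loop can write transformed bytes to a second file opened with 'wb'. The following is only a sketch under stated assumptions: xor_file, its parameters, and the temporary file names are hypothetical, and a single-byte XOR is a toy stand-in for whatever cipher the program actually uses, not real encryption.

```python
import os
import tempfile


def xor_file(src_path, dst_path, key, chunk_size=4096):
    """Copy src_path to dst_path, XOR-ing every byte with key (an int 0-255).

    Hypothetical helper for illustration only; XOR with one byte is not secure.
    """
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Iterating over bytes yields ints, so each one can be transformed
            # and the results packed back into a new bytes object.
            dst.write(bytes(b ^ key for b in chunk))


# Demonstration on throwaway files so the sketch does not depend on a real image.
work_dir = tempfile.mkdtemp()
original = os.path.join(work_dir, 'original.bin')
scrambled = os.path.join(work_dir, 'scrambled.bin')
restored = os.path.join(work_dir, 'restored.bin')

sample = b'\x89PNG\r\n\x1a\n' + bytes(range(256))
with open(original, 'wb') as f:
    f.write(sample)

xor_file(original, scrambled, key=0x5A)   # "encrypt"
xor_file(scrambled, restored, key=0x5A)   # XOR with the same key undoes it

with open(scrambled, 'rb') as f:
    scrambled_bytes = f.read()
with open(restored, 'rb') as f:
    restored_bytes = f.read()
```

Because both files are opened in binary mode, no decoding or newline translation happens, so every byte value from 0 to 255 survives the round trip unchanged.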
