Python pickle: fixing \r characters before loading

I have a pickled object (a list containing a few numpy arrays) that was created on Windows and apparently written to a file opened in text mode rather than binary mode (i.e. with open(filename, 'w') instead of open(filename, 'wb')). As a result I can't unpickle it, not even on Windows, because it is littered with \r characters (and possibly more). The main complaint is

ImportError: No module named multiarray

presumably because it's looking for numpy.core.multiarray\r, which of course doesn't exist. Simply removing the \r characters didn't do the trick: I tried both sed -e 's/\r//g' and, in Python, s = file.read().replace('\r', ''), but both break the file and yield a cPickle.UnpicklingError later on.

The problem is that I really need to get the data out of these objects. Any ideas on how to fix the files?

Edit: On request, here are the first few hundred bytes of my file, shown as an escaped Python string:

\x80\x02]q\x01(}q\x02(U\r\ntotal_timeq\x03G?\x90\x15r\xc9(s\x00U\rreaction_timeq\x04NU\x0ejump_directionq\x05cnumpy.core.multiarray\r\nscalar\r\nq\x06cnumpy\r\ndtype\r\nq\x07U\x02f8K\x00K\x01\x87Rq\x08(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\x025\x9d\x13\xfc#\xc8?\x86Rq\tU\x14normalised_directionq\r\nh\x06h\x08U\x08\xf0\xf9,\x0eA\x18\xf8?\x86Rq\x0bU\rjump_distanceq\x0ch\x06h\x08U\x08\x13\x14\xea&\xb0\x9b\x1a@\x86Rq\rU\x04jumpq\x0ecnumpy.core.multiarray\r\n_reconstruct\r\nq\x0fcnumpy\r\nndarray\r\nq\x10K\x00\x85U\x01b\x87Rq\x11(K\x01K\x02\x85h\x08\x89U\x10\x87\x16\xdaEG\xf4\xf3?\x06`OC\xe7"\x1a@tbU\x0emovement_speedq\x12h\x06h\x08U\x08\\p\xf5[2\xc2\xef?\x86Rq\x13U\x0ctrial_lengthq\x14G@\t\x98\x87\xf8\x1a\xb4\xbaU\tconditionq\x15U\x0bhigh_mentalq\x16U\x07subjectq\x17K\x02U\x12movement_directionq\x18h\x06h\x08U\x08\xde\x06\xcf\x1c50\xfd?\x86Rq\x19U\x08positionq\x1ah\x0fh\x10K\x00\x85U\x01b\x87Rq\x1b(K\x01K\x02\x85h\x08\x89U\x10K\xb7\xb4\x07q=\x1e\xc0\xf2\xc2YI\xb7U&\xc0tbU\x04typeq\x1ch\x0eU\x08movementq\x1dh\x0fh\x10K\x00\x85U\x01b\x87Rq\x1e(K\x01K\x02\x85h\x08\x89U\x10\xad8\x9c9\x10\xb5\xee\xbf\xffa\xa2hWR\xcf?tbu}q\x1f(h\x03G@\t\xba\xbc\xb8\xad\xc8\x14h\x04G?\xd9\x99%]\xadV\x00h\x05h\x06h\x08U\x08\xe3X\xa9=\xc1\xb1\xeb?\x86Rq h\r\nh\x06h\x08U\x08\x88\xf7\xb9\xc1\t\xd6\xff?\x86Rq!h\x0ch\x06h\x08U\x08v\x7f\xeb\x11\xea5\r@\x86Rq"h\x0eh\x0fh\x10K\x00\x85U\x01b\x87Rq#(K\x01K\x02\x85h\x08\x89U\x10\xcd\xd9\x92\x9a\x94=\x06@]C\xaf\xef\xeb\xef\x02@tbh\x12h\x06h\x08U\x08-\x9c&\x185\xfd\xef?\x86Rq$h\x14G@\r\xb8W\xb2`V\xach\x15h\x16h\x17K\x02h\x18h\x06h\x08U\x08\x8e\x87\xd1\xc2

You may also download the whole file (22k).

4 solutions

#1

Presuming that the file was created with the default protocol 0 (the ASCII-compatible one), you should be able to load it anywhere by using open('pickled_file', 'rU'), i.e. universal newlines.

If this doesn't work, show us the first few hundred bytes: print repr(open('pickled_file', 'rb').read(200)) and paste the results into an edit of your question.
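For readers on Python 3, the same inspection can be sketched roughly as follows. The helper names peek and has_proto_opcode are made up for this illustration, and the demo writes its own throwaway file rather than touching yours. A leading byte of 0x80 is the PROTO opcode, which only pickles written with protocol 2 or later start with.

```python
import os
import pickle
import tempfile

def peek(path, n=200):
    """Return the first n raw bytes of a file, read in binary mode (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read(n)

def has_proto_opcode(head):
    """True if the data begins with the PROTO opcode (0x80), i.e. protocol 2 or later."""
    return head[:1] == b"\x80"

# Demo on a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(pickle.dumps([1, 2, 3], protocol=2))
    demo_path = tmp.name

head = peek(demo_path)
os.remove(demo_path)

print(repr(head))              # the raw bytes, escapes and all
print(has_proto_opcode(head))  # whether this looks like a binary-protocol pickle
```

If the check is true, the file is a binary pickle and universal-newline reading is not the right tool for it.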

Update after file contents were published:

Your file starts with '\x80\x02'; it was dumped with protocol 2, the latest/best. Protocols 1 and 2 are binary protocols. Your file was written in text mode on Windows. This has resulted in each '\n' being converted to '\r\n' by the C runtime. Files should be opened in binary mode like this:

import pickle

# obj is whatever object you want to save
with open('result.pickle', 'wb') as f:  # b for binary
    pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

with open('result.pickle', 'rb') as f:  # b for binary
    obj = pickle.load(f)

Docs are here. This code will work portably on both Windows and non-Windows systems.

You can recover the original pickle image by reading the file in binary mode and then reversing the damage by replacing all occurrences of '\r\n' by '\n'. Note: This recovery procedure is necessary whether you are trying to read it on Windows or not.
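A minimal Python 3 sketch of that recovery, under some assumptions: the function names are invented for this example, the text-mode damage is simulated with a plain byte replacement, and the test object is deliberately small. The repair cannot tell a '\r\n' introduced by Windows from one that was already in the original data, so it is only exact when the original pickle contained no '\r\n' pairs of its own.

```python
import pickle

def simulate_text_mode_write(raw: bytes) -> bytes:
    """Mimic what a Windows text-mode write does: every LF becomes CRLF."""
    return raw.replace(b"\n", b"\r\n")

def repair_text_mode_pickle(damaged: bytes) -> bytes:
    """Reverse the damage by turning every CRLF back into LF."""
    return damaged.replace(b"\r\n", b"\n")

# A small stand-in object; the key lengths (10 and 13) put LF and CR bytes
# into the pickle stream, much like the keys in the question's file.
original_obj = {"total_time": 1.5, "reaction_time": None}

raw = pickle.dumps(original_obj, protocol=2)
damaged = simulate_text_mode_write(raw)
repaired = repair_text_mode_pickle(damaged)
restored = pickle.loads(repaired)

print(repaired == raw)
print(restored == original_obj)
```

For a real file, read it with open(path, 'rb'), pass the bytes through the repair function, and hand the result to pickle.loads. When loading a Python 2 pickle that contains numpy arrays under Python 3, passing encoding='latin1' to pickle.loads is usually needed as well.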

#2

A newline on Windows isn't just '\r'; it's CRLF, i.e. '\r\n'.

Give file.read().replace('\r\n', '\n') a try. You were previously deleting carriage returns that may not have actually been part of newlines.
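To illustrate the difference, here is a small Python 3 sketch (the object and variable names are invented for the example, and the damage is simulated). A 13-character key such as 'reaction_time' stores its length as the byte 0x0D, which is a carriage return that belongs to the data, so stripping every '\r' destroys it while replacing only '\r\n' leaves it alone.

```python
import pickle

obj = {"reaction_time": 0.25, "total_time": 1.5}
raw = pickle.dumps(obj, protocol=2)

# Simulated Windows text-mode damage: LF -> CRLF.
damaged = raw.replace(b"\n", b"\r\n")

# Approach from the question: delete every carriage return.
stripped = damaged.replace(b"\r", b"")

# Approach suggested here: only undo the CRLF pairs.
repaired = damaged.replace(b"\r\n", b"\n")

print(b"\r" in raw)      # the undamaged pickle already contains a CR byte
print(stripped == raw)   # deleting all CRs does not give back the original
print(repaired == raw)   # undoing only CRLF does, for this object
```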

#3

Can't you -- on Windows -- just open the file in text mode, the same way it was written, read it in and then write it out to another file opened properly in binary mode?

#4

Have you tried unpickling in text mode? That is,

x = pickle.load(open(filename, 'r'))

(On Windows, of course.)
