UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Time: 2023-01-06 21:20:47

I am using Python 2.6 CGI scripts and found this error in the server log while calling json.dumps():

Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(__getdata())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Here, the __getdata() function returns a dictionary {}.

Before posting this question I referred to this question on SO.


UPDATES

The following line is hurting the JSON encoder:

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit

I found a temporary fix for it:

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

But I am not sure whether this is the correct way to do it.

7 solutions

#1


35  

The error occurs because the dictionary contains a non-ASCII character that can't be encoded/decoded. One simple way to avoid it is to encode such strings with the encode() function, as follows (where a is the string containing the non-ASCII character):

a.encode('utf-8').strip()
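
Before encoding anything, it can help to find out which value in the dictionary is the problem. The following is only a sketch, written for Python 3 (where byte strings carry a b prefix) rather than the Python 2 of the question. The function name find_undecodable and the sample dictionary are made up for illustration, and only top-level values are checked.

```python
def find_undecodable(data, encoding='utf-8'):
    """Return (key, value) pairs whose byte-string values fail to decode."""
    bad = []
    for key, value in data.items():
        if isinstance(value, bytes):
            try:
                value.decode(encoding)
            except UnicodeDecodeError:
                bad.append((key, value))
    return bad


# Hypothetical sample data: the second value starts with the byte 0xa5
# mentioned in the traceback.
sample = {'name': b'plain ascii', 'price': b'\xa5 100'}
offenders = find_undecodable(sample)
print(offenders)
```

Once the offending key is known, the value can be decoded with the encoding it was actually written in, instead of guessing.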

#2


24  

Your string contains a non-ASCII character.

Decoding with UTF-8 can fail if the data was produced with a different encoding. For example:

>>> 'my weird character \x96'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

In this case the encoding is windows-1252, so you have to do:

>>> 'my weird character \x96'.decode('windows-1252')
u'my weird character \u2013'

Now that you have Unicode, you can safely encode it to UTF-8.
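
When the source encoding is not known in advance, the same idea can be wrapped in a small helper that tries a list of candidate encodings in order. This is a minimal sketch written for Python 3. The helper name to_unicode and the choice of candidates are assumptions, and a wrong guess can still decode without error and give the wrong characters.

```python
def to_unicode(raw, encodings=('utf-8', 'windows-1252')):
    """Decode a byte string by trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings could decode the input')


# Same bytes as in the example above: 0x96 is not valid UTF-8,
# so the helper falls back to windows-1252.
text = to_unicode(b'my weird character \x96')

# With real text in hand, encoding to UTF-8 is safe.
utf8_bytes = text.encode('utf-8')
```

UTF-8 is tried first because a successful UTF-8 decode of non-ASCII data is rarely accidental, whereas windows-1252 accepts almost any byte sequence.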

#3


18  

Try the code snippet below, which opens the file in binary mode so that the bytes are read without being decoded:

with open(path, 'rb') as f:
  text = f.read()
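
Reading in binary mode only postpones the decision, because the bytes still have to be decoded before they can be used as text. The sketch below, written for Python 3, creates a throwaway file and then decodes its contents explicitly. The sample bytes and the windows-1252 encoding are assumptions chosen for illustration.

```python
import os
import tempfile

# Create a small sample file holding windows-1252 bytes
# (0xe9 is "e" with an acute accent, 0x96 is an en dash).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'caf\xe9 \x96 bar')

with open(path, 'rb') as f:
    raw = f.read()  # raw bytes; nothing is decoded here

# Decode explicitly with the encoding the file was written in (assumed).
text = raw.decode('windows-1252')

os.remove(path)
```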

#4


5  

Set the default encoding at the top of your code:

import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")

#5


2  

The following line is hurting the JSON encoder:

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit

I found a temporary fix for it:

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

Marking this as correct as a temporary fix (though I am not sure about it).

#6


1  

If the error persists after trying all of the workarounds above, try exporting the file as CSV (again, if you have already done so). Especially if you are using scikit-learn, it is best to import the dataset as a CSV file.

I spent hours on this, and the solution turned out to be this simple. Export the file as a CSV to the directory where Anaconda or your classifier tools are installed and try again.

#7


-1  

In my case, if I save the xlsx Excel file as CSV (Comma delimited), the error appears. However, when I save it as CSV (MS-DOS), the error does not occur.
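
A plausible explanation, though the answer does not say so, is that the two export options write different byte encodings, so one file happens to decode and the other does not. Rather than re-exporting until it works, the encoding can be named explicitly when reading the CSV. This is a sketch for Python 3 only. The sample rows and the windows-1252 encoding are assumptions.

```python
import csv
import io
import os
import tempfile

# Hypothetical sample: a CSV saved as windows-1252, where the accented
# letters are the single bytes 0xe9 and 0xfc.
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(b'name,city\r\nRen\xe9,Z\xfcrich\r\n')

# Name the encoding explicitly instead of relying on the platform default.
with io.open(path, encoding='windows-1252', newline='') as f:
    rows = list(csv.reader(f))

os.remove(path)
```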
