Python json.loads fails with 'ValueError: Invalid control character at: line 1 column 33 (char 33)'

Date: 2022-05-15 06:07:13

I have a string like this:

s = u"""{"desc": "\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br \/>\r\nhttp:\/\/www.zhenpin.com\/ <br \/>\r\n<br \/>\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026"}"""

json.loads(s) raises an error like this:

ValueError: Invalid control character at: line 1 column 33 (char 33)

Why does this error occur? How can I solve this problem?

5 Answers

#1


52  

The problem is your unicode string contains carriage returns (\r) and newlines (\n) within a string literal in the JSON data. If they were meant to be part of the string itself, they should be escaped appropriately. If they weren't meant to be part of the string, they shouldn't be in your JSON either.

If you can't fix the source of this JSON string so that it produces valid JSON, you could either remove the offending characters:

>>> json.loads(s.replace('\r\n', ''))

or escape them manually:

>>> json.loads(s.replace('\r\n', '\\r\\n'))
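
The two options differ in what ends up in the parsed value. A minimal sketch, assuming Python 3 and a shortened, hypothetical payload (`bad` is not the string from the question):

```python
import json

# Hypothetical shortened payload: a raw CR LF pair sits inside the string value.
bad = '{"desc": "line one <br \\/>\r\nline two"}'

# Option 1: drop the raw line breaks entirely.
removed = json.loads(bad.replace('\r\n', ''))

# Option 2: turn them into JSON escape sequences so they survive parsing.
escaped = json.loads(bad.replace('\r\n', '\\r\\n'))

print(repr(removed['desc']))
print(repr(escaped['desc']))
```

With removal the line break is gone from the result; with escaping it comes back as a real CR LF in the decoded string.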

#2


78  

Another option, perhaps, is to use the strict=False argument.

According to http://docs.python.org/2/library/json.html:

"If strict is False (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'."

For example:

json.loads(json_str, strict=False)
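
A slightly fuller sketch of the same call, assuming Python 3; the value of `json_str` here is a hypothetical stand-in containing a raw CR LF and a raw tab:

```python
import json

# Hypothetical input with raw control characters inside the string value.
json_str = '{"desc": "first line\r\nsecond line\twith a tab"}'

# The default (strict=True) rejects the raw control characters.
try:
    json.loads(json_str)
    strict_ok = True
except ValueError:
    strict_ok = False

# strict=False accepts them and keeps them in the decoded string.
data = json.loads(json_str, strict=False)
print(strict_ok, repr(data['desc']))
```

This leaves the input untouched, which may be preferable when the data cannot be fixed at its source.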

#3


10  

The problem is that the character at index 33 is a carriage return control character.

>>> s[33]
u'\r'
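
On Python 3 the same lookup can be done from the exception itself. This is a sketch that assumes Python 3.5 or later, where `json.JSONDecodeError` exposes the reported offset as `pos`; the payload `bad` is a hypothetical shortened one:

```python
import json

# Hypothetical payload with a raw CR LF inside the string value.
bad = '{"desc": "abc\r\ndef"}'

try:
    json.loads(bad)
    pos = None
    offending = None
except json.JSONDecodeError as err:
    pos = err.pos          # the "char N" offset from the error message
    offending = bad[pos]   # the character found at that offset

print(pos, repr(offending))
```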

According to the JSON spec, valid characters are:

  • Any Unicode character except: ", \, and control-characters (ord(char) < 32).

  • The following character sequences are allowed: \", \\, \/, \b (backspace), \f (form feed), \n (line-feed/new-line), \r (carriage return), \t (tab), or \u followed by four hexadecimal digits.

However, in a Python string literal you have to double the backslash in these escape sequences (or use a raw string), because Python interprets them itself before json ever sees the text.

>>> s = ur"""{"desc": "\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br \/>\r\nhttp:\/\/www.zhenpin.com\/ <br \/>\r\n<br \/>\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026"}"""
>>> json.loads(s)
{u'desc': u'\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br />\r\nhttp://www.zhenpin.com/ <br />\r\n<br />\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026'}
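
The `ur` prefix exists only in Python 2. A rough Python 3 equivalent, using a shortened, hypothetical version of the string, is a plain raw literal or a literal with every backslash doubled:

```python
import json

# Raw literal: Python leaves every backslash sequence untouched, so json
# receives the two-character escapes \r, \n, \/ and the \uXXXX sequences.
raw = r'{"desc": "\u73cd\u54c1 <br \/>\r\nhttp:\/\/www.zhenpin.com\/"}'

# Equivalent non-raw literal: each backslash is doubled for Python.
doubled = '{"desc": "\\u73cd\\u54c1 <br \\/>\\r\\nhttp:\\/\\/www.zhenpin.com\\/"}'

same_text = (raw == doubled)
data = json.loads(raw)
print(same_text, repr(data['desc']))
```

Either way, the JSON parser rather than the Python compiler decodes the escapes.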

#4


7  

Try to escape your \n and \r:

>>> s = s.replace('\r', '\\r').replace('\n', '\\n')
>>> json.loads(s)
{u'desc': u'\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br />\r\nhttp://www.zhenpin.com/ <br />\r\n<br />\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026'}
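
One caveat: a blanket replace also rewrites line breaks that sit between tokens, which are legal JSON whitespace, so it can break pretty-printed input. A sketch of a narrower approach, assuming Python 3; `escape_controls_in_strings` is a hypothetical helper written for this example, not part of the json module:

```python
import json
import re

# Matches one JSON string literal, including any raw line breaks inside it.
_STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)
_CONTROL_CHAR = re.compile(r'[\x00-\x1f]')


def escape_controls_in_strings(text):
    """Hypothetical helper: escape control characters inside string literals only."""
    def fix_literal(match):
        return _CONTROL_CHAR.sub(lambda c: '\\u%04x' % ord(c.group()), match.group())
    return _STRING_LITERAL.sub(fix_literal, text)


# Hypothetical pretty-printed payload: CR LF appears both between tokens
# (legal whitespace) and inside the string value (illegal).
pretty = '{\r\n  "desc": "a\r\nb"\r\n}'

try:
    json.loads(pretty.replace('\r', '\\r').replace('\n', '\\n'))
    blanket_ok = True
except ValueError:
    blanket_ok = False  # the escapes now also sit between tokens

data = json.loads(escape_controls_in_strings(pretty))
print(blanket_ok, repr(data['desc']))
```

For single-line input like the string in the question, the simple replace is enough.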

#5


0  

In some cases, this error is raised when the file contains a string value with literal whitespace in it, such as a raw tab or line break (an ordinary space is valid). Deleting or escaping that whitespace solves the problem.
