Python: dumping to JSON adds extra double quotes and escapes the quotes

Time: 2022-09-15 13:02:06

I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed that the entire data string for each tweet is unintentionally enclosed in double quotes. Furthermore, all double quotes that belong to the actual JSON formatting are escaped with a backslash.

They look like this:

"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,

How do I avoid that? It should be:

{"created_at":"Fri Aug 08 11:04:40 +0000 2014" .....

My file-out code looks like this:

with io.open('data'+self.timestamp+'.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))

The unintended escaping causes problems when reading the JSON file back in during a later processing step.

1 solution

#1


You are doubly-encoding JSON strings. data is already a JSON string, and doesn't need to be encoded again:

>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print encoded_data
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print double_encode
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"

Just write these directly to your file:

with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
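
For readers on Python 3 (the code in this thread is Python 2), here is a minimal sketch of the same idea. The helper names `to_json_line` and `decode_line` are hypothetical and not part of the original tool. It assumes that any `str` passed in is already JSON text, as in the question, and the second helper is only a heuristic for repairing lines that were already written doubly encoded.

```python
import json


def to_json_line(data):
    """Return one line of JSON text for `data`, encoding it at most once."""
    if isinstance(data, str):
        # Assumption: a str here is already JSON text (as in the question),
        # so validate it and pass it through instead of calling json.dumps again.
        json.loads(data)  # raises ValueError if it is not valid JSON
        return data + "\n"
    # Anything else (dict, list, ...) still needs to be encoded exactly once.
    return json.dumps(data, ensure_ascii=False) + "\n"


def decode_line(line):
    """Decode one line, unwrapping a doubly-encoded JSON string if present."""
    value = json.loads(line)
    if isinstance(value, str):
        # The line was a JSON string whose content may itself be JSON text.
        try:
            value = json.loads(value)
        except ValueError:
            pass  # it was an ordinary string after all
    return value
```

The result of `to_json_line(data)` can be written with `f.write(...)` to a file opened in append mode with `encoding='utf-8'`. Note that `decode_line` would also turn a legitimate string value such as `"123"` into a number, so it is best used only on files known to contain doubly-encoded objects.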
