Problem parsing a multi-line JSON file with Python

Time: 2021-11-12 00:50:37

I am trying to parse a multi-line JSON file using the json library in Python 2.7. A simplified sample file is given below:

{
"observations": {
    "notice": [
        {
            "copyright": "Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml",
            "copyright_url": "http://www.bom.gov.au/other/copyright.shtml",
            "disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml",
            "feedback_url": "http://www.bom.gov.au/other/feedback"
        }
    ]
}
}

My code is as follows:

import json

with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        weatherData = json.loads(jf)
        print weatherData

However, I get the error shown below:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    weatherData = json.loads(jf)
  File "/home/usr/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 1 (char 0)

As a test, I modified the code so that, after removing newlines and stripping the leading and trailing whitespace, it writes the contents to another file (with the json extension). Surprisingly, when I read that file back, I get no error and parsing succeeds. The modified code is as follows:

import json

filewrite = open('out.json', 'w+')

with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        filewrite.write(jf)

filewrite.close()

with open('out.json', 'r') as newJsonFile:
    for line in newJsonFile:
        weatherData = json.loads(line)
        print weatherData

The output is as follows:

{u'observations': {u'notice': [{u'copyright_url': u'http://www.bom.gov.au/other/copyright.shtml', u'disclaimer_url': u'http://www.bom.gov.au/other/disclaimer.shtml', u'copyright': u'Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml', u'feedback_url': u'http://www.bom.gov.au/other/feedback'}]}}

Any idea what is going on when newlines and whitespace are stripped before using the json library?

3 Solutions

#1


4  

You will go crazy if you try to parse a JSON file line by line. The json module has helper functions that read either file objects or strings directly: load and loads. load takes a file object (as shown below) for a file that contains JSON data, while loads takes a string that contains JSON data.

Option 1 (preferred):

import json
with open('test.json', 'r') as jf:
    weatherData = json.load(jf)
    print weatherData

Option 2:

import json
with open('test.json', 'r') as jf:
    weatherData = json.loads(jf.read())
    print weatherData
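
To see why the line-by-line loop in the question fails, here is a minimal sketch. It assumes a shortened, inline stand-in for test.json (the SAMPLE string and the first_line_error helper are made up for illustration), and it uses the print() function form so it should run on both Python 2.7 and Python 3. The first line of the file is just "{", which is not a complete JSON document, so json.loads rejects it. Joining the stripped lines, as the question's second snippet effectively does by writing them to out.json without newlines, yields the whole document on one line, which parses fine.

```python
import json

# Hypothetical, shortened stand-in for the contents of test.json.
SAMPLE = '''{
"observations": {
    "notice": [
        {
            "feedback_url": "http://www.bom.gov.au/other/feedback"
        }
    ]
}
}'''


def first_line_error(text):
    """Parse only the first line, as the question's loop does.

    Returns the error message, or None if parsing unexpectedly succeeds.
    """
    first_line = text.splitlines()[0].strip()
    try:
        json.loads(first_line)
    except ValueError as exc:  # JSONDecodeError subclasses ValueError on Python 3
        return str(exc)
    return None


error = first_line_error(SAMPLE)

# Parsing the whole document at once works.
whole = json.loads(SAMPLE)

# So does parsing the stripped lines joined into a single line.
joined = "".join(line.strip() for line in SAMPLE.splitlines())
from_joined = json.loads(joined)

print(error)
print(whole["observations"]["notice"][0]["feedback_url"])
```

The exact wording of the error differs between Python versions, but in each case the parser is complaining about an incomplete document, not about whitespace.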

If you are looking for higher-performance JSON parsing, check out ujson.

#2


5  

In the first snippet, you try to parse the file line by line. You should parse it all at once. The easiest way is to use json.load(jsonFile). (The jf variable name is misleading, as it's a string.) So the correct way to parse it is:

import json

with open('test.json', 'r') as jsonFile:
    # json.load takes the file object itself; json.loads expects a string
    weatherData = json.load(jsonFile)

That said, storing the JSON on a single line is a good idea, as it's more concise.

In the second snippet, what you print is the Python representation of the parsed object, not JSON. The u'string here' notation for unicode strings is Python-specific, whereas valid JSON uses double quotation marks.
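
If the goal is to print JSON text and not the Python representation, one option is json.dumps. The sketch below is only an illustration with a one-key document made up for the purpose; it contrasts the two forms and should run on Python 2.7 and Python 3 (on 2.7 the repr additionally shows the u prefix).

```python
import json

# Small illustrative document; the key and URL are taken from the question's sample.
data = json.loads('{"feedback_url": "http://www.bom.gov.au/other/feedback"}')

as_json = json.dumps(data)  # JSON text: double-quoted strings
as_repr = repr(data)        # Python repr: single quotes (plus u'' on Python 2.7)

print(as_json)
print(as_repr)
```

Only the json.dumps output can be fed back to json.loads; the repr is meant for debugging.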

#3


1  

FYI, you can open both files in a single with statement:

with open('file_A') as in_, open('file_B', 'w+') as out_:
    # logic here
    ...
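
As one possible way to fill in the "logic here" part, the sketch below rewrites a multi-line JSON file onto a single line using one with statement for both files. The compact_json helper and the temporary file names are hypothetical, chosen only for this demo; it parses the whole input with json.load, not line by line.

```python
import json
import os
import tempfile


def compact_json(src_path, dst_path):
    """Read JSON from src_path and write it to dst_path on a single line."""
    with open(src_path) as in_, open(dst_path, 'w') as out_:
        data = json.load(in_)   # parse the whole input file at once
        json.dump(data, out_)   # no indent argument, so the output is one line
    return data


# Demo with throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'test.json')
dst = os.path.join(workdir, 'out.json')
with open(src, 'w') as f:
    f.write('{\n"observations": {\n    "notice": []\n}\n}\n')

original = compact_json(src, dst)

with open(dst) as f:
    lines = f.read().splitlines()
print(len(lines))
```

Both files are closed when the with block exits, so no explicit close() call is needed.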
