How do I read a JSON dictionary-type file with pandas?

Date: 2021-10-08 08:41:43

I have a long json like this: http://pastebin.com/gzhHEYGy

I would like to load it into a pandas DataFrame in order to play with it, so, following the documentation, I did this:

import pandas as pd

df = pd.read_json('/user/file.json')
print(df)

I got this traceback:

  File "/Users/user/PycharmProjects/PAN-pruebas/json_2_dataframe.py", line 6, in <module>
    df = pd.read_json('/Users/user/Downloads/54db3923f033e1dd6a82222aa2604ab9.json')
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
    date_unit).parse()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
    self._parse_no_numpy()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 203, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 327, in _init_dict
    dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4620, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4668, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length

Then from a previous question I found that I need to do something like this:

d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )

But I don't understand how to obtain the contents as numpy arrays. How can I preserve the lengths of the arrays in a big file like this? Thanks in advance.

1 solution

#1


15  

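`pd.read_json` doesn't work here because the JSON file is not in the format it expects by default: judging by the error, the top-level values are lists of different lengths. Since a JSON file loads easily as a dict, let's try it this way: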

import pandas as pd
import json
import os

os.chdir('/Users/nicolas/Downloads')

# Read the JSON file into a plain dict
with open('json_example.json') as json_data:
    data = json.load(json_data)

# Build the frame with from_dict. Note that orient='index' is not the default
# value (the default would raise the same error as you had): each top-level
# key becomes a row, and shorter lists are padded with None.
# Then transpose and use the 'index' column as the index to get this result.
df = pd.DataFrame.from_dict(data, orient='index').T.set_index('index')
print(df)

output:

                                                                 data columns
index                                                                        
311210177061863424  [25-34\n, FEMALE, @bikewa absolutely the best....     age
310912785183813632  [25-34\n, FEMALE, Photo: I love the Burke-Gilm...  gender
311290293871849472  [25-34\n, FEMALE, Photo: Inhaled! #fitfoodie h...    text
309386414548717569  [25-34\n, FEMALE, Facebook Is Making The Most ...    None
312327801187495936  [25-34\n, FEMALE, Still upset about this &gt;&...    None
312249421079400449  [25-34\n, FEMALE, @JoeM_PM_UK @JonAntoine I've...    None
308692673194246145  [25-34\n, FEMALE, @Social_Freedom_ actually, t...    None
308995226633129984  [25-34\n, FEMALE, @seattleweekly that's more t...    None
308660851219501056  [25-34\n, FEMALE, @adamholdenbache I noticed 1...    None
308658690528014337  [25-34\n, FEMALE, @CEM_Social I am waiting pat...    None
309719798001070080  [25-34\n, FEMALE, Going to be watching Faceboo...    None
312349448049152002  [25-34\n, FEMALE, @anikamarketer I applied for...    None
312325152698404864  [25-34\n, FEMALE, @_chrisrojas_ wow, that's so...    None
310546490844135425  [25-34\n, FEMALE, Photo: Feeling like a bit of...    None
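
In the output above, each cell of `data` still holds a whole list, while `columns` lists three names (age, gender, text). If the file really has the three top-level keys `columns`, `index` and `data` (an assumption based only on that output, since the original file is not shown here), the frame can probably be built directly by passing each key to the `DataFrame` constructor. A minimal sketch with a made-up miniature of the file; the `raw` string and its values are hypothetical:

```python
import json

import pandas as pd

# Hypothetical miniature of the file: three top-level keys whose lists
# have different lengths (3 column names, 2 index labels, 2 data rows).
raw = """
{
  "columns": ["age", "gender", "text"],
  "index": [1, 2],
  "data": [["25-34", "FEMALE", "first tweet"],
           ["25-34", "FEMALE", "second tweet"]]
}
"""
data = json.loads(raw)

# Each inner list of "data" becomes one row; "index" labels the rows
# and "columns" names the fields, so no transpose is needed.
df = pd.DataFrame(data["data"], index=data["index"], columns=data["columns"])
print(df)
```

This gives one column per field instead of a single column of lists, which is usually easier to filter and group. For the real file, replace `json.loads(raw)` with the `json.load(json_data)` call from the answer.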
