How to combine DataFrames generated in a loop

Time: 2022-11-11 22:57:22

I wrote a loop that generates a DataFrame from each JSON file in a folder.

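import json
import os

import pandas as pd

for filename in os.listdir('json1'):
    with open(os.path.join('json1', filename), 'r') as json_data:
        d = json.load(json_data)
    # pd.json_normalize is the top-level name in pandas 1.0+
    # (formerly pd.io.json.json_normalize).
    df2 = pd.json_normalize(d)
    # Keep only the last part of each dotted column name.
    df2.columns = df2.columns.map(lambda x: x.split(".")[-1])
    df3 = pd.json_normalize(d['Reviews'])
    df3.columns = df3.columns.map(lambda x: x.split(".")[-1])
    # Repeat the top-level row once per review, then attach the review columns.
    df4 = pd.concat([df2] * df3.shape[0], ignore_index=True)
    df5 = df4.join(df3)
    print(df5)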

The printed output shows the DataFrame generated for each JSON file in the folder. How can I combine all of these DataFrames into a single large DataFrame? They all have similar column headers, but these may differ slightly from file to file.

1 solution

#1


Try the following approach:

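def my_read_json(filename, **kwargs):
    # ... build df5 for a single file, as in the loop body from the question
    return df5

# files: the list of JSON file paths to read
df = pd.concat([my_read_json(f) for f in files], ignore_index=True)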
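For reference, here is a minimal end-to-end sketch of this approach. The helper names read_reviews_json and combine_json_folder are hypothetical, and the sketch assumes every file has a non-empty 'Reviews' list whose field names do not clash with the top-level ones. Unlike the loop in the question, it leaves the raw 'Reviews' list out of the repeated top-level row. pd.concat aligns columns by name, so a column missing from some files is filled with NaN for those rows.

```python
import json
import os

import pandas as pd


def read_reviews_json(path):
    """Build one DataFrame for a single JSON file: one row per review."""
    with open(path, "r") as json_data:
        d = json.load(json_data)
    # Top-level fields only; the nested review list is handled separately.
    meta = pd.json_normalize({k: v for k, v in d.items() if k != "Reviews"})
    meta.columns = meta.columns.map(lambda x: x.split(".")[-1])
    reviews = pd.json_normalize(d["Reviews"])
    reviews.columns = reviews.columns.map(lambda x: x.split(".")[-1])
    # Repeat the single top-level row once per review, then attach the reviews.
    repeated = pd.concat([meta] * len(reviews), ignore_index=True)
    return repeated.join(reviews)


def combine_json_folder(folder):
    """Read every .json file in `folder` and stack the results."""
    paths = [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.endswith(".json")
    ]
    frames = [read_reviews_json(p) for p in paths]
    # Columns are aligned by name; gaps become NaN. The index is renumbered.
    return pd.concat(frames, ignore_index=True)
```

With the folder from the question, the call would be df = combine_json_folder('json1'). Collecting the frames in a list and concatenating once is generally cheaper than growing a DataFrame inside the loop.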
