How can I efficiently go from a Pandas DataFrame to JSON

Time: 2022-03-11 07:36:13

I've started using pandas to do some aggregation by date. My goal is to count all of the instances of a measurement that occur on a particular day, and to then represent this in D3. To illustrate my workflow, I have a queryset (from Django) that looks like this:

queryset = [{'created':"05-16-13", 'counter':1, 'id':13}, {'created':"05-16-13", 'counter':1, 'id':34}, {'created':"05-17-13", 'counter':1, 'id':12}, {'created':"05-16-13", 'counter':1, 'id':7}, {'created':"05-18-13", 'counter':1, 'id':6}]

I make a dataframe in pandas and aggregate the measure 'counter' by the day created:

import pandas as pd
queryset_df = pd.DataFrame.from_records(queryset).set_index('id')
aggregated_df = queryset_df.groupby('created').sum()
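If you want `created` to stay an ordinary column from the start, `groupby` also accepts `as_index=False`. A minimal sketch reusing the sample records above; the name `flat_df` is hypothetical:

```python
import pandas as pd

# Sample records copied from the queryset above
queryset = [
    {'created': "05-16-13", 'counter': 1, 'id': 13},
    {'created': "05-16-13", 'counter': 1, 'id': 34},
    {'created': "05-17-13", 'counter': 1, 'id': 12},
    {'created': "05-16-13", 'counter': 1, 'id': 7},
    {'created': "05-18-13", 'counter': 1, 'id': 6},
]

queryset_df = pd.DataFrame.from_records(queryset).set_index('id')

# as_index=False keeps 'created' as a regular column instead of moving it
# into the index, so no reset_index() is needed before exporting.
flat_df = queryset_df.groupby('created', as_index=False)['counter'].sum()
print(flat_df)
```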

This gives me a dataframe like this:

          counter
created          
05-16-13        3
05-17-13        1
05-18-13        1

As I'm using D3, I thought that a JSON object would be the most useful. Using the pandas to_json() function, I convert my dataframe like this:

aggregated_df.to_json()

giving me the following JSON object:

{"counter":{"05-16-13":3,"05-17-13":1,"05-18-13":1}}
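The nesting comes from the default `orient` of `DataFrame.to_json`, which is `'columns'`: outer keys are column names and inner keys are index labels. A small check; rebuilding `aggregated_df` by hand from the table above is an assumption made to keep the snippet self-contained:

```python
import json
import pandas as pd

# Rebuild the aggregated frame shown above (index 'created', column 'counter')
aggregated_df = pd.DataFrame(
    {'counter': [3, 1, 1]},
    index=pd.Index(['05-16-13', '05-17-13', '05-18-13'], name='created'),
)

default_json = aggregated_df.to_json()
# orient='columns' is the DataFrame default, so both calls give the same string
columns_json = aggregated_df.to_json(orient='columns')
print(default_json)
```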

This is not exactly what I want, as I would like to be able to access both the date and the measurement. Is there a way that I can export the data such that I end up with something like this?

data = {"c1":{"date":"05-16-13", "counter":3},"c2":{"date":"05-17-13", "counter":1}, "c3":{"date":"05-18-13", "counter":1}}
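That exact shape can be produced in pandas by moving the index back into a column, renaming it, and relabelling the rows before exporting with `orient='index'`. A sketch; the `c1`, `c2`, ... labels simply mirror the example above, and the hand-built `aggregated_df` is an assumption for self-containment:

```python
import json
import pandas as pd

aggregated_df = pd.DataFrame(
    {'counter': [3, 1, 1]},
    index=pd.Index(['05-16-13', '05-17-13', '05-18-13'], name='created'),
)

# Move 'created' out of the index and call it 'date', as in the desired output
out = aggregated_df.reset_index().rename(columns={'created': 'date'})
# Relabel rows as c1, c2, ... (orient='index' requires unique row labels)
out.index = ['c%d' % (i + 1) for i in range(len(out))]

data_json = out.to_json(orient='index')
print(data_json)
# {"c1":{"date":"05-16-13","counter":3},"c2":{"date":"05-17-13","counter":1},"c3":{"date":"05-18-13","counter":1}}
```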

I thought that if I could structure this differently on the Python side, it would reduce the amount of data formatting I would need to do on the JS side, since I plan to load the data with something like this:

  x.domain(d3.extent(data, function(d) { return d.date; }));
  y.domain(d3.extent(data, function(d) { return d.counter; }));
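Because `d3.extent` walks an array (iterable) of values, a JSON array of row objects is usually easier to consume than an object keyed by row labels. `to_json` can emit one directly with `orient='records'`. Renaming `created` to `date` is an assumption made here to match the `d.date` accessor above:

```python
import json
import pandas as pd

aggregated_df = pd.DataFrame(
    {'counter': [3, 1, 1]},
    index=pd.Index(['05-16-13', '05-17-13', '05-18-13'], name='created'),
)

# orient='records' produces a list of {column: value} objects, one per row.
# Index labels are dropped, so reset_index() first keeps the dates as a column.
records_json = (
    aggregated_df.reset_index()
    .rename(columns={'created': 'date'})
    .to_json(orient='records')
)
print(records_json)
# [{"date":"05-16-13","counter":3},{"date":"05-17-13","counter":1},{"date":"05-18-13","counter":1}]
```

On the JS side this parses straight into an array, so `d3.extent(data, function(d) { return d.date; })` can iterate it as written.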

I'm very open to suggestions for better overall workflows, as this is something I will need to do frequently, and I'm unsure of the best way to handle the connection between D3 and pandas. (I have looked at several packages that combine Python and D3 directly, but they are not what I'm looking for, as they seem to focus on static chart generation rather than on making an SVG.)

1 solution

#1

Transform your date index back into an ordinary data column with reset_index(), and then generate your JSON object using the orient='index' argument:

In [11]: aggregated_df.reset_index().to_json(orient='index')
Out[11]: '{"0":{"created":"05-16-13","counter":3},"1":{"created":"05-17-13","counter":1},"2":{"created":"05-18-13","counter":1}}'
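For reference, the whole pipeline from the Django records to the JSON above fits in a few lines. A self-contained sketch; the `queryset` list stands in for the real Django queryset:

```python
import json
import pandas as pd

# Stand-in for the Django queryset in the question
queryset = [
    {'created': "05-16-13", 'counter': 1, 'id': 13},
    {'created': "05-16-13", 'counter': 1, 'id': 34},
    {'created': "05-17-13", 'counter': 1, 'id': 12},
    {'created': "05-16-13", 'counter': 1, 'id': 7},
    {'created': "05-18-13", 'counter': 1, 'id': 6},
]

queryset_df = pd.DataFrame.from_records(queryset).set_index('id')
aggregated_df = queryset_df.groupby('created').sum()

# reset_index() turns the 'created' index back into a column;
# orient='index' then keys each row object by its new integer label.
result = aggregated_df.reset_index().to_json(orient='index')

# Each row object now carries both the date and the count.
parsed = json.loads(result)
print(parsed["0"]["created"], parsed["0"]["counter"])  # 05-16-13 3
```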
