Python: computing histogram values with groupby on a pandas DataFrame

Date: 2022-12-27 21:22:44

I want to group the data in a DataFrame using groupby, and then compute the histogram of the grouped data. This is my DataFrame:

    indicator
key        
14        1
14        2
14        3
15        1
16        2
16        5
16        6
17        1
18        3

I want to get this result using groupby:

       indicator
key        
14        1,2,3
15        1
16        2,5,6
17        1
18        3

and then compute the histogram for every key.

1 solution

#1



numpy.histogram cannot handle an array nested inside an array, such as a column whose cells each hold a list of values. You need to keep your data in long form, with one row per (key, indicator) pair, like this:

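import numpy as np
import pandas as pd

# Build the data in long form: one row per (key, indicator) pair
dataf = pd.DataFrame()
dataf['key'] = range(14, 25)
dataf['indicator'] = [1, 1, 2, 1, 3, 4, 7, 15, 23, 43, 67]
# Append extra rows so that keys 14 and 16 have several indicator values
dataf.loc[11] = [14, 2]
dataf.loc[12] = [14, 3]
dataf.loc[13] = [16, 5]
dataf.loc[14] = [16, 6]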

Because no raw data was provided, I can only assume that your data can be reshaped into this form:

In [30]: dataf
Out[30]: 
    key  indicator
0    14          1
1    15          1
2    16          2
3    17          1
4    18          3
5    19          4
6    20          7
7    21         15
8    22         23
9    23         43
10   24         67
11   14          2
12   14          3
13   16          5
14   16          6
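
For reference, the grouped view shown in the question can be built from long-form data like this. The following is only a minimal sketch: it assumes the question's nine sample rows, and the names df, grouped and as_text are illustrative, not part of the original answer.

```python
import pandas as pd

# The question's sample data in long form (one row per key/indicator pair)
df = pd.DataFrame({
    "key": [14, 14, 14, 15, 16, 16, 16, 17, 18],
    "indicator": [1, 2, 3, 1, 2, 5, 6, 1, 3],
})

# Collect each key's indicator values into a Python list
grouped = df.groupby("key")["indicator"].apply(list)

# Same grouping, rendered as comma-separated strings as in the question
as_text = df.groupby("key")["indicator"].agg(lambda s: ",".join(map(str, s)))
print(as_text)
```

The list form is usually easier to feed into further numeric processing, while the string form is only useful for display.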

numpy.histogram does the binning itself (it counts how many values fall into each bin), so you don't need a DataFrame groupby to get the histogram of the indicator column. You just need np.histogram(dataf['indicator']). Note that this returns one histogram over all rows; it does not produce a separate histogram per key.
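
If a separate histogram per key is what is wanted, one possible approach is to iterate over the groups and call np.histogram on each one with shared bin edges. This is a sketch under assumptions: it uses the question's nine sample rows, and the bin edges 1..7 and the names df, bins and per_key are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# The question's sample data in long form
df = pd.DataFrame({
    "key": [14, 14, 14, 15, 16, 16, 16, 17, 18],
    "indicator": [1, 2, 3, 1, 2, 5, 6, 1, 3],
})

# Shared bin edges 1..7: bins [1,2), [2,3), ..., [5,6), [6,7]
bins = np.arange(1, 8)

# One histogram over the whole column (no grouping involved)
counts, edges = np.histogram(df["indicator"], bins=bins)

# One histogram per key, all using the same bin edges
per_key = {
    int(key): np.histogram(values, bins=bins)[0]
    for key, values in df.groupby("key")["indicator"]
}

print(counts)
for key, hist in per_key.items():
    print(key, hist)
```

Using the same bin edges for every key keeps the per-key counts directly comparable with each other.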

FYI, if you want to plot the histogram, you can also use the built-in pandas hist() method (DataFrame.hist(), or Series.hist() for a single column):

import matplotlib.pyplot as plt

# Plot a histogram of the indicator column and save the figure to a file
dataf.indicator.hist()
plt.savefig('test.png')
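
To draw one plot per key rather than a single plot, DataFrame.hist() accepts a by argument that splits the data into groups before plotting. The sketch below assumes the question's nine sample rows; the bin count, the Agg backend and the output file name hist_by_key.png are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# The question's sample data in long form
df = pd.DataFrame({
    "key": [14, 14, 14, 15, 16, 16, 16, 17, 18],
    "indicator": [1, 2, 3, 1, 2, 5, 6, 1, 3],
})

# One subplot per key, each showing that key's indicator histogram
df.hist(column="indicator", by="key", bins=5)
plt.savefig("hist_by_key.png")
```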
