Pivot table in a pandas dataframe without applying a function

Date: 2022-11-29 21:27:08

I have a pandas dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({'ID': [2, 2, 2, 2, 2, 4, 4, 3, 3, 3, 6], 'count': [20, 43, 45, 50, 15, 65, 35, 15, 15, 14, 30]})
df
    ID  count
0    2     20
1    2     43 
2    2     45
3    2     50
4    2     15
5    4     65
6    4     35
7    3     15
8    3     15
9    3     14
10   6     30

I want to create a pivot table with the following output:

ID    1    2    3    4    5
 2   20   43   45   50   15
 4   65   35    0    0    0
 3   15   15   14    0    0
 6   30    0    0    0    0

I thought of using the pivot function on the dataframe (df_pivot = df.pivot(index='ID', columns=..., values='count')), but I am missing the list of column labels. I also thought of applying a lambda function to the df to generate an additional column with the missing column names, but I have 800M IDs and applying a function to a grouped dataframe is painfully slow. Is there a quick approach you might be aware of?
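A minimal sketch, using the sample df from above, of what the missing column list actually is: each row's label is just its position within its ID group, so the table needs as many columns as the largest group has rows. The names group_sizes, n_columns and column_labels are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': [2, 2, 2, 2, 2, 4, 4, 3, 3, 3, 6],
                   'count': [20, 43, 45, 50, 15, 65, 35, 15, 15, 14, 30]})

# Number of rows per ID; the widest group fixes how many columns the pivot needs.
group_sizes = df.groupby('ID').size()
n_columns = int(group_sizes.max())

# The labels the pivot is missing are simply the positions 1..n_columns.
column_labels = list(range(1, n_columns + 1))
print(column_labels)  # [1, 2, 3, 4, 5]
```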

1 answer

#1


I would define a subindex for each group as:

df['subindex'] = df.groupby('ID').cumcount() + 1
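To make the subindex step concrete, here is a small sketch on the question's sample data. groupby(...).cumcount() numbers the rows inside each group as 0, 1, 2, ... in their original order, without calling a Python function per group (how it scales to 800M IDs is not measured here); adding 1 makes the positions start at 1 like the desired column labels.

```python
import pandas as pd

df = pd.DataFrame({'ID': [2, 2, 2, 2, 2, 4, 4, 3, 3, 3, 6],
                   'count': [20, 43, 45, 50, 15, 65, 35, 15, 15, 14, 30]})

# Position of each row within its ID group, starting at 1.
df['subindex'] = df.groupby('ID').cumcount() + 1
print(df['subindex'].tolist())  # [1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 1]
```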

Then call pd.pivot_table with the new subindex as the columns and fill the NaN values with 0:

d = pd.pivot_table(df, index='ID', columns='subindex', values='count').fillna(0).astype(int)

This returns:

subindex   1   2   3   4   5
ID                          
2         20  43  45  50  15
3         15  15  14   0   0
4         65  35   0   0   0
6         30   0   0   0   0
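A possible variation, not part of the original answer: since each (ID, subindex) pair occurs exactly once, no aggregation is needed (pivot_table groups and aggregates, with mean as its default aggfunc), so df.pivot(index='ID', columns='subindex', values='count') or set_index(...).unstack(...) should give the same table. The sketch below uses unstack with fill_value=0, which fills the gaps while keeping an integer dtype, then reindexes by df['ID'].unique() to restore the row order from the question's desired output (2, 4, 3, 6) instead of the sorted order. Whether this is measurably faster than pivot_table at 800M IDs is an assumption worth benchmarking; the variable name wide is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'ID': [2, 2, 2, 2, 2, 4, 4, 3, 3, 3, 6],
                   'count': [20, 43, 45, 50, 15, 65, 35, 15, 15, 14, 30]})
df['subindex'] = df.groupby('ID').cumcount() + 1

wide = (df.set_index(['ID', 'subindex'])['count']
          .unstack(fill_value=0)          # missing positions become 0, dtype stays integer
          .reindex(df['ID'].unique()))    # put IDs back in their order of first appearance
print(wide)
```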

Hope that helps.
