How do I preserve column order when calling groupby and shift on a pandas DataFrame?

Date: 2021-05-05 09:51:09

It seems that the columns get reordered by column label when calling pandas.DataFrame.groupby().shift(). The sort parameter of groupby applies only to rows.

Here is an example:

import pandas as pd
df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'E': ['a','b','c','d','e','f'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1,2,3,4,5,6]
                  })

df.set_index('A',inplace=True)
df = df[['E','C','D','B']]
df

#         E     C   D    B
#     A            
#group1   a   100   1   10
#group1   b   102   2   12
#group2   c   100   3   10
#group2   d   250   4   25
#group3   e   100   5   10
#group3   f   102   6   12

Going from here, I want to achieve:

#         E     C   D    B    C_s     D_s   B_s
#     A                     
#group1   a   100   1   10   102.0    2.0  12.0     
#group1   b   102   2   12     NaN    NaN   NaN     
#group2   c   100   3   10   250.0    4.0  25.0     
#group2   d   250   4   25     NaN    NaN   NaN     
#group3   e   100   5   10   102.0    6.0  12.0     
#group3   f   102   6   12     NaN    NaN   NaN

But this assignment:

df[['C_s','D_s','B_s']]= df.groupby(level='A')[['C','D','B']].shift(-1)

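Results in the shifted values landing in the wrong columns (C_s receives the shifted B, D_s the shifted C, and B_s the shifted D):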

#         E     C   D    B    C_s     D_s   B_s
#     A                     
#group1   a   100   1   10   12.0   102.0   2.0
#group1   b   102   2   12    NaN     NaN   NaN
#group2   c   100   3   10   25.0   250.0   4.0
#group2   d   250   4   25    NaN     NaN   NaN
#group3   e   100   5   10   12.0   102.0   6.0
#group3   f   102   6   12    NaN     NaN   NaN

Sorting the columns first works around the problem, because each shifted column then lines up with its source column:

df = df.sort_index(axis=1)
df[['B_s','C_s','D_s']]= df.groupby(level='A')[['B','C','D']].shift(-1).sort_index(axis=1)
df
#         B    C  D  E   B_s   C_s   D_s
#     A              
#group1  10  100  1  a  12.0  102.0  2.0
#group1  12  102  2  b   NaN   NaN   NaN
#group2  10  100  3  c  25.0  250.0  4.0
#group2  25  250  4  d   NaN   NaN   NaN
#group3  10  100  5  e  12.0  102.0  6.0
#group3  12  102  6  f   NaN   NaN   NaN 
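A possible alternative that leaves the original column order untouched is to assign each shifted column by name rather than by position. This is only a sketch, assuming the mismatch comes from the list assignment pairing columns positionally; the names cols and shifted are introduced here for illustration and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["group1", "group1", "group2", "group2", "group3", "group3"],
        "E": ["a", "b", "c", "d", "e", "f"],
        "B": [10, 12, 10, 25, 10, 12],
        "C": [100, 102, 100, 250, 100, 102],
        "D": [1, 2, 3, 4, 5, 6],
    }
).set_index("A")[["E", "C", "D", "B"]]

cols = ["C", "D", "B"]  # hypothetical helper: the columns to shift
shifted = df.groupby(level="A")[cols].shift(-1)

# Look each shifted column up by name, so the order in which shift()
# returns its columns does not matter.
for col in cols:
    # .to_numpy() bypasses index alignment; shift() keeps the original row order.
    df[col + "_s"] = shifted[col].to_numpy()

print(df)
```

With this approach the new columns are appended in the order given by cols, regardless of how the grouped result orders its columns.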

Why are the columns reordered in the first place?

1 Answer

#1


In my opinion, this is a bug.

A custom lambda function works as a workaround:

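# Select the columns with a list ([[...]]); newer pandas versions reject the tuple form ['C','D','B'].
# group_keys=False keeps the original index so the assignment lines up.
df[['C_s','D_s','B_s']] = df.groupby(level='A', group_keys=False)[['C','D','B']].apply(lambda x: x.shift(-1))
print(df)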
        E    C  D   B    C_s  D_s   B_s
A                                      
group1  a  100  1  10  102.0  2.0  12.0
group1  b  102  2  12    NaN  NaN   NaN
group2  c  100  3  10  250.0  4.0  25.0
group2  d  250  4  25    NaN  NaN   NaN
group3  e  100  5  10  102.0  6.0  12.0
group3  f  102  6  12    NaN  NaN   NaN

Thank you @cᴏʟᴅsᴘᴇᴇᴅ for another solution:

df[['C_s','D_s','B_s']] = (df.groupby(level='A', group_keys=False)[['C','D','B']]
                             .apply(pd.DataFrame.shift, periods=-1))
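Another variant, offered only as a sketch and not as part of the original answer, uses transform instead of apply and attaches the results by column name through assign. The names cols, shifted and result are hypothetical and chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["group1", "group1", "group2", "group2", "group3", "group3"],
        "E": ["a", "b", "c", "d", "e", "f"],
        "B": [10, 12, 10, 25, 10, 12],
        "C": [100, 102, 100, 250, 100, 102],
        "D": [1, 2, 3, 4, 5, 6],
    }
).set_index("A")[["E", "C", "D", "B"]]

cols = ["C", "D", "B"]  # hypothetical helper: the columns to shift

# transform returns a frame with the same rows as df;
# the lambda shifts the values within each group.
shifted = df.groupby(level="A")[cols].transform(lambda s: s.shift(-1))

# Build the new columns by name, so the column order of `shifted` is irrelevant.
# .to_numpy() avoids index alignment on the duplicated index labels.
result = df.assign(**{f"{col}_s": shifted[col].to_numpy() for col in cols})

print(result)
```

Because assign returns a new frame, df itself is left unchanged, which can be convenient when comparing the result against the original data.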
