基于列名堆叠pandas数据帧

时间:2022-03-02 15:47:33

I have following pandas Dataframe:

我有以下pandas Dataframe:

 ID   Year  Jan_salary  Jan_days  Feb_salary Feb_days Mar_salary Mar_days
  1   2016     4500        22         4200      18        4700       24
  2   2016     3800        23         3600      19        4400       23
  3   2016     5500        21         5200      17        5300       23

I want to convert this dataframe to following dataframe:

我想将此数据帧转换为以下数据帧:

ID     Year     month   salary   days
 1      2016      01     4500     22
 1      2016      02     4200     18
 1      2016      03     4700     24
 2      2016      01     3800     23
 2      2016      02     3600     19
 2      2016      03     4400     23
 3      2016      01     5500     21
 3      2016      02     5200     17
 3      2016      03     5300     23

I tried use pandas.DataFrame.stack but couldn't get the expected outcome. I am using Python 2.7 Please guide me to reshape this Pandas dataframe. Thanks.

我尝试使用pandas.DataFrame.stack但无法获得预期的结果。我正在使用Python 2.7请指导我重塑这个Pandas数据帧。谢谢。

2 个解决方案

#1


4  

df = df.set_index(['ID', 'Year'])
df.columns = df.columns.str.split('_', expand=True).rename('month', level=0)
df = df.stack(0).reset_index()
md = dict(Jan='01', Feb='02', Mar='03')
df.month = df.month.map(md)


df[['ID', 'Year', 'month', 'salary', 'days']]

基于列名堆叠pandas数据帧

#2


1  

I love pd.melt so that's what I used in this long-winded approach:

我喜欢pd.melt,这就是我在这个冗长的方法中使用的:

ldf = pd.melt(df,id_vars=['ID','Year'],
              value_vars=['Jan_salary','Feb_salary','Mar_salary'],
              var_name='month',value_name='salary')
rdf = pd.melt(df,id_vars=['ID','Year'],
              value_vars=['Jan_days','Feb_days','Mar_days'],
              value_name='days')
rdf.drop(['ID','Year','variable'],inplace=True,axis=1)
cdf = pd.concat([ldf,rdf],axis=1)
cdf['month'] = cdf['month'].str.replace('_salary','')
import calendar
def mapper(month_abbr):
    # from http://*.com/a/3418092/42346
    d = {v: str(k).zfill(2) for k,v in enumerate(calendar.month_abbr)}
    return d[month_abbr]
cdf['month'] = cdf['month'].apply(mapper)

Result:

结果:

>>> cdf
   ID  Year month  salary  days
0   1  2016    01    4500    22
1   2  2016    01    3800    23
2   3  2016    01    5500    21
3   1  2016    02    4200    18
4   2  2016    02    3600    19
5   3  2016    02    5200    17
6   1  2016    03    4700    24
7   2  2016    03    4400    23
8   3  2016    03    5300    23

#1


4  

df = df.set_index(['ID', 'Year'])
df.columns = df.columns.str.split('_', expand=True).rename('month', level=0)
df = df.stack(0).reset_index()
md = dict(Jan='01', Feb='02', Mar='03')
df.month = df.month.map(md)


df[['ID', 'Year', 'month', 'salary', 'days']]

基于列名堆叠pandas数据帧

#2


1  

I love pd.melt so that's what I used in this long-winded approach:

我喜欢pd.melt,这就是我在这个冗长的方法中使用的:

ldf = pd.melt(df,id_vars=['ID','Year'],
              value_vars=['Jan_salary','Feb_salary','Mar_salary'],
              var_name='month',value_name='salary')
rdf = pd.melt(df,id_vars=['ID','Year'],
              value_vars=['Jan_days','Feb_days','Mar_days'],
              value_name='days')
rdf.drop(['ID','Year','variable'],inplace=True,axis=1)
cdf = pd.concat([ldf,rdf],axis=1)
cdf['month'] = cdf['month'].str.replace('_salary','')
import calendar
def mapper(month_abbr):
    # from http://*.com/a/3418092/42346
    d = {v: str(k).zfill(2) for k,v in enumerate(calendar.month_abbr)}
    return d[month_abbr]
cdf['month'] = cdf['month'].apply(mapper)

Result:

结果:

>>> cdf
   ID  Year month  salary  days
0   1  2016    01    4500    22
1   2  2016    01    3800    23
2   3  2016    01    5500    21
3   1  2016    02    4200    18
4   2  2016    02    3600    19
5   3  2016    02    5200    17
6   1  2016    03    4700    24
7   2  2016    03    4400    23
8   3  2016    03    5300    23