How to join two pandas DataFrames that have similar indexes of different sizes

Date: 2022-08-26 18:42:59

I want to combine two DataFrames whose index labels are sorted, but where each label appears a different number of times in each frame.

import pandas as pd

frame1 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index=['A', 'B', 'B', 'C', 'C', 'C', 'D', 'E', 'E', 'F'])
frame2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], index=['A', 'A', 'B', 'C', 'C', 'D', 'D', 'E', 'F', 'F'])
frame1.columns = ['Hi']
frame2.columns = ['Bye']

frame1
Out[160]: 
   Hi
A   1
B   2
B   3
C   4
C   5
C   6
D   7
E   8
E   9
F  10

frame2
Out[161]: 
   Bye
A    1
A    2
B    3
C    4
C    5
D    6
D    7
E    8
F    9
F   10

Desired output:

    Bye    Hi
A   1.0   1.0
A   2.0   NaN
B   3.0   2.0
B   NaN   3.0
C   4.0   4.0
C   5.0   5.0
C   NaN   6.0
D   6.0   7.0
D   7.0   NaN
E   8.0   8.0
E   NaN   9.0
F   9.0  10.0
F  10.0   NaN

I can't find a combination of concat or join that does this. Is there a way?
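
For illustration, a minimal sketch of what a plain index join does with these frames. The frames are rebuilt here with a dict constructor so the snippet runs on its own; that is assumed to be equivalent to the setup above. Because the index labels repeat, the join pairs every matching row on one side with every matching row on the other.

```python
import pandas as pd

frame1 = pd.DataFrame({'Hi': range(1, 11)},
                      index=['A', 'B', 'B', 'C', 'C', 'C', 'D', 'E', 'E', 'F'])
frame2 = pd.DataFrame({'Bye': range(1, 11)},
                      index=['A', 'A', 'B', 'C', 'C', 'D', 'D', 'E', 'F', 'F'])

# Joining on a duplicated index pairs every 'C' row in frame1 with
# every 'C' row in frame2 (3 x 2 = 6 rows), and likewise for each label.
joined = frame1.join(frame2)

print(len(joined))         # 16 rows instead of the 13 wanted
print(joined.loc['C'])     # the six 'C' combinations
```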

1 Answer

#1


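OK, let's build a new key here using cumcount: number each repeat of an index label within its group, then append that counter to the index so both frames can be aligned on (label, occurrence).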

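# Number each repeat of an index label (0, 1, 2, ...) and append it as a second index level
s1 = frame1.set_index(frame1.groupby(level=0).cumcount(), append=True)
s2 = frame2.set_index(frame2.groupby(level=0).cumcount(), append=True)

# Align on (label, occurrence), then drop the helper level again
pd.concat([s2, s1], axis=1).reset_index(level=1, drop=True)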
Out[364]: 
    Bye    Hi
A   1.0   1.0
A   2.0   NaN
B   3.0   2.0
B   NaN   3.0
C   4.0   4.0
C   5.0   5.0
C   NaN   6.0
D   6.0   7.0
D   7.0   NaN
E   8.0   8.0
E   NaN   9.0
F   9.0  10.0
F  10.0   NaN
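
To see why the extra level makes the alignment work, here is a small sketch of what cumcount returns for each frame. The frames are rebuilt with a dict constructor so the snippet is self-contained, and the variable names `occ1` and `occ2` are only illustrative.

```python
import pandas as pd

frame1 = pd.DataFrame({'Hi': range(1, 11)},
                      index=['A', 'B', 'B', 'C', 'C', 'C', 'D', 'E', 'E', 'F'])
frame2 = pd.DataFrame({'Bye': range(1, 11)},
                      index=['A', 'A', 'B', 'C', 'C', 'D', 'D', 'E', 'F', 'F'])

# cumcount restarts at 0 for every index label and counts its repeats
occ1 = frame1.groupby(level=0).cumcount()
occ2 = frame2.groupby(level=0).cumcount()

print(occ1.tolist())   # [0, 0, 1, 0, 1, 2, 0, 0, 1, 0]
print(occ2.tolist())   # [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

# Once appended to the index, every (label, occurrence) pair is unique,
# so concat can line the rows up one to one.
s1 = frame1.set_index(occ1, append=True)
print(s1.index.is_unique)   # True
```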

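From piR (a great solution using a user-defined function). Note that it keeps the cumcount level in the index and lists the columns in the order the frames are passed: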

def add_cumcount_level(df):
    return df.set_index(df.groupby(level=0).cumcount(), append=True)

pd.concat(map(add_cumcount_level, [frame1, frame2]), axis=1)
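
The same idea can also be written as an outer merge on ordinary columns instead of a concat on a MultiIndex. This is an alternative sketch, not the method from the answer above; the helper name `align_by_occurrence` and the temporary column names `label` and `_n` are made up for illustration, and it assumes neither frame already has columns with those names.

```python
import pandas as pd

def align_by_occurrence(left, right):
    """Outer-join two frames on (index label, occurrence number)."""
    def keyed(df):
        # Move the index into a 'label' column and add the per-label counter
        out = df.rename_axis('label').reset_index()
        out['_n'] = df.groupby(level=0).cumcount().to_numpy()
        return out

    merged = keyed(left).merge(keyed(right), on=['label', '_n'], how='outer')
    merged = merged.sort_values(['label', '_n'])
    # Restore the original-looking index and drop the helper column
    return merged.set_index('label').drop(columns='_n').rename_axis(None)

frame1 = pd.DataFrame({'Hi': range(1, 11)},
                      index=['A', 'B', 'B', 'C', 'C', 'C', 'D', 'E', 'E', 'F'])
frame2 = pd.DataFrame({'Bye': range(1, 11)},
                      index=['A', 'A', 'B', 'C', 'C', 'D', 'D', 'E', 'F', 'F'])

result = align_by_occurrence(frame2, frame1)
print(result)
```

Passing `frame2` first puts `Bye` before `Hi`, matching the column order of the desired output.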
