如何从pandas python中另一个数据帧的子集交集中减去一个数据帧?

时间:2022-01-23 04:47:29

I have a the following dataframes in python:

我在python中有以下数据帧:

dataframe 1

             1  2  3  4  5
dog   dog    0  1  1  0  1
      fox    1  0  0  0  0
      jumps  0  0  0  1  0
      over   1  0  1  0  1
      the    0  1  0  0  0
fox   dog    0  0  1  1  1
      fox    0  0  0  0  0
      jumps  0  0  1  0  1
      over   0  1  0  0  0
      the    0  0  0  1  1
jumps dog    0  0  0  0  0
      fox    0  1  0  1  1
      jumps  0  0  0  0  1
      over   1  0  1  0  0
      the    0  0  0  0  0
over  dog    0  0  1  0  0
      fox    0  1  0  1  1
      jumps  0  0  0  0  0
      over   0  1  0  1  0
      the    1  0  1  0  0
the   dog    0  0  1  0  0
      fox    0  0  0  0  1
      jumps  0  1  0  0  0
      over   0  0  1  1  0
      the    0  1  1  0  1

dataframe 2

             1  2  4  5
dog   dog    1  0  0  0
      fox    0  1  0  1
      jumps  0  1  1  0
      the    0  0  0  0
      horse  1  0  1  0
fox   dog    0  0  0  0
      fox    0  1  0  1
      over   0  0  0  0
      the    0  1  0  1
      cat    0  0  1  0

You can see that dataframe2 contains multiindexes of dataframe1 but it also contains additional multiindexes like horse and cat. Dataframe 2 also doesn't contain all the columns of dataframe 1 as you can see it misses column 3.

您可以看到dataframe2包含dataframe1的多索引,但它还包含其他多索引,如马和猫。 Dataframe 2也不包含数据帧1的所有列,因为您可以看到它错过了第3列。

I want to subtract dataframe 2 from dataframe 1 in such a way that the function only subtracts the data which is common in both and ignores the rest and the resulting dataframe is in shape of dataframe 2.

我想从数据帧1中减去数据帧2,使得函数只减去两者中常见的数据并忽略其余数据,结果数据帧的形状为数据帧2。

Does any know if pandas provides a builtin way of doing this or do I need to construct a function myself. If so, can you point me in the right direction? Any suggestions are highly appreciated. Thank you.

有没有人知道pandas是否提供了内置的方法,或者我是否需要自己构建一个函数。如果是这样,你能指出我正确的方向吗?任何建议都非常感谢。谢谢。

NOTE: This question is similar to another question I posted here apart from the fact that I am not wanting to compare these, instead wanting to do an arithmetic operation of subtraction.

注意:这个问题类似于我在这里发布的另一个问题,除了我不想比较这些问题,而是想要进行减法的算术运算。

4 个解决方案

#1


4  

I believe you simply want something like:

我相信你只想要这样的东西:

In [23]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna()
Out[23]:
             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0

Pandas already automatically aligns on the index, that's part of it's magic, but you just have to fill/drop nans intelligently.

Pandas已经自动对齐索引,这是它的神奇之处,但你必须聪明地填充/删除nans。

Edit

Whoops, you actually want df1 - df2, but with the shape of df2, a little bit more tricky since then fillna(df1) would prevent us from dropping the right rows, however, you can just use multiply by -1!

哎呀,你真的想要df1 - df2,但是df2的形状,有点棘手,因为那时fillna(df1)会阻止我们放弃正确的行,但是,你可以使用乘以-1!

In [25]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna() * -1
Out[25]:
             1    2    4    5
dog dog   -1.0  1.0 -0.0  1.0
    fox    1.0 -1.0 -0.0 -1.0
    horse -1.0 -0.0 -1.0 -0.0
    jumps -0.0 -1.0 -0.0 -0.0
    the   -0.0  1.0 -0.0 -0.0
fox cat   -0.0 -0.0 -1.0 -0.0
    dog   -0.0 -0.0  1.0  1.0
    fox   -0.0 -1.0 -0.0 -1.0
    over  -0.0  1.0 -0.0 -0.0
    the   -0.0 -1.0  1.0 -0.0

Or, if those negative zeros bother you:

或者,如果那些负面零点打扰你:

In [31]: (-df2 + df1.drop('3', axis=1)).fillna(-df2).dropna()
Out[31]:
             1    2    4    5
dog dog   -1.0  1.0  0.0  1.0
    fox    1.0 -1.0  0.0 -1.0
    horse -1.0  0.0 -1.0  0.0
    jumps  0.0 -1.0  0.0  0.0
    the    0.0  1.0  0.0  0.0
fox cat    0.0  0.0 -1.0  0.0
    dog    0.0  0.0  1.0  1.0
    fox    0.0 -1.0  0.0 -1.0
    over   0.0  1.0  0.0  0.0
    the    0.0 -1.0  1.0  0.0

#2


4  

IIUC:

IIUC:

In [24]: r = d1.sub(d2, axis=0)

In [25]: r.loc[r.index.intersection(d2.index)]
Out[25]:
             1    2   3    4    5
dog dog   -1.0  1.0 NaN  0.0  1.0
    fox    1.0 -1.0 NaN  0.0 -1.0
    horse  NaN  NaN NaN  NaN  NaN
    jumps  0.0 -1.0 NaN  0.0  0.0
    the    0.0  1.0 NaN  0.0  0.0
fox cat    NaN  NaN NaN  NaN  NaN
    dog    0.0  0.0 NaN  1.0  1.0
    fox    0.0 -1.0 NaN  0.0 -1.0
    over   0.0  1.0 NaN  0.0  0.0
    the    0.0 -1.0 NaN  1.0  0.0

#3


3  

Let us do some thing like

让我们做一些事情

id=df2.index.values.tolist()
dd=df1.loc[list(set(df1.index.values.tolist())&set(id))]
(df2-dd).combine_first(df2).dropna(1)

             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0

#4


2  

Use pd.DataFrame.align with the parameter 'inner' to reduce both dataframes to only the common indices. Then pass results to pd.DataFrame.sub

使用带有参数'inner'的pd.DataFrame.align将两个数据帧都减少为只有公共索引。然后将结果传递给pd.DataFrame.sub

pd.DataFrame.sub(*df1.align(df2, 'inner'))

           1  2  4  5
dog dog   -1  1  0  1
    fox    1 -1  0 -1
    jumps  0 -1  0  0
    the    0  1  0  0
fox dog    0  0  1  1
    fox    0 -1  0 -1
    over   0  1  0  0
    the    0 -1  1  0

Written in two lines

写成两行

a, b = df1.align(df2, 'inner')
a - b

#1


4  

I believe you simply want something like:

我相信你只想要这样的东西:

In [23]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna()
Out[23]:
             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0

Pandas already automatically aligns on the index, that's part of it's magic, but you just have to fill/drop nans intelligently.

Pandas已经自动对齐索引,这是它的神奇之处,但你必须聪明地填充/删除nans。

Edit

Whoops, you actually want df1 - df2, but with the shape of df2, a little bit more tricky since then fillna(df1) would prevent us from dropping the right rows, however, you can just use multiply by -1!

哎呀,你真的想要df1 - df2,但是df2的形状,有点棘手,因为那时fillna(df1)会阻止我们放弃正确的行,但是,你可以使用乘以-1!

In [25]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna() * -1
Out[25]:
             1    2    4    5
dog dog   -1.0  1.0 -0.0  1.0
    fox    1.0 -1.0 -0.0 -1.0
    horse -1.0 -0.0 -1.0 -0.0
    jumps -0.0 -1.0 -0.0 -0.0
    the   -0.0  1.0 -0.0 -0.0
fox cat   -0.0 -0.0 -1.0 -0.0
    dog   -0.0 -0.0  1.0  1.0
    fox   -0.0 -1.0 -0.0 -1.0
    over  -0.0  1.0 -0.0 -0.0
    the   -0.0 -1.0  1.0 -0.0

Or, if those negative zeros bother you:

或者,如果那些负面零点打扰你:

In [31]: (-df2 + df1.drop('3', axis=1)).fillna(-df2).dropna()
Out[31]:
             1    2    4    5
dog dog   -1.0  1.0  0.0  1.0
    fox    1.0 -1.0  0.0 -1.0
    horse -1.0  0.0 -1.0  0.0
    jumps  0.0 -1.0  0.0  0.0
    the    0.0  1.0  0.0  0.0
fox cat    0.0  0.0 -1.0  0.0
    dog    0.0  0.0  1.0  1.0
    fox    0.0 -1.0  0.0 -1.0
    over   0.0  1.0  0.0  0.0
    the    0.0 -1.0  1.0  0.0

#2


4  

IIUC:

IIUC:

In [24]: r = d1.sub(d2, axis=0)

In [25]: r.loc[r.index.intersection(d2.index)]
Out[25]:
             1    2   3    4    5
dog dog   -1.0  1.0 NaN  0.0  1.0
    fox    1.0 -1.0 NaN  0.0 -1.0
    horse  NaN  NaN NaN  NaN  NaN
    jumps  0.0 -1.0 NaN  0.0  0.0
    the    0.0  1.0 NaN  0.0  0.0
fox cat    NaN  NaN NaN  NaN  NaN
    dog    0.0  0.0 NaN  1.0  1.0
    fox    0.0 -1.0 NaN  0.0 -1.0
    over   0.0  1.0 NaN  0.0  0.0
    the    0.0 -1.0 NaN  1.0  0.0

#3


3  

Let us do some thing like

让我们做一些事情

id=df2.index.values.tolist()
dd=df1.loc[list(set(df1.index.values.tolist())&set(id))]
(df2-dd).combine_first(df2).dropna(1)

             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0

#4


2  

Use pd.DataFrame.align with the parameter 'inner' to reduce both dataframes to only the common indices. Then pass results to pd.DataFrame.sub

使用带有参数'inner'的pd.DataFrame.align将两个数据帧都减少为只有公共索引。然后将结果传递给pd.DataFrame.sub

pd.DataFrame.sub(*df1.align(df2, 'inner'))

           1  2  4  5
dog dog   -1  1  0  1
    fox    1 -1  0 -1
    jumps  0 -1  0  0
    the    0  1  0  0
fox dog    0  0  1  1
    fox    0 -1  0 -1
    over   0  1  0  0
    the    0 -1  1  0

Written in two lines

写成两行

a, b = df1.align(df2, 'inner')
a - b