pandas: Working with Previous and Next Rows

1. Selecting rows by a condition on the current and previous rows

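Question:

Hello everyone, I have a DataFrame with the columns 产品 (product), 数据1 (data 1), and 数据2 (data 2):

产品  数据1  数据2
A     1      2
B     4      5
C     6      3

I want to find every row where 数据1 > 数据2 in that row AND 数据1 < 数据2 in the row above it.

In the example above, 6 > 3 AND 4 < 5, so the output should be product C.

How should I write this?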

Answer:

import pandas as pd

df = pd.DataFrame({'产品': ['A', 'B', 'C'],
                   '数据1': [1, 4, 6],
                   '数据2': [2, 5, 3]})

mask = ((df['数据1'].shift(1) < df['数据2'].shift(1))   # condition on the previous row
        & (df['数据1'] > df['数据2']))                   # condition on the current row
df.loc[mask, '产品']

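Explanation:

The fastest way to select rows is not to iterate over them. Instead, build a mask (a boolean array) and select with df[mask].

That raises one question: how do you refer to the current row and the previous row of a DataFrame in a single vectorized expression? The answer is shift.

shift(0): the current row (the column unchanged)

shift(1): the previous row

shift(n): the row n positions earlier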
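To make the offsets concrete, here is a minimal sketch on a made-up Series (the values and the column labels "current", "previous", and "next" are illustrative assumptions, not part of the original question). It also shows shift(-1), which looks one row ahead.

```python
import pandas as pd

# Hypothetical example data, not taken from the original question.
s = pd.Series([10, 20, 30, 40])

demo = pd.DataFrame({
    "current": s.shift(0),    # the column itself, unchanged
    "previous": s.shift(1),   # value from one row earlier; row 0 has no predecessor, so NaN
    "next": s.shift(-1),      # value from one row later; the last row has no successor, so NaN
})
print(demo)
```

Comparisons against NaN evaluate to False, so the first row can never satisfy a condition built on shift(1), and the last row can never satisfy one built on shift(-1).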

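To combine several conditions, wrap each one in parentheses and join them with the element-wise operators:

Logical AND (&):

mask = ((...) & (...))

Logical OR (|):

mask = ((...) | (...))

Logical NOT (~):

mask = ~(...)

For example: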

In [75]: df = pd.DataFrame({'A':range(5), 'B':range(10,20,2)})

In [76]: df
Out[76]:
   A   B
0  0  10
1  1  12
2  2  14
3  3  16
4  4  18

In [77]: mask = (df['A'].shift(1) + df['B'].shift(2) > 12)

In [78]: mask
Out[78]:
0    False
1    False
2    False
3     True
4     True
dtype: bool

In [79]: df[mask]
Out[79]:
   A   B
3  3  16
4  4  18
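The &, | and ~ operators listed earlier can be tried on the same small DataFrame. The sketch below is only an illustration: the thresholds and the variable names (prev_a_small, b_large, and so on) are assumptions chosen for the demo.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(10, 20, 2)})

# Two illustrative conditions (thresholds are arbitrary).
prev_a_small = df["A"].shift(1) < 2   # previous row's A is below 2
b_large = df["B"] > 14                # current row's B is above 14

both = prev_a_small & b_large         # logical AND, element by element
either = prev_a_small | b_large       # logical OR, element by element
not_large = ~b_large                  # logical NOT, element by element

print(df[either])
```

The parentheses matter when conditions are written inline: & and | bind more tightly than comparison operators in Python, so df['A'] > 1 & df['B'] < 16 is not parsed as two comparisons. The keywords and, or, and not do not work element by element on a Series and raise a ValueError.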

2. Building a new column from the previous row

Question:

If I have the following dataframe:

date      A   B   M     S
20150101  8   7   7.5   0
20150101  10  9   9.5   -1
20150102  9   8   8.5   1
20150103  11  11  11    0
20150104  11  10  10.5  0
20150105  12  10  11    -1
...

If I want to create another column 'cost' by the following rules:

if S < 0, cost = (M-B).shift(1)*S
if S > 0, cost = (M-A).shift(1)*S
if S == 0, cost = 0

Currently, I am using the following function:

def cost(df):
    if df[3]<0:
        return np.roll((df[2]-df[1]),1)*df[3]
    elif df[3]>0:
        return np.roll((df[2]-df[0]),1)*df[3]
    else:
        return 0

df['cost']=df.apply(cost,axis=0)

Is there any other way to do it? Can I somehow use the pandas shift function in user-defined functions? Thanks.

Answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['20150101','20150102','20150103','20150104','20150105','20150106'],
                   'A': [8,10,9,11,11,12],
                   'B': [7,9,8,11,10,10],
                   'M': [7.5,9.5,8.5,11,10.5,11],
                   'S': [0,-1,1,0,0,-1]})
df = df.reindex(columns=['date','A','B','M','S'])

# Method 1: nested np.where; np.roll moves every value down one position
df['cost'] = np.where(df['S'] < 0,
                      np.roll((df['M']-df['B']), 1)*df['S'],
                      np.where(df['S'] > 0,
                               np.roll((df['M']-df['A']), 1)*df['S'],
                               0)
                     )

# Method 2: np.select with one choice per condition, using shift
M, A, B, S = [df[col] for col in 'MABS']
conditions = [S < 0, S > 0]
choices = [(M-B).shift(1)*S, (M-A).shift(1)*S]
df['cost2'] = np.select(conditions, choices, default=0)

print(df)
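The two methods agree on this sample data, but np.roll and shift are not interchangeable in general. A small sketch of the difference, using made-up values (the Series diff is hypothetical and stands in for a column such as M-B):

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for a difference column such as M - B.
diff = pd.Series([0.5, 1.0, 1.5])

rolled = np.roll(diff.to_numpy(), 1)  # wraps around: the last value moves to the front
shifted = diff.shift(1)               # no wrap-around: the first position becomes NaN

print(rolled)
print(shifted.tolist())
```

In the sample data the first row has S == 0, so both methods assign a cost of 0 there. If the first row had a nonzero S, method 1 would silently use the last row's value, while method 2 would produce NaN, which matches the (M-B).shift(1)*S rule stated in the question.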
