I feel like this should be really simple but I'm having a hard time with it. Suppose I have this:
我觉得这应该很简单,但我很难用它。假设我有这个:
df1:
ticker hhmm <--- The hhmm value corresponds to the column in df2
====== ====
AAPL 0931
IBM 0930
XRX 1559
df2:
ticker 0930 0931 0932 ... 1559 <<---- 390 columns
====== ==== ==== ==== ... ====
AAPL 4.56 4.57 ... ...
IBM 7.98 ... ... ...
XRX 3.33 ... ... 3.78
The goal is to create a new column in df1 whose value is df2[df1['hhmm']].
目标是在df1中创建一个新列,其值为df2 [df1 ['hhmm']]。
For example:
df1:
ticker hhmm df2val
====== ==== ======
AAPL 0931 4.57
IBM 0930 7.98
XRX 1559 3.78
Both df's have 'ticker' as their index, so I could simply join them BUT assume that this uses too much memory (the dataframes I'm using are much larger than the examples shown here).
两个df都有'ticker'作为它们的索引,所以我可以简单地加入它们但是假设它使用了太多的内存(我使用的数据帧比这里显示的例子大得多)。
I've tried apply and it's slooooow (15 minutes to run).
我已经尝试过申请,这是懒惰(运行15分钟)。
What's the Pandas Way to do this? Thanks!
什么是熊猫的方法呢?谢谢!
3 个解决方案
#1
1
Here's a minimal example of what you are trying to do. Hope this gives you enough hint:
这是您尝试做的最小例子。希望这会给你足够的提示:
# sample data
df1 = pd.DataFrame({'ticker':['AAPL','IBM','XRX'], 'hhmm':['0931','0930','1559']})
df2 = pd.DataFrame({'ticker':['AAPL','IBM','XRX'],
'0931': [2,2,3],
'0930': [5,6,7],
'1559': [8,7,6]})
# melt to match the format to join
df2 = pd.melt(df2, id_vars='ticker',var_name='hhmm',value_name='df2val')
# join to df1
df1.merge(df2, on=['ticker','hhmm'])
hhmm ticker df2val
0 0931 AAPL 2
1 0930 IBM 6
2 1559 XRX 6
#2
1
There is a function called lookup
有一个叫做查找的函数
df1['val']=df2.set_index('ticker').lookup(df1.ticker,df1.hhmm)
df1
Out[290]:
ticker hhmm val
0 AAPL 0931 4.57
1 IBM 0930 7.98
2 XRX 1559 33.00# I make up this number
#3
1
Try
df2.set_index('ticker').stack().loc[df1.apply(tuple, axis = 1)]
ticker
AAPL 931 4.57
IBM 930 7.98
XRX 1559 3.78
#1
1
Here's a minimal example of what you are trying to do. Hope this gives you enough hint:
这是您尝试做的最小例子。希望这会给你足够的提示:
# sample data
df1 = pd.DataFrame({'ticker':['AAPL','IBM','XRX'], 'hhmm':['0931','0930','1559']})
df2 = pd.DataFrame({'ticker':['AAPL','IBM','XRX'],
'0931': [2,2,3],
'0930': [5,6,7],
'1559': [8,7,6]})
# melt to match the format to join
df2 = pd.melt(df2, id_vars='ticker',var_name='hhmm',value_name='df2val')
# join to df1
df1.merge(df2, on=['ticker','hhmm'])
hhmm ticker df2val
0 0931 AAPL 2
1 0930 IBM 6
2 1559 XRX 6
#2
1
There is a function called lookup
有一个叫做查找的函数
df1['val']=df2.set_index('ticker').lookup(df1.ticker,df1.hhmm)
df1
Out[290]:
ticker hhmm val
0 AAPL 0931 4.57
1 IBM 0930 7.98
2 XRX 1559 33.00# I make up this number
#3
1
Try
df2.set_index('ticker').stack().loc[df1.apply(tuple, axis = 1)]
ticker
AAPL 931 4.57
IBM 930 7.98
XRX 1559 3.78