In Pandas, how do I use the values in one table as an index to pull data from another table?

Posted: 2021-08-17 20:09:38

I feel like this should be really simple but I'm having a hard time with it. Suppose I have this:

df1:

ticker  hhmm <--- The hhmm value corresponds to the column in df2
======  ====
AAPL    0931
IBM     0930
XRX     1559

df2:

ticker  0930  0931  0932 ... 1559   <<---- 390 columns
======  ====  ====  ==== ... ====
AAPL    4.56  4.57  ...      ...     
IBM     7.98  ...   ...      ...
XRX     3.33  ...   ...      3.78

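The goal is to create a new column in df1 that holds, for each row, the df2 value found at that row's ticker and in the column named by its hhmm (roughly df2[df1['hhmm']], taken row by row).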

For example:

df1:

ticker  hhmm  df2val
======  ====  ======
AAPL    0931    4.57
IBM     0930    7.98
XRX     1559    3.78

Both df's have 'ticker' as their index, so I could simply join them BUT assume that this uses too much memory (the dataframes I'm using are much larger than the examples shown here).

I've tried apply and it's slooooow (15 minutes to run).

What's the Pandas Way to do this? Thanks!

3 Answers

#1


Here's a minimal example of what you are trying to do. Hope this gives you enough of a hint:

import pandas as pd

# sample data
df1 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX'], 'hhmm': ['0931', '0930', '1559']})

df2 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX'],
                    '0931': [2, 2, 3],
                    '0930': [5, 6, 7],
                    '1559': [8, 7, 6]})

# melt df2 from wide to long so that each row is one (ticker, hhmm, df2val) triple
df2 = pd.melt(df2, id_vars='ticker', var_name='hhmm', value_name='df2val')

# join to df1 on both keys
df1.merge(df2, on=['ticker', 'hhmm'])

    hhmm    ticker  df2val
0   0931    AAPL    2
1   0930    IBM     6
2   1559    XRX     6
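
Since memory is the concern in the question, here is a hedged variation on the melt-and-merge idea. It is only a sketch: it assumes every value in df1['hhmm'] exists as a column in df2 (otherwise melt raises a KeyError), and the names needed, long_df2 and result are made up for illustration. It melts only the columns that df1 actually refers to, and uses a left merge so that every row of df1 survives in its original order.

```python
import pandas as pd

# Sample data (same made-up numbers as above, plus a repeated ticker)
df1 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX', 'AAPL'],
                    'hhmm': ['0931', '0930', '1559', '0930']})
df2 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX'],
                    '0930': [5, 6, 7],
                    '0931': [2, 2, 3],
                    '1559': [8, 7, 6]})

# Melt only the hhmm columns that df1 actually uses, to keep the long table small
needed = df1['hhmm'].unique().tolist()
long_df2 = df2.melt(id_vars='ticker', value_vars=needed,
                    var_name='hhmm', value_name='df2val')

# how='left' keeps every df1 row in order; validate raises if a
# (ticker, hhmm) pair appears more than once in long_df2
result = df1.merge(long_df2, on=['ticker', 'hhmm'],
                   how='left', validate='many_to_one')
print(result)
```

If df1 touches nearly all 390 columns, the trimming step saves little and the long table is still roughly the size of df2, so this mostly helps when df1 uses a small subset of the time columns.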

#2


There is a DataFrame method called lookup (note that it was deprecated in pandas 1.2 and removed in pandas 2.0):

df1['val'] = df2.set_index('ticker').lookup(df1.ticker, df1.hhmm)
df1
Out[290]: 
  ticker  hhmm    val
0   AAPL  0931   4.57
1    IBM  0930   7.98
2    XRX  1559  33.00  # I made up this number
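
On pandas versions where lookup is not available, the same row-and-column pairing can be done with NumPy integer indexing. The following is a minimal sketch, not a drop-in replacement: it assumes the tickers in df2 are unique, the numbers other than those shown in the question are made up, and wide, row_pos and col_pos are illustrative names.

```python
import pandas as pd

# Sample data; values not given in the question are invented
df1 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX'],
                    'hhmm': ['0931', '0930', '1559']})
df2 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX'],
                    '0930': [4.56, 7.98, 3.33],
                    '0931': [4.57, 7.99, 3.34],
                    '1559': [4.60, 8.01, 3.78]})

wide = df2.set_index('ticker')

# Translate labels into integer positions; get_indexer returns -1 for misses
row_pos = wide.index.get_indexer(df1['ticker'])
col_pos = wide.columns.get_indexer(df1['hhmm'])
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError('some ticker or hhmm in df1 is missing from df2')

# Pick one element per (row, column) pair without building a long table
df1['df2val'] = wide.to_numpy()[row_pos, col_pos]
print(df1)
```

This avoids both the join and the row-wise apply; the only extra allocations are the two position arrays and the result column, provided df2 holds a single numeric dtype so that to_numpy() does not need to convert the data.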

#3


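Try stacking df2 and indexing it with the (ticker, hhmm) pairs from df1 (this assumes df1 has only those two columns, in that order):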

df2.set_index('ticker').stack().loc[df1.apply(tuple, axis = 1)]

ticker      
AAPL    931     4.57
IBM     930     7.98
XRX     1559    3.78
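
The stack approach can also be written without a row-wise apply, which the question reports as slow. This is a sketch under assumptions: the sample numbers beyond those in the question are invented, and stacked and keys are illustrative names. It builds the (ticker, hhmm) keys as a MultiIndex directly from the two df1 columns and uses reindex, which yields NaN for pairs that are missing from df2 instead of raising.

```python
import pandas as pd

# Sample data; values not given in the question are invented
df1 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX'],
                    'hhmm': ['0931', '0930', '1559']})
df2 = pd.DataFrame({'ticker': ['AAPL', 'IBM', 'XRX'],
                    '0930': [4.56, 7.98, 3.33],
                    '0931': [4.57, 7.99, 3.34],
                    '1559': [4.60, 8.01, 3.78]})

# Long Series indexed by (ticker, column label)
stacked = df2.set_index('ticker').stack()

# Build the lookup keys from the two columns, no per-row Python calls
keys = pd.MultiIndex.from_arrays([df1['ticker'], df1['hhmm']])

# reindex aligns on the keys; to_numpy() drops the index so the
# values are assigned to df1 by position
df1['df2val'] = stacked.reindex(keys).to_numpy()
print(df1)
```

Note that stack() materializes a long copy of df2, so this trades some memory for simplicity.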
