I have an indexed pandas DataFrame. By searching through its index, I find a row of interest. How do I find out the iloc (integer position) of this row?
Example:
import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
A B C D
2000-01-01 -0.077564 0.310565 1.112333 1.023472
2000-01-02 -0.377221 -0.303613 -1.593735 1.354357
2000-01-03 1.023574 -0.139773 0.736999 1.417595
2000-01-04 -0.191934 0.319612 0.606402 0.392500
2000-01-05 -0.281087 -0.273864 0.154266 0.374022
2000-01-06 -1.953963 1.429507 1.730493 0.109981
2000-01-07 0.894756 -0.315175 -0.028260 -1.232693
2000-01-08 -0.032872 -0.237807 0.705088 0.978011
window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row.name
Timestamp('2000-01-03 00:00:00', offset='D')
# what is the iloc of window_stop_row?
4 Solutions
#1
13
You want the .name attribute of the row; pass it to get_loc:
In [131]:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
Out[131]:
A B C D
2000-01-01 0.095234 -1.000863 0.899732 -1.742152
2000-01-02 -0.517544 -1.274137 1.734024 -1.369487
2000-01-03 0.134112 1.964386 -0.120282 0.573676
2000-01-04 -0.737499 -0.581444 0.528500 -0.737697
2000-01-05 -1.777800 0.795093 0.120681 0.524045
2000-01-06 -0.048432 -0.751365 -0.760417 -0.181658
2000-01-07 -0.570800 0.248608 -1.428998 -0.662014
2000-01-08 -0.147326 0.717392 3.138620 1.208639
In [133]:
window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row.name
Out[133]:
Timestamp('2000-01-03 00:00:00', offset='D')
In [134]:
df.index.get_loc(window_stop_row.name)
Out[134]:
2
get_loc returns the ordinal position of the label in your index, which is what you want:
In [135]:
df.iloc[df.index.get_loc(window_stop_row.name)]
Out[135]:
A 0.134112
B 1.964386
C -0.120282
D 0.573676
Name: 2000-01-03 00:00:00, dtype: float64
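As a self-contained sketch of the steps above (the seeded random generator and the variable names `label` and `pos` are assumptions added for illustration), the label-to-position lookup can be written as follows. Note that `get_loc` returns a plain integer only when the label is unique in the index; for duplicated labels it returns a slice or a boolean mask instead.

```python
import numpy as np
import pandas as pd

# Build the same kind of frame as in the question (seeded for repeatability).
dates = pd.date_range("2000-01-01", periods=8)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=dates, columns=list("ABCD"))

# Last row strictly before the cutoff date; its .name is the index label.
window_stop_row = df[df.index < "2000-01-04"].iloc[-1]
label = window_stop_row.name

# Translate the label into an integer position usable with .iloc.
pos = df.index.get_loc(label)
print(label, pos)
```

Here `df.iloc[pos]` selects the same row as `window_stop_row`.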
If you just want to search the index, then as long as it is sorted you can use searchsorted:
In [142]:
df.index.searchsorted('2000-01-04') - 1
Out[142]:
2
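A slightly more defensive sketch of the searchsorted approach, assuming a sorted index and a hypothetical cutoff that does not itself appear in the index. With `side="left"`, searchsorted gives the insertion point before any equal entries, so subtracting 1 yields the last position strictly before the cutoff. The guard matters because a result of -1 would silently wrap around to the last row if passed to iloc.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=8)

# Hypothetical cutoff chosen for illustration; it is not a label in the index.
cutoff = pd.Timestamp("2000-01-03 12:00")

# searchsorted is only meaningful on a sorted index.
if not index.is_monotonic_increasing:
    raise ValueError("index must be sorted before using searchsorted")

# Insertion point of the cutoff, minus one, is the last position before it.
pos = int(index.searchsorted(cutoff, side="left")) - 1
if pos < 0:
    raise ValueError("no index entry lies before the cutoff")

print(pos, index[pos])
```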
#2
1
You could try looping through each row in the dataframe:
for row_number, (label, row) in enumerate(dataframe.iterrows()):
    # iterrows() yields (index label, row); enumerate() supplies the integer position
    if row['column_header'] == YourValue:
        print(row_number)
This gives you the row position to use with iloc.
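If the frame is large, a vectorized alternative to the loop may be preferable. The following is a minimal sketch, assuming a small made-up frame and a made-up target value of 20; `np.flatnonzero` returns the integer positions at which a boolean array is True.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; the index is deliberately non-numeric.
df = pd.DataFrame({"column_header": [10, 20, 30, 20]}, index=list("abcd"))
target = 20

# Compare the raw values, then collect the positions where the match holds.
positions = np.flatnonzero(df["column_header"].to_numpy() == target)
print(positions)

# The positions can be passed straight to .iloc.
matching_rows = df.iloc[positions]
```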
#3
1
IIUC, you could call index for your case:
In [53]: df[df.index < '2000-01-04'].index[-1]
Out[53]: Timestamp('2000-01-03 00:00:00', offset='D')
EDIT
I think @EdChum's answer is what you want. Alternatively, you could compare your dataframe against the values of the row you found, use all to find the row where every value matches, and then pass the resulting mask to the index:
In [67]: df == window_stop_row
Out[67]:
A B C D
2000-01-01 False False False False
2000-01-02 False False False False
2000-01-03 True True True True
2000-01-04 False False False False
2000-01-05 False False False False
2000-01-06 False False False False
2000-01-07 False False False False
2000-01-08 False False False False
In [68]: (df == window_stop_row).all(axis=1)
Out[68]:
2000-01-01 False
2000-01-02 False
2000-01-03 True
2000-01-04 False
2000-01-05 False
2000-01-06 False
2000-01-07 False
2000-01-08 False
Freq: D, dtype: bool
In [69]: df.index[(df == window_stop_row).all(axis=1)]
Out[69]: DatetimeIndex(['2000-01-03'], dtype='datetime64[ns]', freq='D')
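The result above is still a set of labels rather than integer positions. One way to finish the conversion, sketched here under the assumption that the index has no duplicate labels, is `Index.get_indexer`, which maps an array of labels to their positions (and gives -1 for labels it cannot find). The seeded generator is an assumption added for repeatability.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((8, 4)),
    index=pd.date_range("2000-01-01", periods=8),
    columns=list("ABCD"),
)
window_stop_row = df[df.index < "2000-01-04"].iloc[-1]

# Rows whose values all equal those of the row of interest.
mask = (df == window_stop_row).all(axis=1)
matching_labels = df.index[mask]

# Map the matching labels to integer positions (requires a unique index).
positions = df.index.get_indexer(matching_labels)
print(positions)
```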
#4
0
While pandas.Index.get_loc() only works with a single key, the following pattern also gets the iloc of multiple elements:
np.argwhere(condition).flatten() # array of all iloc where condition is True
In your case, picking the last element where df.index < '2000-01-04':
np.argwhere(df.index < '2000-01-04').flatten()[-1] # returns 2
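A runnable sketch of this pattern, with an added guard for the case where nothing matches (indexing an empty array with [-1] would raise an IndexError). The names `positions` and `last_pos` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-01", periods=8)

# Boolean array marking the entries strictly before the cutoff.
condition = index < pd.Timestamp("2000-01-04")

# Integer positions of every True entry.
positions = np.argwhere(condition).flatten()

# Take the last match, or None when no entry satisfies the condition.
last_pos = int(positions[-1]) if positions.size else None
print(positions, last_pos)
```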