Python: Pandas DataFrame, find a string in a column using wildcards and keep the matching rows

Time: 2021-10-22 04:28:51

I have a pandas data frame. Below is a sample table.

Event   Text
A       something/AWAIT hello          
B       la de la
C       AWAITING SHIP
D       yes NO AWAIT 

I want to keep only the rows that contain some form of the word AWAIT in the Text column. Below is my desired table:

Event   Text
A       something/AWAIT hello          
C       AWAITING SHIP
D       yes NO AWAIT 

Below is the code I tried to capture strings that contain AWAIT in all possible circumstances.

df_STH001_2 = df_STH001[df_STH001['Text'].str.contains("?AWAIT?") == True]

The error I get is as follows:

error: nothing to repeat at position 0

1 Solution

#1


By default, Series.str.contains(pat, case=True, flags=0, na=nan, regex=True) treats pat as a regular expression.

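In a regular expression, the question mark (?) makes the preceding token optional. Your pattern starts with ?, so there is no preceding token to apply it to, hence the error message "nothing to repeat at position 0". No wildcards are needed here: contains already matches the pattern anywhere in the string.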
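
To make the cause of the error concrete, here is a small sketch that reproduces it with Python's re module and then compares the two ways of treating the pattern. The Series named texts is a hypothetical copy of the Text column from the question, not the asker's actual data.

```python
import re

import pandas as pd

# A leading "?" has no preceding token to make optional, so compiling fails.
try:
    re.compile("?AWAIT?")
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)  # nothing to repeat at position 0

# Hypothetical copy of the Text column from the question.
texts = pd.Series(
    ["something/AWAIT hello", "la de la", "AWAITING SHIP", "yes NO AWAIT "]
)

# A plain substring pattern already matches anywhere in the string.
mask = texts.str.contains("AWAIT")
print(mask.tolist())  # [True, False, True, True]

# regex=False treats the pattern as literal text, so "?" only matches
# a real question mark and none of the sample rows qualify.
literal_mask = texts.str.contains("?AWAIT?", regex=False)
print(literal_mask.tolist())  # [False, False, False, False]
```

Passing regex=False avoids the exception, but it does not turn ? into a wildcard, so dropping the question marks is the simpler fix.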

In [178]: d[d['Text'].str.contains('AWAIT')]
Out[178]:
  Event                   Text
0     A  something/AWAIT hello
2     C          AWAITING SHIP
3     D           yes NO AWAIT
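
For a version that runs on its own, the following is a minimal sketch that rebuilds the sample table and applies the same filter. The frame name df and the choice of case=False and na=False are assumptions added for illustration; the question does not say whether the real column has missing values or mixed case.

```python
import pandas as pd

# Hypothetical reconstruction of the sample table from the question.
df = pd.DataFrame(
    {
        "Event": ["A", "B", "C", "D"],
        "Text": [
            "something/AWAIT hello",
            "la de la",
            "AWAITING SHIP",
            "yes NO AWAIT ",
        ],
    }
)

# na=False counts missing values as "no match", so the mask is purely boolean.
# case=False also matches lowercase or mixed-case forms such as "awaiting".
mask = df["Text"].str.contains("AWAIT", case=False, na=False)
kept = df[mask]

print(kept["Event"].tolist())  # ['A', 'C', 'D']
```

With na=False the mask never contains NaN, so the `== True` comparison from the question is no longer needed.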
