How to select values between strings and place them in dataframe columns using regex in Python

Time: 2023-02-02 22:55:27

I have a large dataframe containing a column titled "Comment".

Within each comment I need to pull out 3 values (duty cycle, gas, and pressure) and place them into separate columns.

"Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr"

Currently I am using .split and .tolist to parse the string:

import pandas as pd

# Split each comment on whitespace; name the wanted tokens and label the rest "0"
df1 = pd.DataFrame(eventsDf.comment.str.split().tolist(),
                   columns="0 0 0 0 0 0 dutyCycle 0 Gas 0 Pressure 0".split())

# Join the parsed columns onto the original DataFrame
eventsDf = pd.concat([eventsDf, df1], axis=1)

# Drop the columns that are not needed
eventsDf.drop(['comment', '0'], axis=1, inplace=True)

I find this method rather "hacky": if the structure of the comment changes, my code becomes useless. Can anyone show me a more efficient/robust way to do this? Thank you so much!

1 solution

#1


2  

Use str.extract with a regex; the named groups become the column names.

regex = r'Duty Cycle: (?P<Duty_Cycle>\d+), Gas: (?P<Gas>\w+) Pressure: (?P<Pressure>\S+) Torr'
df1 = eventsDf.comment.str.extract(regex, expand=True)
df1
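
To tie this back to the question, here is a minimal end-to-end sketch. The sample rows are made up for illustration, the `\s*` / `\s+` tolerances are an added assumption rather than part of the answer above, and the column name `comment` follows the question's code. It extracts the three values, converts the numeric ones, and joins them back onto the original frame.

```python
import pandas as pd

# Hypothetical sample data; only the first row comes from the question.
eventsDf = pd.DataFrame({
    "comment": [
        "Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr",
        "Data collection START for Duty Cycle: 50, Gas: Argon Pressure: 0.5 Torr",
        "unrelated comment",
    ]
})

# Same named groups as the answer's regex, but tolerant of extra whitespace
# (an assumption about how the comment format might vary).
regex = (
    r"Duty Cycle:\s*(?P<Duty_Cycle>\d+),\s*"
    r"Gas:\s*(?P<Gas>\w+)\s+"
    r"Pressure:\s*(?P<Pressure>\S+)\s*Torr"
)

# Rows that do not match the pattern yield NaN in every extracted column.
extracted = eventsDf["comment"].str.extract(regex, expand=True)

# str.extract returns strings; convert the numeric fields explicitly.
extracted["Duty_Cycle"] = pd.to_numeric(extracted["Duty_Cycle"])
extracted["Pressure"] = pd.to_numeric(extracted["Pressure"], errors="coerce")

# Replace the raw comment column with the parsed columns.
result = pd.concat([eventsDf.drop(columns="comment"), extracted], axis=1)
print(result)
```

Because the pattern anchors on the labels rather than on token positions, extra text before "Duty Cycle:" does not affect the result, and non-matching rows become NaN instead of raising an error.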
