python regex - how to get all the names in a line?

Time: 2021-08-27 23:50:57

How do I get the names from lines like the ones below, using regex?

line #1==> 
Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai 

line #2==>
Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav

I've tried:

import re

regex = r"\s*Elector\'s\sName\:\s([[a-zA-z]*\s[a-zA-z]*\s*[a-zA-z]*]*)\s"
re.findall(regex, line)

It works for line 1, except that it can't fetch the last name. For line 2, it fetches only 'Surpam Rajeshwar' for the last name, which actually has 3 words.

I'd appreciate it if someone could help me with this or suggest a different way to get the names!

4 solutions

#1

You can do this without a regex: split on Elector's Name:, strip whitespace from the resulting items, and drop all empty items:

ss = ["Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
      "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav"]
for s in ss:
    # filter(None, ...) drops the empty strings; list() is needed in Python 3
    print(list(filter(None, [x.strip() for x in s.split("Elector's Name:")])))

Output:

['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
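In Python 3, filter() returns a lazy iterator rather than a list, so printing it directly shows only a filter object instead of the names; wrapping it in list() materializes the result. A small illustration using a hand-made sample list of split pieces:

```python
pieces = ["", "Surpam Badurubai", "", "Madavimaru"]

lazy = filter(None, pieces)  # None as the function keeps only truthy (non-empty) items
print(lazy)                  # prints a filter object, not the names

names = list(lazy)           # consume the iterator into a real list
print(names)                 # ['Surpam Badurubai', 'Madavimaru']
```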

In case you want to study regex, here is a possible regex-based solution:

re.findall(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)", s) 

Pattern details

  • Elector's Name: - a literal substring

  • \s* - zero or more whitespace characters

  • (.*?) - Group 1 (the value re.findall returns): zero or more characters other than line breaks (with re.DOTALL, line breaks too), as few as possible

  • (?=\s*Elector's Name:|$) - a positive lookahead that requires, immediately to the right of the current position, either zero or more whitespace characters followed by Elector's Name:, or the end of the string ($)
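Here is a self-contained sketch of that regex, run against both sample lines from the question. The function name extract_names_regex is hypothetical, and the lines are assumed to have no trailing whitespace (otherwise the last name keeps it, because $ only matches after those spaces):

```python
import re

# Capture lazily after each label, stopping at the next label or the end of the string.
PATTERN = re.compile(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)")


def extract_names_regex(line):
    """Return every name that follows an "Elector's Name:" label in line."""
    # With a single capturing group, findall returns a list of that group's text.
    return PATTERN.findall(line)


line1 = "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai"
line2 = "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav"

print(extract_names_regex(line1))  # ['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
print(extract_names_regex(line2))  # ['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
```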

#2

This looks like more of a job for re.split on the "Elector's Name:" text (with optional whitespace before or after it), wrapped in a list comprehension that filters out empty fields:

import re
[x for x in re.split(r"\s*Elector's Name:\s*", l1) if x]

With your examples, I get these outputs:

['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
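To see why the if x filter is needed, here is a small sketch using line 1 from the question that shows what re.split returns before filtering: since the line starts with the delimiter, the first item is an empty string.

```python
import re

line1 = "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai"

# Splitting on the label (plus surrounding whitespace) leaves an empty string
# in front, because line1 begins with the delimiter itself.
raw = re.split(r"\s*Elector's Name:\s*", line1)
print(raw)    # ['', 'Surpam Badurubai', 'Madavimaru', 'Madavitannubai']

# The list comprehension drops that empty item.
names = [x for x in raw if x]
print(names)  # ['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
```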

Note that you can also achieve this by chaining str.strip() to str.split():

[x.strip() for x in l1.split("Elector's Name:") if x.strip()]

#3

If you only need to get all the names, try .split() with Elector's Name: as the delimiter, like this:

names = line.split("Elector's Name:")
for name in names:
    name = name.strip()
    if name:  # skip the empty piece before the first delimiter
        print(name)

#4

Jamie Zawinski:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

So, in plain Python:

line = "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai"
[name.strip() for name in line.split("Elector's Name:") if name != '']
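The same split-and-strip idea extends to many lines at once. Below is a minimal sketch, assuming the input is a text file with several labelled names per line; extract_names is a hypothetical helper name, and io.StringIO only stands in for a real file opened with open():

```python
import io

LABEL = "Elector's Name:"


def extract_names(line, label=LABEL):
    """Split line on label and return the stripped, non-empty pieces."""
    return [name.strip() for name in line.split(label) if name.strip()]


# Stand-in for an open text file; each line holds several labelled names.
data = io.StringIO(
    "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai\n"
    "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav\n"
)

# strip() also removes the trailing newline that file iteration keeps on each line.
all_names = [extract_names(line) for line in data]
for names in all_names:
    print(names)
```

Filtering on name.strip() rather than name != '' also drops pieces that contain only whitespace.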