Extract all emails from a string using regex [duplicate]

Date: 2022-09-13 11:29:01

This question already has an answer here:

I want to extract all emails from a string. For the input below, I would expect a tuple:

(hello@gmail.com, aaaa@yahoo.com, no@yes.de, why@hotmail.com)

However, my function only returns the first email:

(hello@gmail.com)

What's going on?

import re

def getEmails(str):
    regex = r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+){0,}'
    obj = re.search(regex, str, re.M|re.I)
    return obj.groups()

str = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
print(getEmails(str))

2 Answers

#1


2  

re.search() is not the function you want here: it stops at the first match. Instead, use re.findall(), which returns every non-overlapping match:

import re

def getEmails(str):
    # \w already covers letters, digits, and underscore, so no flags are needed
    regex = r'([\w.-]+@[\w.-]+\.[\w-]+)'
    return re.findall(regex, str)

str = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
print(getEmails(str))

I've replaced [A-Za-z] with \w, which makes the regex easier to read (note that \w also matches digits, underscores, and, in Python 3, non-ASCII letters). I've also removed the flags, since this particular regex doesn't need them. Most importantly, I've removed the {0,} quantifier: each match should be exactly one email, not a run of consecutive emails or an empty string.
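To make the difference concrete, here is a minimal sketch comparing the three behaviours on the question's input. The variable names (text, pattern, first_only, all_found, with_quantifier) are illustrative and not from the original posts.

```python
import re

text = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
pattern = r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'

# re.search stops at the first match, so groups() holds a single address.
first_only = re.search(pattern, text).groups()

# re.findall keeps scanning and returns every non-overlapping match.
all_found = re.findall(pattern, text)

# With {0,} appended, the pattern can also match the empty string,
# so empty entries show up between the real addresses.
with_quantifier = re.findall(pattern + '{0,}', text)

print(first_only)
print(all_found)
print('' in with_quantifier)
```

The last line is why the quantifier had to go: once the pattern is allowed to match nothing, the result needs filtering before it is usable.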

It currently returns a list; if you want a tuple, change the return statement to return tuple(re.findall(regex, str))


In closing, note that email address validation can be complicated, and this regex only covers simple addresses.
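As a rough illustration of that caveat, the sketch below runs a pattern equivalent to the \w-based one above against two awkward inputs. The sample strings are made up for this example.

```python
import re

pattern = r'[\w.-]+@[\w.-]+\.[\w-]+'

# '+' is legal in the local part of an address, but the character class
# does not include it, so only the tail after '+' is picked up.
truncated = re.findall(pattern, "first.last+tag@example.co.uk")

# Nothing forces letters or digits to appear, so punctuation-only
# junk is accepted as if it were an address.
false_positive = re.findall(pattern, "..@-.-")

print(truncated)
print(false_positive)
```

For extracting addresses from loosely formatted text this is often good enough; for validating user input, a dedicated library or a confirmation email is usually a safer choice than a hand-written regex.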

#2


0  

@jchi2241, you can also use re.finditer() to solve your problem.

Here is the code (your code with a small change):

Try it online at: http://rextester.com/BST18087

import re

def getEmails(str):
    regex = r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+){0,}'
    # {0,} lets the pattern match the empty string, so skip empty matches
    emails = tuple(m.group(0) for m in re.finditer(regex, str) if m.group(0))
    return emails

str = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
print(getEmails(str))

Output »

('hello@gmail.com', 'aaaa@yahoo.com', 'no@yes.de', 'why@hotmail.com')
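One reason to prefer re.finditer() over re.findall() is that it yields match objects, which also carry the position of each hit. The following is a small sketch of that, with the {0,} quantifier dropped so no empty matches need filtering; the names text, pattern, and located are illustrative.

```python
import re

text = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
pattern = re.compile(r'[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+')

# Each match object exposes the matched text and its (start, end) offsets.
located = [(m.group(0), m.span()) for m in pattern.finditer(text)]

for email, (start, end) in located:
    print(email, start, end)
```

The offsets are handy if the addresses need to be highlighted or replaced in the original string later.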
