This question already has an answer here:
- Python - Using regex to find multiple matches and print them out (3 answers)
I want to extract all email addresses from a string. For the input below, I would expect a tuple:
(hello@gmail.com, aaaa@yahoo.com, no@yes.de, why@hotmail.com)
However, my function returns only the first email:
(hello@gmail.com)
What's going on?
import re

def getEmails(str):
    regex = r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+){0,}'
    obj = re.search(regex, str, re.M|re.I)
    return obj.groups()

str = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
print(getEmails(str))
2 Answers
#1
2
re.search() is not the function you want in this case, because it stops at the first match. Instead, you should use re.findall():
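import re

def getEmails(str):
    # No {0,} quantifier: each match is exactly one email address.
    regex = r'([\w0-9._-]+@[\w0-9._-]+\.[\w0-9_-]+)'
    # findall() returns a list with every non-overlapping match.
    return re.findall(regex, str, re.M|re.I)

str = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
print(getEmails(str))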
I've replaced [A-Za-z] with [\w], since it makes the regex easier to read (note that \w also matches digits, the underscore, and, in Python 3, non-ASCII word characters). The flags re.M and re.I are not necessary for this particular regex, since it has no ^ or $ anchors and \w already covers both cases, so they can be omitted. Most importantly, I've removed the {0,} quantifier from the regex, since you only want one email per item in the result, not consecutive emails or empty items.
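To see why the {0,} quantifier matters, here is a small sketch that runs re.findall() with and without it on the sample string from the question. The variable names (text, with_quantifier, without_quantifier, noisy, clean) are made up for this illustration. With the quantifier, the group is allowed to match zero times, so the pattern also matches the empty string at separators, and those matches show up as empty items.

```python
import re

# Sample input from the question.
text = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"

# Pattern from the question: {0,} allows the group to match zero times.
with_quantifier = r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+){0,}'
# Same pattern with the quantifier removed.
without_quantifier = r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'

noisy = re.findall(with_quantifier, text)     # emails mixed with '' entries
clean = re.findall(without_quantifier, text)  # emails only

print(noisy)
print(clean)
```

The exact number of empty strings in the first result depends on the Python version, which is one more reason to drop the quantifier rather than filter afterwards.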
It currently returns a list; if you want a tuple, change the return statement to:
    return tuple(re.findall(regex, str, re.M|re.I))
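As a rough side-by-side, the sketch below contrasts re.search() with re.findall() on the same input and then converts the list to a tuple. The names text, pattern, first_email and all_emails are illustrative, and the pattern is a slightly shortened equivalent of the one above without the capturing group and without flags.

```python
import re

# Sample input from the question.
text = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
pattern = r'[\w.-]+@[\w.-]+\.[\w-]+'

# re.search() stops at the first match and returns one match object (or None).
first = re.search(pattern, text)
first_email = first.group(0) if first else None

# re.findall() scans the whole string and returns every non-overlapping match.
all_emails = tuple(re.findall(pattern, text))

print(first_email)
print(all_emails)
```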
I will also note in closing that email address validation can be complicated.
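Two quick cases hint at why validation is harder than extraction. This is only a sketch with made-up addresses, using a shortened form of the pattern above: the pattern cuts off a legal address that contains '+', and it accepts consecutive dots, which are not allowed in an unquoted local part.

```python
import re

pattern = r'[\w.-]+@[\w.-]+\.[\w-]+'

# '+' is legal in the local part but is missing from the character class,
# so only the tail of the address is matched.
partial = re.findall(pattern, "first+tag@example.com")

# Consecutive dots are invalid in an unquoted local part,
# yet the pattern matches the whole string.
too_lenient = re.findall(pattern, "john..doe@example.com")

print(partial)
print(too_lenient)
```

For extracting addresses from loosely formatted text this is usually acceptable; for validating user input, a dedicated library or a confirmation email is a safer choice.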
#2
0
@jchi2241, you can also use re.finditer() to solve your problem.
Here is the code (with a small change to your code):
Try it online at: http://rextester.com/BST18087
import re

def getEmails(str):
    regex = r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+){0,}'
    # {0,} also produces empty matches; "if m.group(0)" filters them out.
    emails = tuple(m.group(0) for m in re.finditer(regex, str) if m.group(0))
    return emails

str = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
print(getEmails(str))
Output »
('hello@gmail.com', 'aaaa@yahoo.com', 'no@yes.de', 'why@hotmail.com')
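One reason to prefer re.finditer() over re.findall() is that it yields match objects, so the position of each address is available as well. A minimal sketch, assuming a shortened pattern without the {0,} quantifier (so no filtering is needed); the names text, pattern and found are illustrative:

```python
import re

# Sample input from the question.
text = "hello@gmail.com;aaaa@yahoo.com no@yes.de, why@hotmail.com"
pattern = r'[\w.-]+@[\w.-]+\.[\w-]+'

# Each item is a match object: group(0) is the text, start() its offset.
found = [(m.group(0), m.start()) for m in re.finditer(pattern, text)]

for email, offset in found:
    print(offset, email)
```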