Python regular expression returns only part of the match when used with re.findall

Time: 2023-01-25 22:33:29

I have been trying to teach myself Python and am currently on regular expressions. The instructional text I have been using seems to be aimed at teaching Perl or some other language that is not Python, so I have had to adapt the expressions a bit to fit Python. I'm not very experienced, however, and I've hit a snag trying to get an expression to work.

The problem involves searching a text for instances of prices, expressed either without decimals, $500, or with decimals, $500.10.

This is what the text recommends:

\$[0-9]+(\.[0-9][0-9])?

Replicating the text, I use this code:

import re

inputstring = "$500.01"

result = re.findall( r'\$[0-9]+(\.[0-9][0-9])?', inputstring)

if result:
    print(result)
else:
    print("No match.")

However, the result is not $500.01, but rather:

['.01']

I find this strange. If I remove the parentheses and the ? that makes the decimal portion optional, it works fine. So, using this:

\$[0-9]+\.[0-9][0-9]

I get:

['$500.01']

How can I get the regular expression to return values with and without decimal portions?

Thanks.

1 solution

#1


Use a non-capturing group:

result = re.findall( r'\$[0-9]+(?:\.[0-9][0-9])?', inputstring)
                                ^^ 

When a pattern contains capturing groups, re.findall returns the text captured by those groups instead of the whole match, and your pattern contains one. Turn it into a non-capturing group, (?:...), and the whole match is returned again.
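
To make the difference concrete, here is a small sketch that runs the asker's original pattern and the non-capturing version side by side. The variable names are only illustrative, and the second input, "$500", is an extra case added here to show what happens when the optional group does not take part in the match.

```python
import re

# Original pattern: the parentheses form a capturing group.
capturing = r'\$[0-9]+(\.[0-9][0-9])?'
# Same pattern with the group made non-capturing via (?:...).
non_capturing = r'\$[0-9]+(?:\.[0-9][0-9])?'

with_decimals = "$500.01"
without_decimals = "$500"

# With a capturing group, findall returns only the group's text.
print(re.findall(capturing, with_decimals))         # ['.01']
# If the group did not participate, its slot is an empty string.
print(re.findall(capturing, without_decimals))      # ['']

# With a non-capturing group, findall returns the whole match.
print(re.findall(non_capturing, with_decimals))     # ['$500.01']
print(re.findall(non_capturing, without_decimals))  # ['$500']
```

Note that the capturing version still returns a non-empty list for "$500", so the `if result:` check in the question would print `['']` rather than "No match."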

re.findall(pattern, string, flags=0)
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
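
The quoted rule can be checked with a quick sketch covering zero, one and two groups. The sample text and the three patterns below are made up for illustration and are not from the question.

```python
import re

text = "$500.01 and $20"

# No groups: a list of whole matches.
no_groups = re.findall(r'\$[0-9]+', text)
print(no_groups)    # ['$500', '$20']

# One group: a list of that group's captures.
one_group = re.findall(r'\$([0-9]+)', text)
print(one_group)    # ['500', '20']

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'\$([0-9]+)(\.[0-9][0-9])?', text)
print(two_groups)   # [('500', '.01'), ('20', '')]
```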

Update

You can shorten your regex a bit by using a limiting quantifier {2} that requires exactly 2 occurrences of the preceding subpattern:

r'\$[0-9]+(?:\.[0-9]{2})?'
                    ^^^

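Or even replace [0-9] with \d (in Python 3, \d in a str pattern also matches non-ASCII Unicode digits unless the re.ASCII flag is set):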

r'\$\d+(?:\.\d{2})?'
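
As a rough check, the three spellings of the pattern can be run against a text that mixes prices with and without decimals. The sample sentence is invented for this sketch. The last part shows an alternative that is not from the answer above: re.finditer with group(0) returns whole matches even if the capturing group is kept.

```python
import re

# Hypothetical sample text with both kinds of prices.
text = "Prices: $500, $500.10 and $7.25."

patterns = [
    r'\$[0-9]+(?:\.[0-9][0-9])?',  # non-capturing group
    r'\$[0-9]+(?:\.[0-9]{2})?',    # with the {2} quantifier
    r'\$\d+(?:\.\d{2})?',          # with \d instead of [0-9]
]

# All three patterns find the same prices in this ASCII-only text.
results = [re.findall(p, text) for p in patterns]
for pattern, found in zip(patterns, results):
    print(pattern, found)

# Alternative: keep the capturing group and ask each match object
# for group(0), which is always the entire match.
whole_matches = [m.group(0)
                 for m in re.finditer(r'\$[0-9]+(\.[0-9]{2})?', text)]
print(whole_matches)  # ['$500', '$500.10', '$7.25']
```

The finditer form is handy when the group is needed for something else, for example to test whether a price had a decimal part.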
