Extracting float values with a regex in Python

Time: 2022-08-21 17:08:37

These are my input strings (I'm working in Python):

Memoria RAM - 1.5GB 
Memoria RAM - 1 GB

This is the regex I use to extract the value:

(\d{1,4})((,|.)(\d{1,2})){0,1}

The result is:

MATCH 1 --> 1.5.5 
MATCH 2 --> 1

Of course, only the second one is correct. The expected output is:

MATCH 1 --> 1.5
MATCH 2 --> 1

Why does my regex capture another ".5"? How can I fix it?

3 solutions

#1


I've tried this example and it works (when using group(0)):

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> result = re.search(r'(\d{1,4})((,|.)(\d{1,2})){0,1}', 'Memoria RAM - 1.5GB')
>>> result.group(0)
'1.5'

However, if you check groups(), you'll get:

>>> result.groups()
('1', '.5', '.', '5')

Why?

You're capturing:

1) The "1": (\d{1,4});

2) The "." or ",": (,|.). By the way, this should be (,|\.), because an unescaped "." matches any character except a newline, so you should use \. instead;

3) The "5": (\d{1,2});

4) The ".5": the extra parentheses around points 2 and 3, ((,|.)(\d{1,2})), form a capturing group of their own. It shows up second in groups() because groups are numbered by the position of their opening parenthesis.
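
To illustrate point 2, here is a minimal sketch of what the unescaped dot allows. The input "RAM 1x5" is a made-up example, not taken from the question, and the variable names are only illustrative:

```python
import re

# The asker's original pattern, with an unescaped "." in the separator group.
loose = re.compile(r'(\d{1,4})((,|.)(\d{1,2})){0,1}')
# The same pattern with the dot escaped, as point 2 suggests.
strict = re.compile(r'(\d{1,4})((,|\.)(\d{1,2})){0,1}')

sample = "RAM 1x5"  # hypothetical input, not from the question

loose_match = loose.search(sample).group(0)
strict_match = strict.search(sample).group(0)

print(loose_match)   # 1x5  ("x" is accepted as the separator)
print(strict_match)  # 1
```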

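So you should remove the parentheses from point 4, like this (note that the separator is then required, so a bare integer such as the 1 in "Memoria RAM - 1 GB" no longer matches):

>>> result = re.search(r'(\d{1,4})(,|\.)(\d{1,2}){0,1}', 'Memoria RAM - 1.5GB')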
>>> result.group(0)
'1.5'
>>> result.groups()
('1', '.', '5')
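
A minimal sketch that runs this pattern on both inputs from the question and compares it with a variant in which the separator and the fraction are optional together. The variant and the variable names are assumptions for illustration, one possible adjustment rather than the answer's own code:

```python
import re

inputs = ["Memoria RAM - 1.5GB", "Memoria RAM - 1 GB"]

# Pattern from the answer above: the separator (,|\.) is mandatory.
mandatory_sep = re.compile(r'(\d{1,4})(,|\.)(\d{1,2}){0,1}')
# Assumed alternative: separator and fraction are optional as a unit.
optional_frac = re.compile(r'(\d{1,4})(?:(,|\.)(\d{1,2}))?')

mandatory_results = [mandatory_sep.search(s) for s in inputs]
optional_results = [optional_frac.search(s).group(0) for s in inputs]

print([m.group(0) if m else None for m in mandatory_results])  # ['1.5', None]
print(optional_results)                                        # ['1.5', '1']
```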

#2


If you only need to capture each part of the integer/decimal number the way your regex does, make the decimal part optional and wrap it in a non-capturing group:

(\d{1,4})(?:([,.])(\d{1,2}))?

I also replaced (,|.) with [,.], since I assume your intention was to match either a comma or a dot, not a comma or any character other than a newline.

IDEONE demo:

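import re
# Capture the integer part, the separator, and the fraction; the last two are optional
p = re.compile(r'(\d{1,4})(?:([,.])(\d{1,2}))?')
test_str = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
# findall returns one tuple of groups per match; join each tuple back into a string
print(["".join(x) for x in p.findall(test_str)])
# => ['1.5', '1']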
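
Since the goal is a float value, the captured parts can be reassembled with a dot before conversion. A minimal sketch, assuming a comma separator should be read as a decimal point; the helper name extract_floats and the "1,25" input are hypothetical and not part of the original answer:

```python
import re

NUMBER = re.compile(r'(\d{1,4})(?:[,.](\d{1,2}))?')

def extract_floats(text):
    """Return every number in text as a float, reading ',' as a decimal point."""
    values = []
    for match in NUMBER.finditer(text):
        whole, fraction = match.groups()
        # fraction is None when the number has no decimal part
        values.append(float(whole + "." + (fraction or "0")))
    return values

print(extract_floats("Memoria RAM - 1.5GB \nMemoria RAM - 1,25 GB \nMemoria RAM - 1 GB"))
# [1.5, 1.25, 1.0]
```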

Alternatively, you can just use a regex to match the numbers:

\d+(?:\.\d+)?

Here is an IDEONE demo:

import re
p = re.compile(r'\d+(?:\.\d+)?')
test_str = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
print(p.findall(test_str))
# => ['1.5', '1']

If you need to match the numbers only before GB, use a look-ahead:

\d+(?:\.\d+)?(?=\s*GB)
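
A short sketch of the look-ahead in use. The input is a made-up variation of the question's strings with an extra number ("2 slots") that is not followed by GB:

```python
import re

before_gb = re.compile(r'\d+(?:\.\d+)?(?=\s*GB)')

# Hypothetical input: the "2" is followed by "slots", not by "GB".
text = "Memoria RAM - 1.5GB, 2 slots\nMemoria RAM - 1 GB"

matches = before_gb.findall(text)
print(matches)  # ['1.5', '1']
```

The look-ahead only checks what follows the number, so "GB" itself is never part of the match.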

#3


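import re

st = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
# (?<!\S) asserts that the match is not preceded by a non-whitespace character
result = re.findall(r'(?<!\S)\d\.\d+|(?<!\S)\d', st)
print(result)
# => ['1.5', '1']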
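
Note that \d in this pattern matches exactly one digit before the dot, so for a longer number such as 16 only the first digit is returned. A possible generalization, sketched here as an assumption rather than as part of the original answer, uses \d+ with an optional fraction; the "16 GB" input is made up for the comparison:

```python
import re

st = "Memoria RAM - 1.5GB \nMemoria RAM - 16 GB"  # hypothetical two-digit value

original = re.findall(r'(?<!\S)\d\.\d+|(?<!\S)\d', st)
generalized = re.findall(r'(?<!\S)\d+(?:\.\d+)?', st)

print(original)     # ['1.5', '1']
print(generalized)  # ['1.5', '16']
```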
