Python regex: extract a number from a string

Time: 2022-09-13 11:06:24

I would like to extract a number from a large HTML file with Python. My idea was to use a regex like this:

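import re

text = 'gfgfdAAA1234ZZZuijjk'
try:
    # Lazy group: capture everything between the first AAA and the next ZZZ
    found = re.search(r'AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # re.search() returned None, i.e. there is no AAA...ZZZ in the text
    found = ''

print(found)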

Unfortunately, I'm not used to regexes, and I can't adapt this example to extract 0,54125 from:

(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

Is there another way to extract the number, or could someone help me with the regex?

2 solutions

#1


If you want the output to be 0,54125 (or, more generally, anything matching \d+,\d+), you need to give the regex some conditions that pin down where the number appears.

Given the following input:

(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

you can try several regexes to extract 0,54125, for example:

(?<=\>)\d+,\d+

or,

(?<=\<div class=\"vk_ans vk_bk\"\>)\d+,\d+

and so on.
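As a rough check, here is how the two patterns above might be used with Python's re module. The sample string is the line from the question, and the names html, loose and strict are illustrative only. The backslashes before <, > and " are dropped here because Python's re does not require them for those characters; the patterns are otherwise the same.

```python
import re

# Sample line from the question; in practice this would be the whole HTML file.
html = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'

# Pattern 1: digits,digits immediately preceded by ">" (the lookbehind is not part of the match).
loose = re.search(r'(?<=>)\d+,\d+', html)

# Pattern 2: the same, but only directly after this exact opening tag.
strict = re.search(r'(?<=<div class="vk_ans vk_bk">)\d+,\d+', html)

print(loose.group())   # 0,54125
print(strict.group())  # 0,54125
```

The second pattern is the safer choice on a large page, since the first one matches any number that happens to follow a closing >.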

#2


You can replace some characters in your text before searching it. For example, to capture numbers like 12,34 you can do this:

import re

text = 'gfgfdAAA12,34ZZZuijjk'
try:
    text = text.replace(',', '')  # '12,34' -> '1234'
    found = re.search(r'AAA(\d+)ZZZ', text).group(1)
except AttributeError:
    found = ''  # no AAA<digits>ZZZ in the text

print(found)
# 1234
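For comparison, a minimal sketch of the question's own prefix/lazy-group/suffix template applied directly to the HTML line, without removing the comma. It assumes the number always sits right after the opening <div class="vk_ans vk_bk"> tag and right before " count"; the variable names are illustrative.

```python
import re

html = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'

# Same shape as 'AAA(.+?)ZZZ': literal prefix, lazy capture group, literal suffix.
# Assumption: the number is always followed by " count".
match = re.search(r'<div class="vk_ans vk_bk">(.+?) count', html)
found = match.group(1) if match else ''

print(found)  # 0,54125
```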

If you need to capture the digits inside a line, you can make your pattern more general, like this:

import re

text = '<div class="vk_ans vk_bk">0,54125 count id</div>'
text = text.replace(',', '')
# First run of digits anywhere in the string
found = re.search(r'(\d+)', text).group(1)

print(found)
# 054125
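Removing the comma turns 0,54125 into 054125. If, as seems likely, the comma is a decimal separator, a possible alternative is to capture the number with its comma and convert it to a float. This is a sketch under that assumption, not something the answer above claims.

```python
import re

html = '<div class="vk_ans vk_bk">0,54125 count id</div>'

# Capture the comma form intact instead of deleting the comma first.
match = re.search(r'\d+,\d+', html)
raw = match.group() if match else ''

# Assumption: the comma is a decimal separator, so swap it for "." before float().
value = float(raw.replace(',', '.')) if raw else None

print(raw)    # 0,54125
print(value)  # 0.54125
```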
