Find all digits after a specific string

Time: 2021-09-20 18:31:34

I am trying to get all the numbers that follow the word classes (or one of its variations) in the following string:

Accepted for all the goods and services in classes 16 and 41.

Expected output:

16
41

I have multiple strings that follow this pattern, and some others such as:

classes 5 et 30 # expected output 5, 30
class(es) 32,33 # expected output 32, 33
class 16        # expected output 16

Here is what I have tried so far: https://regex101.com/r/eU7dF6/3

(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+

But I am only able to get the last matched number, i.e. 41 in the example above.
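The behaviour described here can be reproduced in a few lines. This is a minimal sketch that uses a simplified pattern (an assumption, not the exact regex from the question) and illustrative variable names. With the standard re module, a capturing group placed inside a repeated group keeps only its final iteration, even though the overall match spans every number:

```python
import re

text = "Accepted for all the goods and services in classes 16 and 41."

# Simplified pattern: group 1 sits inside a repeated non-capturing group.
pattern = re.compile(r"class(?:es)?(?:\s*(?:and|et|,)?\s*(\d+))+")

m = pattern.search(text)
last_capture = m.group(1)   # only the final iteration of the group survives
whole_match = m.group(0)    # the full match still spans both numbers

print(last_capture)  # 41
print(whole_match)   # classes 16 and 41
```

Since the whole match still contains both numbers, one way forward is to pull the digits out of it in a second pass.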

2 Solutions

#1


1  

I suggest grabbing every substring that consists of class, classes or class(es) followed by numbers, and then extracting all the numbers from those substrings:

import re
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')
test_str = "Accepted for all the goods and services in classes 16 and 41."
results = [re.findall(r"\d+", x) for x in p.findall(test_str)]
print([x for l in results for x in l])
# => ['16', '41']
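As a rough check, the same two-step idea can be wrapped in a helper and run against the other sample strings from the question. The function name extract_class_numbers and the samples list are hypothetical names introduced for this sketch; the pattern is the one shown above.

```python
import re

# Pattern taken from the answer above; the function name is hypothetical.
CLASS_LIST = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')

def extract_class_numbers(text):
    """Return every number that follows 'class', 'classes' or 'class(es)'."""
    numbers = []
    # The pattern has no capturing groups, so findall yields whole matches,
    # e.g. 'classes 16 and 41'.
    for chunk in CLASS_LIST.findall(text):
        numbers.extend(re.findall(r"\d+", chunk))
    return numbers

samples = [
    "Accepted for all the goods and services in classes 16 and 41.",
    "classes 5 et 30",
    "class(es) 32,33",
    "class 16",
]
for s in samples:
    print(extract_class_numbers(s))
```

The results are strings; convert them with int() if numeric values are needed.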


Python's re module supports neither the \G anchor nor access to a group's capture stack, so your approach cannot work there.

However, your approach does work with the PyPI regex module:

>>> import regex
>>> test_str = "Accepted for all the goods and services in classes 16 and 41."
>>> rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
>>> res = []
>>> for x in regex.finditer(rx, test_str):
...     res.extend(x.captures("num"))  # every capture of the group, not just the last
...
>>> print(res)
['16', '41']
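If installing a third-party module is not an option, the effect of \G can be approximated with the standard library, because a compiled pattern's match() method accepts a start position and only succeeds exactly there. The following is a sketch under that assumption, not the answer's own method; HEAD, ITEM and numbers_after_class are hypothetical names.

```python
import re

HEAD = re.compile(r"\bclass(?:\(?es\)?)?")       # the keyword itself
ITEM = re.compile(r"\s*(?:and|et|,)?\s*(\d+)")   # one optional separator + number

def numbers_after_class(text):
    """Collect the numbers that directly continue a 'class...' keyword."""
    found = []
    for head in HEAD.finditer(text):
        pos = head.end()
        while True:
            # match() with a start position only succeeds exactly at pos,
            # which is the role \G plays in engines that support it.
            item = ITEM.match(text, pos)
            if item is None:
                break
            found.append(item.group(1))
            pos = item.end()
    return found

print(numbers_after_class("Accepted for all the goods and services in classes 16 and 41."))
```

Each number is captured by its own match, so nothing is lost to the last-capture-wins rule.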

#2


1  

You can do it in 2 steps. The re engine keeps only the last capture of a repeated group.

import re
x = "Accepted for all the goods and services in classes 16 and 41."
# Step 1 captures the number list after "class..."; step 2 pulls the digits out of it.
print(re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))

Output: ['16', '41']
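One caveat with the one-liner is that indexing with [0] raises an IndexError when the input contains no "class..." keyword at all. A minimal defensive sketch, using the same capturing pattern and a hypothetical helper name class_numbers, might look like this:

```python
import re

# Same capturing pattern as in the answer above.
LIST_AFTER_CLASS = re.compile(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)")

def class_numbers(text):
    """Return the numbers as strings, or an empty list when nothing matches."""
    m = LIST_AFTER_CLASS.search(text)
    if m is None:  # avoids the IndexError that [0] would raise
        return []
    return re.findall(r"\d+", m.group(1))

print(class_numbers("classes 5 et 30"))     # ['5', '30']
print(class_numbers("no keyword in here"))  # []
```

Like the original one-liner, this only looks at the first "class..." occurrence in the string.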

If you don't want strings, use:

# int() converts each digit string, so no ast import is needed
print(list(map(int, re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))))

Output: [16, 41]

If you have to do it in one regex, use the regex module:

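import regex
x = "Accepted for all the goods and services in classes 16 and 41."
# \G anchors each match where the previous one ended; "if i" drops the empty capture from the "class..." branch.
print([int(i) for i in regex.findall(r"class[\(es\)]*|\G(?:and|et|,|\s)*(\d+)", x, regex.VERSION1) if i])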

Output: [16, 41]
