How to find all matches with a regular expression

Time: 2021-08-05 05:15:35

My expression works for a single occurrence, but when there are multiple occurrences it captures the whole section.

My regex is

[=:]\s*[\"\']?(.*=_ash)[\"\']?

I tried both regex.findall and search; when there are multiple occurrences, I get the entire section back.

Do I need to set a flag to find multiple occurrences, or is there a problem with the regex itself?

The first three lines are working, but not the others:

sample_string = 'asdfanksdfkjasdf_ash'

sample_str = "asdfasdfasdf_ash"

sample_st = assdfvb/23+sdf_ash

sample_s : 'assdfvb/23+sdf_ash'

sample = {'sample' : { 'hi' : 'asdfasdf+/asdf+_ash' , 'hello' : 'asdfasf+/asdf+v_ash' }} 

I only need the value part here.

2 Answers

#1


1  

The problem with your pattern is the .*.

By default, quantifiers are greedy: .* consumes as much as it can. To change this behaviour, use a lazy quantifier. Adding the extra "?" in .*? makes it repeat as few times as possible.
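
To make the difference concrete, here is a minimal sketch in Python. It assumes a simplified form of the question's pattern, with (.*_ash) as the capture group, and the variable names are only illustrative.

```python
import re

text = "{ 'hi' : 'asdfasdf+/asdf+_ash' , 'hello' : 'asdfasf+/asdf+v_ash' }"

# Greedy: .* runs to the LAST "_ash" in the text, so a single match
# swallows both values and everything between them.
greedy = re.findall(r'''[=:]\s*["']?(.*_ash)["']?''', text)

# Lazy: .*? stops at the FIRST "_ash" it can reach, so each value
# becomes its own match.
lazy = re.findall(r'''[=:]\s*["']?(.*?_ash)["']?''', text)

print(len(greedy))  # 1
print(lazy)         # ['asdfasdf+/asdf+_ash', 'asdfasf+/asdf+v_ash']
```

No flag is needed for this: findall already returns every match, and the greedy quantifier is what merges them into one.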

You may also want the match to fail when a value doesn't end in "_ash". To do that, check for the closing quote when the value is quoted, or for whitespace (or the end of the text) when it isn't:

Regex:

[=:]\s*(?:(["'])((?:(?!\1).)*_ash)\1|(\S*_ash)(?!\S))

regex101 Demo

  • (["']) captures the opening quote in group 1
  • (?:(?!\1).)* matches any character except the quote captured in group 1
  • \1 matches the closing quote (the same character as the opening quote)
  • \S* handles unquoted text by matching anything except whitespace
  • (?!\S) checks that the value ends there

The values are captured in .group(2) if they're in quotes, or in .group(3) if unquoted.

Code:

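# Written for Python 2.7.10; the parenthesized print calls also run on Python 3
import re

text = """sample = {'sample' : { 'hi' : 'asdfasdf+/asdf+_ash' , 'hello' : 'asdfasf+/asdf+v_ash' }}"""

# Same pattern as above: group 2 holds a quoted value, group 3 an unquoted one
pattern = re.compile(r'[=:]\s*(?:(["\'])((?:(?!\1).)*_ash)\1|(\S*_ash)(?!\S))')

# Loop over all matches, numbering them from 1
for n, match in enumerate(pattern.finditer(text), 1):
    print('\nMatch #%s:' % n)

    # Show what groups 2 and 3 captured (an unused group prints None with span -1:-1)
    for i in range(2, 4):
        print('Group %s - [%s:%s]:  %s' % (i, match.start(i), match.end(i), match.group(i)))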

ideone Demo
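
As a rough sketch of how the pieces fit together, the same pattern can be written with re.VERBOSE and run against every sample line from the question. The names VALUE_RE and extract_values are hypothetical helpers, not part of the original answer.

```python
import re

# The answer's pattern, spelled out with re.VERBOSE so each part is labelled.
VALUE_RE = re.compile(r"""
    [=:]\s*                 # separator, then optional whitespace
    (?:
        (["'])              # group 1: the opening quote
        ((?:(?!\1).)*_ash)  # group 2: quoted value ending in _ash
        \1                  # the same quote closes the value
      |
        (\S*_ash)(?!\S)     # group 3: unquoted value that ends at whitespace or end of text
    )
""", re.VERBOSE)


def extract_values(text):
    """Return every value ending in _ash, whether it was quoted or not."""
    # Exactly one of groups 2 and 3 takes part in any given match.
    return [m.group(2) or m.group(3) for m in VALUE_RE.finditer(text)]


samples = [
    "sample_string = 'asdfanksdfkjasdf_ash'",
    'sample_str = "asdfasdfasdf_ash"',
    "sample_st = assdfvb/23+sdf_ash",
    "sample_s : 'assdfvb/23+sdf_ash'",
    "sample = {'sample' : { 'hi' : 'asdfasdf+/asdf+_ash' , 'hello' : 'asdfasf+/asdf+v_ash' }}",
]

for line in samples:
    print(extract_values(line))
```

Taking group(2) or group(3) hides the quoted/unquoted distinction from the caller, which is convenient when only the value itself matters.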

#2


0  

I think you need to change your regex to:

[=:]\s*['"]?([^\s\'\"=:]*?_ash)['"]?

[Regex Demo]
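
A quick way to try this pattern is re.findall, which returns the contents of the single capture group for each match. This is only a sketch: the name ALT_RE and the foo_ashtray input are made up for illustration.

```python
import re

# The pattern from this answer: the value may not contain whitespace,
# quotes, "=" or ":", so a match cannot run past the end of one value.
ALT_RE = re.compile(r'''[=:]\s*['"]?([^\s\'\"=:]*?_ash)['"]?''')

text = "sample = {'sample' : { 'hi' : 'asdfasdf+/asdf+_ash' , 'hello' : 'asdfasf+/asdf+v_ash' }}"

# With exactly one capture group, findall returns a list of that group's text.
values = ALT_RE.findall(text)
print(values)  # ['asdfasdf+/asdf+_ash', 'asdfasf+/asdf+v_ash']

# Hypothetical input: nothing checks what follows "_ash" here.
partial = ALT_RE.findall("x = foo_ashtray")
print(partial)  # ['foo_ash']
```

The second call shows one trade-off of this shorter pattern: it does not verify that the value ends right after "_ash", so a longer word can produce a partial match.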
