Using RegEx to match balanced parentheses

Time: 2022-02-18 21:44:57

I am trying to create a .NET regular expression that will properly balance my parentheses. I have the following regex:

func([a-zA-Z_][a-zA-Z0-9_]*)\(.*\)

The string I am trying to match is this:

"test -> funcPow((3),2) * (9+1)"

What should happen is that the regex matches everything from funcPow up to and including the second closing parenthesis, and stops there. Instead, it matches all the way to the very last closing parenthesis. The regex returns this:

"funcPow((3),2) * (9+1)"

It should return this:

"funcPow((3),2)"

Any help on this would be appreciated.
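Here is a minimal C# sketch of the behaviour described above. The class and method names are illustrative and do not come from the question. It runs the original pattern, and a lazy .*? variant, against the example string. Neither quantifier can count parentheses. The greedy .* backtracks only as far as the last ')', and the lazy .*? stops at the first one.

```csharp
using System.Text.RegularExpressions;

// Illustrative names; reproduces the problem described in the question.
public static class GreedyVsLazy
{
    const string Input = "test -> funcPow((3),2) * (9+1)";

    // Greedy .* grabs the rest of the string, then backtracks to the LAST ')'.
    public static string Greedy() =>
        Regex.Match(Input, @"func([a-zA-Z_][a-zA-Z0-9_]*)\(.*\)").Value;

    // Lazy .*? grabs as little as possible, so it stops at the FIRST ')'.
    public static string Lazy() =>
        Regex.Match(Input, @"func([a-zA-Z_][a-zA-Z0-9_]*)\(.*?\)").Value;
}
```

Greedy() returns "funcPow((3),2) * (9+1)" and Lazy() returns "funcPow((3)". Switching to a lazy quantifier therefore does not fix the problem on its own.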

4 answers

#1


40  

Regular expressions can definitely do balanced-parenthesis matching, at least in .NET. It can be tricky and requires a couple of the more advanced regex features, but it's not too hard.

Example:

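using System.Text.RegularExpressions;

var r = new Regex(@"
    func([a-zA-Z_][a-zA-Z0-9_]*) # The func name

    \(                      # First '('
        (?:
        [^()]               # Match any non-parenthesis character
        |
        (?<open> \( )       # Match '(', and capture into 'open'
        |
        (?<-open> \) )      # Match ')', and delete the 'open' capture
        )*                  # '*' rather than '+', so funcPow() also matches
        (?(open)(?!))       # Fails if 'open' stack isn't empty!

    \)                      # Last ')'
", RegexOptions.IgnorePatternWhitespace);

var m = r.Match("test -> funcPow((3),2) * (9+1)");
// m.Value == "funcPow((3),2)"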

Balancing groups have a couple of features, but in this example we're only using the capture-deleting feature. The line (?<-open> \) ) matches a ) and deletes the most recent "open" capture.
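Balancing groups have a second feature. The form (?<name1-name2> ...) pops name2 and also stores in name1 the text between the popped name2 capture and the current match position. The following is a hedged C# sketch of that feature. The group name inner and the class name are my own and do not come from the answer. The sketch lists the contents of each nested parenthesis pair inside the first matching call, in the order their ')' appears:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class InnerGroups
{
    // Same shape as the pattern above, but ')' uses (?<inner-open> ...):
    // it pops 'open' and records the text between that '(' and this ')' in 'inner'.
    static readonly Regex Call = new Regex(@"
        func[a-zA-Z_][a-zA-Z0-9_]*
        \(
          (?: [^()]
            | (?<open> \( )
            | (?<inner-open> \) )
          )*
          (?(open)(?!))        # fail if a nested '(' was never closed
        \)", RegexOptions.IgnorePatternWhitespace);

    // Contents of every nested (...) in the first call, in closing order.
    public static string[] Find(string s) =>
        Call.Match(s).Groups["inner"].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
}
```

For "funcA(b(c(1)), 2)" this yields "1" and then "c(1)". For the question's input it yields only "3".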

The trickiest line is (?(open)(?!)), so let me explain it. (?(open)...) is a conditional expression whose pattern is applied only if there is an "open" capture left. (?!) is an empty negative lookahead, which always fails. Therefore (?(open)(?!)) says "if there is an open capture, then fail".
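The following hedged C# sketch shows what that check buys you. The class and method names are illustrative. It runs the same pattern with and without (?(open)(?!)) on a call whose inner '(' is never closed. The loop uses * instead of + so that an empty argument list such as funcPow() also matches.

```csharp
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // The answer's pattern, including the final (?(open)(?!)) check.
    static readonly Regex WithCheck = new Regex(@"
        func[a-zA-Z_][a-zA-Z0-9_]*
        \( (?: [^()] | (?<open> \( ) | (?<-open> \) ) )*
        (?(open)(?!))
        \)", RegexOptions.IgnorePatternWhitespace);

    // Identical except that the check is missing.
    static readonly Regex WithoutCheck = new Regex(@"
        func[a-zA-Z_][a-zA-Z0-9_]*
        \( (?: [^()] | (?<open> \( ) | (?<-open> \) ) )*
        \)", RegexOptions.IgnorePatternWhitespace);

    // Both return the matched text, or "" when there is no match.
    public static string Checked(string s) => Value(WithCheck.Match(s));
    public static string Unchecked(string s) => Value(WithoutCheck.Match(s));

    static string Value(Match m) => m.Success ? m.Value : "";
}
```

Unchecked("funcPow((3,2)") still returns the whole string, because the loop is free to finish with one '(' left on the 'open' stack. Checked returns "" for the same input. On balanced input such as the question's string, both return "funcPow((3),2)".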

Microsoft's documentation was pretty helpful too.

#2


18  

Using balancing groups, it is:

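using System.Text.RegularExpressions;

// (?(BR)(?!)) rejects a call with an unclosed '('; [^()] consumes one character
// per iteration so there are no nested quantifiers to backtrack through.
Regex rx = new Regex(@"func([a-zA-Z_][a-zA-Z0-9_]*)\(((?<BR>\()|(?<-BR>\))|[^()])*(?(BR)(?!))\)");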

var match = rx.Match("funcPow((3),2) * (9+1)");

var str = match.Value; // funcPow((3),2)

(?<BR>\()|(?<-BR>\)) is a balancing group (the name BR stands for Brackets). It is perhaps clearer written as (?<BR> \( )|(?<-BR> \) ), so that the \( and \) are more "evident" (that spacing requires RegexOptions.IgnorePatternWhitespace).

If you really hate yourself (and the world, and your fellow programmers) enough to use these things, I suggest using RegexOptions.IgnorePatternWhitespace and "sprinkling" whitespace everywhere :-)

#3


0  

Regular expressions, in the formal sense, only describe regular languages. This means that a regular expression can find things of the sort "any combination of a's and b's" (ab, babbabaaa, etc.), but it can't find "n a's, one b, n a's" (a^n b a^n). A regular expression can't guarantee that the first run of a's is as long as the second.

Because of this, they aren't able to match equal numbers of opening and closing parentheses. It is easy enough to write a function that traverses the string one character at a time instead. Keep two counters, one for opening parentheses and one for closing ones, and increment them as you traverse the string. If opening_paren_count != closing_paren_count at the end, return false.
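The following is a hedged C# sketch of that character-by-character approach. It uses a slight variation on the two counters: a single depth counter (opens minus closes). This variant also rejects a ')' that appears before its '(', such as ")(", which equal totals alone would accept. FindClosing and ExtractCall are hypothetical helper names. ExtractCall assumes the call starts at the first occurrence of "func" and that the next '(' after it is the call's own.

```csharp
using System;

public static class ParenScanner
{
    // True if every ')' closes an earlier '(' and nothing is left open.
    public static bool IsBalanced(string s)
    {
        int depth = 0;                       // opens minus closes so far
        foreach (char c in s)
        {
            if (c == '(') depth++;
            else if (c == ')' && --depth < 0) return false; // ')' with no '(' to close
        }
        return depth == 0;
    }

    // Index just past the ')' that closes the '(' at openIndex, or -1 if none.
    public static int FindClosing(string s, int openIndex)
    {
        int depth = 0;
        for (int i = openIndex; i < s.Length; i++)
        {
            if (s[i] == '(') depth++;
            else if (s[i] == ')' && --depth == 0) return i + 1;
        }
        return -1;
    }

    // Simplification: takes the first "func" and the first '(' after it.
    public static string ExtractCall(string s)
    {
        int start = s.IndexOf("func", StringComparison.Ordinal);
        if (start < 0) return "";
        int open = s.IndexOf('(', start);
        if (open < 0) return "";
        int end = FindClosing(s, open);
        return end < 0 ? "" : s.Substring(start, end - start);
    }
}
```

ExtractCall("test -> funcPow((3),2) * (9+1)") returns "funcPow((3),2)".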

#4


-1  

func[a-zA-Z0-9_]*\((([^()])|(\([^()]*\)))*\)

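You can use that, but it only allows one level of nested parentheses inside the call (funcPow((3),2) matches, funcPow(((3)),2) does not). If you're working with .NET, there may be better alternatives, such as the balancing groups in the other answers.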

This part you already know:

 func[a-zA-Z0-9_]*\( --weird part-- \)

The --weird part-- just means: (allow any single character ., or | any parenthesized section (.*), to appear as many times as it wants)*. The only issue is that you can't use . to match any character. You have to use [^()] instead to exclude the parentheses, both at the top level and inside the nested section. That gives:

(([^()])|(\([^()]*\)))*
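Here is a quick C# check of the full pattern from this answer. The class and method names are illustrative. The nested alternative \([^()]*\) cannot itself contain parentheses. So the pattern handles the question's input, but it finds no match once the arguments nest two levels deep.

```csharp
using System.Text.RegularExpressions;

public static class OneLevelPattern
{
    // The pattern from this answer, unchanged.
    const string Pattern = @"func[a-zA-Z0-9_]*\((([^()])|(\([^()]*\)))*\)";

    // Returns the first match, or "" when there is none.
    public static string FirstMatch(string s)
    {
        Match m = Regex.Match(s, Pattern);
        return m.Success ? m.Value : "";
    }
}
```

FirstMatch("test -> funcPow((3),2) * (9+1)") returns "funcPow((3),2)", while FirstMatch("funcPow(((3)),2)") returns "".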
