
Time: 2022-02-18 21:44:57

I am trying to write a .NET regular expression that correctly balances parentheses. I have the following expression:



The string I am trying to match is this:


"test -> funcPow((3),2) * (9+1)"

What should happen is that the regex matches everything from funcPow through the second closing parenthesis and stops there. Instead, it matches all the way to the very last closing parenthesis. The regex returns this:


"funcPow((3),2) * (9+1)"

It should return this:

"funcPow((3),2)"

Any help on this would be appreciated.


4 Solutions



.NET regular expressions can definitely match balanced parentheses. It is tricky and requires a couple of the more advanced regex features, but it's not too hard.




var r = new Regex(@"
    func([a-zA-Z_][a-zA-Z0-9_]*) # The func name

    \(                      # First '('
        (?:
        [^()]               # Match any character that is not a parenthesis
        |
        (?<open> \( )       # Match '(', and capture into 'open'
        |
        (?<-open> \) )      # Match ')', and delete the 'open' capture
        )*
        (?(open)(?!))       # Fails if 'open' stack isn't empty!

    \)                      # Last ')'
", RegexOptions.IgnorePatternWhitespace);

Balancing groups have a couple of features, but for this example we only use the capture-deleting feature. The line (?<-open> \) ) matches a ) and deletes the most recent "open" capture. If there is no "open" capture left to delete, the group fails to match.


The trickiest line is (?(open)(?!)), so let me explain it. (?(open) ... ) is a conditional expression whose body is tried only if there is an "open" capture. (?!) is an empty negative lookahead, which always fails. Therefore, (?(open)(?!)) says "if there is still an open capture, then fail".
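To see the pieces working together, here is a minimal sketch that runs a balancing-group pattern against the string from the question. The class name BalancedCallDemo and the helper FirstCall are hypothetical names made up for illustration, and the group uses * so that an empty argument list would also match; treat it as one possible arrangement rather than the definitive pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedCallDemo
{
    // Hypothetical pattern for illustration: a func name followed by a
    // balanced, possibly nested, parenthesized argument list.
    private static readonly Regex CallPattern = new Regex(@"
        func([a-zA-Z_][a-zA-Z0-9_]*)   # the func name
        \(                             # first opening parenthesis
            (?:
                [^()]                  # any character that is not a parenthesis
              | (?<open> \( )          # an opening parenthesis pushes a capture onto 'open'
              | (?<-open> \) )         # a closing parenthesis pops one capture off 'open'
            )*
            (?(open)(?!))              # fail if 'open' still holds captures
        \)                             # last closing parenthesis
    ", RegexOptions.IgnorePatternWhitespace);

    // Returns the first balanced call found in the input, or null if none.
    public static string FirstCall(string input)
    {
        Match m = CallPattern.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

Calling BalancedCallDemo.FirstCall("test -> funcPow((3),2) * (9+1)") should give funcPow((3),2): once the nested (3) has been popped, the next ) cannot pop anything, so the loop ends and that ) is taken by the final \).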


Microsoft's documentation was pretty helpful too.




Using balancing groups, it is:


Regex rx = new Regex(@"func([a-zA-Z_][a-zA-Z0-9_]*)\(((?<BR>\()|(?<-BR>\))|[^()]*)+\)");

var match = rx.Match("funcPow((3),2) * (9+1)");

var str = match.Value; // funcPow((3),2)

(?<BR>\()|(?<-BR>\)) is a balancing group pair (the name BR stands for Brackets). It is perhaps clearer written this way: (?<BR>\()|(?<-BR>\)), so that the \( and \) are more "evident".
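One caveat that may be worth checking: the one-line pattern never tests that the BR stack is empty before the final \), so an input with an unclosed inner parenthesis can still produce a match. The sketch below puts the pattern next to a variant that adds (?(BR)(?!)). The class and field names are hypothetical, and the variant is only one possible way to add the check.

```csharp
using System.Text.RegularExpressions;

public static class BracketCheckDemo
{
    // The one-line pattern exactly as given in the answer.
    public static readonly Regex WithoutStackCheck = new Regex(
        @"func([a-zA-Z_][a-zA-Z0-9_]*)\(((?<BR>\()|(?<-BR>\))|[^()]*)+\)");

    // Assumed variant: the same pattern with (?(BR)(?!)) inserted before the
    // final \), so the match fails while any BR capture is still on the stack.
    public static readonly Regex WithStackCheck = new Regex(
        @"func([a-zA-Z_][a-zA-Z0-9_]*)\(((?<BR>\()|(?<-BR>\))|[^()]*)+(?(BR)(?!))\)");
}
```

Both patterns should return funcPow((3),2) for the string in the question. They differ on an unbalanced input such as funcA((3), which the variant with the stack check rejects.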


If you really hate yourself (and the world/your fellow programmers) enough to use these things, I suggest using RegexOptions.IgnorePatternWhitespace and "sprinkling" whitespace everywhere :-)




Regular expressions only work on regular languages. This means that a regular expression can find things of the sort "any combination of a's and b's" (ab, babbabaaa, etc.), but it can't find "n a's, one b, n a's" (a^n b a^n). A regular expression can't guarantee that the first run of a's has the same length as the second run of a's.


Because of this, they aren't able to match equal numbers of opening and closing parentheses. It would be easy enough to write a function that traverses the string one character at a time. Keep two counters, one for opening parentheses and one for closing ones, and increment them as you traverse the string. If the closing count ever exceeds the opening count, or if opening_paren_count != closing_paren_count at the end, return false.
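The counter approach described above could look roughly like this. ParenChecker and IsBalanced are hypothetical names, and the early exit when a closing parenthesis appears before its opener is an added assumption so that inputs like ")(" are rejected.

```csharp
public static class ParenChecker
{
    // Hypothetical helper: true if every '(' has a matching ')' later in the text.
    public static bool IsBalanced(string text)
    {
        int openingParenCount = 0;
        int closingParenCount = 0;

        foreach (char c in text)
        {
            if (c == '(') openingParenCount++;
            else if (c == ')') closingParenCount++;

            // A ')' with no '(' before it can never be balanced later.
            if (closingParenCount > openingParenCount) return false;
        }

        return openingParenCount == closingParenCount;
    }
}
```

This only answers whether a string is balanced. To extract the funcPow(...) substring you would also need to record where the count returns to zero after the first opening parenthesis.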





You can use that, but if you're working with .NET, there may be better alternatives.


This part you already know:


 func[a-zA-Z0-9_]*\( --weird part-- \)

The --weird part-- just means: ( allow any character ., or | any section (.*), to occur as many times as it wants )*. The only issue is that you can't use . to match any character; you have to use [^()] to exclude the parentheses.
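One way to read that description, as a sketch only: replace . with [^()] and let the "section" be a single parenthesized group with no parentheses inside it. The pattern below is an assumption built from that reading, not the answer's original expression, and OneLevelDemo is a hypothetical name. It handles only one level of nesting.

```csharp
using System.Text.RegularExpressions;

public static class OneLevelDemo
{
    // Assumed reading of the "weird part": any number of either
    //   [^()]        a single non-parenthesis character, or
    //   \([^()]*\)   one parenthesized section with no nested parentheses.
    public static readonly Regex OneLevel = new Regex(
        @"func[a-zA-Z0-9_]*\((?:[^()]|\([^()]*\))*\)");
}
```

For the string in the question, OneLevelDemo.OneLevel.Match(...).Value should be funcPow((3),2). An input nested two levels deep, such as funcPow(((3)),2), is not matched, which is the limitation that the .NET balancing groups avoid.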






