Regex to extract a string between parentheses when the string also contains other parentheses

Time: 2022-09-13 16:45:02

I've been trying to figure this out, but I don't think I understand Regex well enough to get to where I need to.

I have strings that resemble these:

filename.txt(1)attribute, 2)attribute(s), more!)
otherfile.txt(abc, def)

Basically, each string always starts with a filename, followed by some text between parentheses. I'm trying to extract the part between the outer parentheses, but that text can contain absolutely anything, even more parentheses (it often does).

Originally, we used a 'hacky' expression like this:

/\(([^@]+)\)/g

And it worked, until we ran into a case where the input string contained an @, and then we were stuck. Obviously...
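To see why the @ breaks it, here is a minimal C# sketch that runs the original pattern against a made-up input (the failing string from the question is not shown, so this one is hypothetical). C# has no /.../g delimiters: the pattern is passed as a string, and Regex.Match returns the first match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: an '@' appears before the closing ')'.
string input = "report.txt(sent to user@example.com, 2 copies)";

// The original pattern: capture one or more characters that are NOT '@'.
Match m = Regex.Match(input, @"\(([^@]+)\)");

// [^@]+ cannot step over the '@', and there is no ')' before it,
// so the engine finds no match at all. This prints "False".
Console.WriteLine(m.Success);
```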

I can't change the way the strings are generated, it's always a filename, then some parentheses and something of unknown length and content inside.

I'm hoping for a simple Regex expression, since I need this to work in both C# and in Perl -- is such a thing possible? Or does this require something more complex, like its own parsing method?

2 answers

#1 (score: 2)

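You can replace the negated character class [^@] in your regex (any character except @) with a dot, which matches any character, and use the * quantifier so that it matches zero or more characters. You can also simplify the regex by removing the capturing group, although the match will then include the outer parentheses themselves: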

\(.*\)

Here is the explanation for the regular expression:

  • \( matches the character ( literally.
  • . matches any character (except line terminators).
  • * is a quantifier that matches between zero and unlimited times, as many times as possible, giving back as needed (greedy). Because it is greedy, .* runs to the end of the line and then backtracks only as far as the last ), so parentheses inside the text do not end the match early.
  • \) matches the character ) literally.
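A minimal C# sketch of the greedy approach, run against the question's two sample strings plus a hypothetical line containing @. It puts a capturing group back around .* so that only the text inside the outer parentheses is returned; the helper name ExtractInner is made up for illustration. It assumes each line holds a single filename(...) entry, because a greedy .* would otherwise run from the first ( of one entry to the last ) of the next.

```csharp
using System;
using System.Text.RegularExpressions;

// Returns the text between the first '(' and the last ')' on the line,
// or null when the pattern does not match.
static string? ExtractInner(string line)
{
    // Greedy .* runs to the end of the line, then backtracks to the LAST ')'.
    // The group ( ) keeps the outer parentheses out of the captured text.
    Match m = Regex.Match(line, @"\((.*)\)");
    return m.Success ? m.Groups[1].Value : null;
}

string[] samples =
{
    "filename.txt(1)attribute, 2)attribute(s), more!)",
    "otherfile.txt(abc, def)",
    "report.txt(sent to user@example.com)", // hypothetical line containing '@'
};

foreach (string s in samples)
{
    Console.WriteLine(ExtractInner(s));
}
```

The same pattern should work unchanged in Perl, for example if ($line =~ /\((.*)\)/) { print $1 }. If the parenthesised text can contain line breaks, note that . does not match them by default; RegexOptions.Singleline in C# and the /s modifier in Perl change that.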

You can use regex101 to compose and debug your regular expressions.

#2 (score: 0)

Regex seems like overkill to me in this case. This can be achieved more reliably with plain string manipulation methods.

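// Take everything between the first '(' and the last ')'.
int first = str.IndexOf('(');
int last = str.LastIndexOf(')');
// last > first also guards against input like ")x(", where the length
// passed to Substring would be negative and it would throw.
if (first != -1 && last > first)
{
    string subString = str.Substring(first + 1, last - first - 1);
    // use subString here; it goes out of scope at the closing brace
}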
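The same string-manipulation approach can be wrapped as a small reusable method and tried against the question's sample strings. This is a sketch assuming one parenthesised section per line; the method name BetweenOuterParens is hypothetical.

```csharp
using System;

// Returns the text between the first '(' and the last ')' in the line,
// or null when there is no '(' followed later by a ')'.
static string? BetweenOuterParens(string line)
{
    int first = line.IndexOf('(');
    int last = line.LastIndexOf(')');

    // Covers a missing '(' or ')', and also ")x(" where ')' comes first,
    // which would otherwise give Substring a negative length.
    if (first == -1 || last <= first)
        return null;

    return line.Substring(first + 1, last - first - 1);
}

Console.WriteLine(BetweenOuterParens("filename.txt(1)attribute, 2)attribute(s), more!)"));
Console.WriteLine(BetweenOuterParens("otherfile.txt(abc, def)"));
```

Like the greedy regex, this relies on the last ) in the line being the one that closes the section after the filename.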

I've never used Perl, but I'll venture a guess that it has equivalent methods (Perl's built-in index, rindex and substr functions cover the same steps).
