How can I remove text within parentheses with a regex?

Date: 2022-12-15 15:59:53

I'm trying to handle a bunch of files, and I need to alter them to remove extraneous information in the filenames; notably, I'm trying to remove text inside parentheses. For example:

filename = "Example_file_(extra_descriptor).ext"

I want to apply a regex to a whole bunch of files where the parenthetical expression might be in the middle or at the end, and of variable length.

What would the regex look like? Perl or Python syntax would be preferred.


9 answers

#1


75  

s/\([^)]*\)//

So in Python, you'd do:


re.sub(r'\([^)]*\)', '', filename)
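
As a rough illustration of the substitution above applied to several names at once, the sketch below wraps it in a helper. The function name `strip_parens` and the second sample filename are made up for this example; only the first filename comes from the question.

```python
import re

# Matches "(", then any run of characters that are not ")", then ")".
PAREN_RE = re.compile(r'\([^)]*\)')

def strip_parens(filename):
    """Return filename with every parenthesized chunk removed (hypothetical helper)."""
    return PAREN_RE.sub('', filename)

names = [
    "Example_file_(extra_descriptor).ext",   # parenthetical at the end of the stem
    "Example_(extra_descriptor)_file.ext",   # parenthetical in the middle
]
cleaned = [strip_parens(n) for n in names]
print(cleaned)  # ['Example_file_.ext', 'Example__file.ext']
```

Note that Python's `re.sub` replaces every match by default, whereas the Perl `s///` above, written without the `/g` flag, replaces only the first one.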

#2


19  

I would use:


\([^)]*\)

#3


12  

The pattern that matches substrings in parentheses having no other ( and ) characters in between (like (xyz 123) in Text (abc(xyz 123))) is

\([^()]*\)

Details:

细节:

  • \( - an opening round bracket (note that in POSIX BRE, an unescaped ( should be used, see the sed example below)
  • [^()]* - zero or more (due to the * Kleene star quantifier) characters other than those defined in the negated character class/POSIX bracket expression, that is, any chars other than ( and )
  • \) - a closing round bracket (in POSIX BRE, use an unescaped ) instead)
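
To make the "no other ( and ) in between" point concrete, here is a small Python sketch using the sample text from this answer. The name `INNERMOST` is just a label chosen for the illustration.

```python
import re

# Same pattern as above: "(" + characters that are neither "(" nor ")" + ")".
INNERMOST = re.compile(r'\([^()]*\)')

text = "Text (abc(xyz 123))"

# Only the innermost pair qualifies, because "(abc(xyz 123))" contains a "(".
matches = INNERMOST.findall(text)
print(matches)   # ['(xyz 123)']

# A single pass therefore removes the inner group and leaves the outer one.
once = INNERMOST.sub('', text)
print(once)      # Text (abc)
```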

Code snippets for removal:

  • JavaScript: string.replace(/\([^()]*\)/g, '')
  • PHP: preg_replace('~\([^()]*\)~', '', $string)
  • Perl: $s =~ s/\([^()]*\)//g
  • Python: re.sub(r'\([^()]*\)', '', s)
  • C#: Regex.Replace(str, @"\([^()]*\)", string.Empty)
  • VB.NET: Regex.Replace(str, "\([^()]*\)", "")
  • Java: s.replaceAll("\\([^()]*\\)", "")
  • Ruby: s.gsub(/\([^()]*\)/, '')
  • R: gsub("\\([^()]*\\)", "", x)
  • Lua: string.gsub(s, "%([^()]*%)", "")
  • Bash/sed: sed 's/([^()]*)//g'
  • Tcl: regsub -all {\([^()]*\)} $s "" result
  • C++ std::regex: std::regex_replace(s, std::regex(R"(\([^()]*\))"), "")
  • Objective-C:
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\([^()]*\\)" options:NSRegularExpressionCaseInsensitive error:&error]; NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@""];

#4


5  

If you don't absolutely need to use a regex, consider using Perl's Text::Balanced to remove the parentheses.

use Text::Balanced qw(extract_bracketed);

my ($extracted, $remainder, $prefix) = extract_bracketed( $filename, '()', '[^(]*' );

{   no warnings 'uninitialized';

    $filename = (defined $prefix or defined $remainder)
                ? $prefix . $remainder
                : $extracted;
}

You may be thinking, "Why do all this when a regex does the trick in one line?"


$filename =~ s/\([^)]*\)//;

Text::Balanced handles nested parentheses, so $filename = 'foo_(bar(baz)buz)).foo' will be extracted properly. The regex-based solutions offered here will fail on this string: one will stop at the first closing paren, and the other will eat them all.

$filename =~ s/\([^)]*\)//; # returns 'foo_buz)).foo'

$filename =~ s/\(.*\)//; # returns 'foo_.foo'

# text balanced example returns 'foo_).foo'

If either of the regex behaviors is acceptable, use a regex, but document the limitations and the assumptions being made.
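
For readers working in Python rather than Perl, the two regex behaviors described in this answer can be reproduced with `re.sub`. This is only a sketch of the comparison; `count=1` is used to mirror Perl's `s///` without the `/g` flag, and the variable names are chosen for the illustration.

```python
import re

filename = 'foo_(bar(baz)buz)).foo'

# Negated class: the match ends at the first ")" after the opening "(".
stops_early = re.sub(r'\([^)]*\)', '', filename, count=1)
print(stops_early)   # foo_buz)).foo

# Greedy dot: the match runs from the first "(" to the last ")".
eats_all = re.sub(r'\(.*\)', '', filename, count=1)
print(eats_all)      # foo_.foo
```

Neither result equals the 'foo_).foo' that the balanced extraction produces, which is the limitation to document.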

#5


3  

If a path may contain parentheses then the r'\(.*?\)' regex is not enough:


import os, re

def remove_parenthesized_chunks(path, safeext=True, safedir=True):
    dirpath, basename = os.path.split(path) if safedir else ('', path)
    name, ext = os.path.splitext(basename) if safeext else (basename, '')
    name = re.sub(r'\(.*?\)', '', name)
    return os.path.join(dirpath, name+ext)

By default the function preserves parenthesized chunks in the directory and extension parts of the path.

Example:


>>> f = remove_parenthesized_chunks
>>> f("Example_file_(extra_descriptor).ext")
'Example_file_.ext'
>>> path = r"c:\dir_(important)\example(extra).ext(untouchable)"
>>> f(path)
'c:\\dir_(important)\\example.ext(untouchable)'
>>> f(path, safeext=False)
'c:\\dir_(important)\\example.ext'
>>> f(path, safedir=False)
'c:\\dir_\\example.ext(untouchable)'
>>> f(path, False, False)
'c:\\dir_\\example.ext'
>>> f(r"c:\(extra)\example(extra).ext", safedir=False)
'c:\\\\example.ext'

#6


2  

If you can stand to use sed (possibly executed from within your program), it'd be as simple as:

sed 's/(.*)//g'
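
One caveat worth checking before relying on this: `.*` is greedy, so when a name contains more than one parenthesized group, everything between the first `(` and the last `)` is removed. The Python sketch below mimics the same pattern with `re.sub` (with the parentheses escaped, since Python's `re` treats a bare `(` as a group) and compares it with the negated-class pattern; the sample name is invented for the illustration.

```python
import re

name = "a_(one)_keep_(two).ext"

# Greedy: spans from the first "(" to the last ")", taking "_keep_" with it.
greedy = re.sub(r'\(.*\)', '', name)
print(greedy)      # a_.ext

# Negated class: each parenthesized group is removed on its own.
per_group = re.sub(r'\([^()]*\)', '', name)
print(per_group)   # a__keep_.ext
```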

#7


0  

>>> import re
>>> filename = "Example_file_(extra_descriptor).ext"
>>> p = re.compile(r'\([^)]*\)')
>>> re.sub(p, '', filename)
'Example_file_.ext'

#8


0  

Java code:


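// Requires java.util.regex.Pattern and java.util.regex.Matcher; fileName is the input string.
Pattern pattern1 = Pattern.compile("(\\_\\(.*?\\))");
Matcher matcher1 = pattern1.matcher(fileName);
if (matcher1.find()) System.out.println(fileName.replace(matcher1.group(1), ""));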

#9


0  

For those who want to use Python, here's a simple routine that removes parenthesized substrings, including those with nested parentheses. Okay, it's not a regex, but it'll do the job!


def remove_nested_parens(input_str):
    """Returns a copy of 'input_str' with any parenthesized text removed. Nested parentheses are handled."""
    result = ''
    paren_level = 0
    for ch in input_str:
        if ch == '(':
            paren_level += 1
        elif (ch == ')') and paren_level:
            paren_level -= 1
        elif not paren_level:
            result += ch
    return result

remove_nested_parens('example_(extra(qualifier)_text)_test(more_parens).ext')
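
If a regex is still preferred, one possible alternative (not part of the routine above) is to strip innermost pairs repeatedly until nothing changes. The sketch below assumes that approach; the name `remove_nested_parens_re` is hypothetical.

```python
import re

# Matches a pair of parentheses with no other parentheses inside it.
INNERMOST = re.compile(r'\([^()]*\)')

def remove_nested_parens_re(input_str):
    """Hypothetical regex variant: strip innermost pairs until none remain."""
    while True:
        # subn returns the new string and the number of replacements made.
        input_str, n = INNERMOST.subn('', input_str)
        if n == 0:
            return input_str

print(remove_nested_parens_re('example_(extra(qualifier)_text)_test(more_parens).ext'))
# example__test.ext
```

The two approaches differ on unbalanced input: the character-scanning routine drops everything after an unclosed "(", while this variant leaves an unmatched parenthesis and the text after it in place.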
