Regex to match any character with a wildcard

Time: 2021-06-04 09:38:17

I am new to regex and I am trying to come up with something that will match a text like below:

ABC: (z) jan 02 1999 \n

Notes:

  • The text will always begin with "ABC:".
  • There may be zero, one, or more spaces between ':' and (z).
  • Variations of (z) are also possible, e.g. (zz), (zzzzzz), etc., but it is always one or more non-digit characters enclosed in "()".
  • There may be zero, one, or more spaces between (z) and jan.
  • jan could be jan, january, etc.
  • The date could be in any format and may or may not contain other text, so I would really like to know if there is a regex I can use to capture anything and everything found between '(z)' and '\n'.

Any help is greatly appreciated! Thank you

3 solutions

#1


30  

The following should work:

ABC: *\([a-zA-Z]+\) *(.+)

Explanation:

ABC:            # match literal characters 'ABC:'
 *              # zero or more spaces
\([a-zA-Z]+\)   # one or more letters inside of parentheses
 *              # zero or more spaces
(.+)            # capture one or more of any character (except newlines)

To get your desired grouping based on the comments below, you can use the following:

(ABC:) *(\([a-zA-Z]+\).+)
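
As a quick check, here is a minimal sketch that runs both patterns above against the sample line from the question. It assumes Python's re module (the question does not say which regex engine is in use) and assumes the trailing \n is a real newline character; the variable names are only illustrative.

```python
import re

# Sample line from the question, assumed to end in a real newline character.
line = "ABC: (z) jan 02 1999 \n"

# First pattern: group 1 is everything after the parenthesised letters.
first = re.compile(r"ABC: *\([a-zA-Z]+\) *(.+)")
# Second pattern: group 1 is the prefix, group 2 starts at the parentheses.
second = re.compile(r"(ABC:) *(\([a-zA-Z]+\).+)")

date_text = first.search(line).group(1)
prefix, rest = second.search(line).groups()

# repr() makes the trailing space visible; '.' stops before the newline.
print(repr(date_text))
print(repr(prefix), repr(rest))
```

Note that the capture keeps the trailing space before the newline, since . matches spaces; strip it afterwards if it is not wanted.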

#2


4  

Without knowing the exact regex implementation you're using, I can only give general advice. (The syntax I use will be Perl, as that's what I know; some languages will require tweaking.)

Looking at ABC: (z) jan 02 1999 \n

  • The first thing to match is ABC:, so our regex starts as /ABC:/
  • You say ABC is always at the start of the string, so /^ABC/ will ensure that ABC is at the start of the string.
  • You can match whitespace with the \s directive (note the case; \S means the opposite). It matches spaces, tabs, and newlines. With any directive you can match one or more with + (or zero or more with *).
  • You need to escape ( and ) because they are reserved characters, so use \( and \).
  • You can match any character except a newline with . (this includes spaces).
  • You can match any run of characters with .* but be careful that it isn't too greedy and doesn't capture more than you intend.
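
To make the point about greediness concrete, here is a small sketch using Python's re module (an assumption; the answer itself uses Perl syntax). The sample text, with a second pair of parentheses on the same line, is made up for illustration.

```python
import re

# Hypothetical input with two parenthesised parts on the first line.
text = "ABC: (z) jan (02) 1999\nnext line"

# Greedy: .+ runs as far as it can, so the match ends at the LAST ')' on the line.
greedy = re.search(r"\(.+\)", text).group(0)
# Lazy: .+? stops as early as it can, so the match ends at the FIRST ')'.
lazy = re.search(r"\(.+?\)", text).group(0)
# '.' does not match a newline by default, so .* stops at the end of the line.
dot_run = re.search(r"ABC:.*", text).group(0)

print(greedy)
print(lazy)
print(dot_run)
```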

So, to capture what you've asked for, I would use /^ABC:\s*\(.+?\)\s*(.+)$/

Which I read as:

  • begins with ABC:
  • may have some spaces
  • has (
  • has some characters
  • has )
  • may have some spaces
  • then captures everything until the end of the line (which is $).
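
The reading above can be tried out with a short sketch. It uses Python's re module rather than Perl, and it assumes the re.MULTILINE flag so that ^ and $ anchor at every line of a multi-line input; the sample text is hypothetical.

```python
import re

# Same pattern as above; MULTILINE (an assumption) makes ^ and $ work per line.
pattern = re.compile(r"^ABC:\s*\(.+?\)\s*(.+)$", re.MULTILINE)

# Hypothetical input: two matching lines and one that does not start with ABC:
text = (
    "ABC: (z) jan 02 1999\n"
    "ABC:(zz)january 2, 1999 at noon\n"
    "XYZ: (z) ignored\n"
)

# findall returns the contents of the single capture group for each match.
captures = pattern.findall(text)
print(captures)
```

Without the flag, ^ and $ only anchor at the start and end of the whole string, so this approach suits input that is processed one line at a time.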

I highly recommend keeping a copy of the following cheat sheet handy: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

#3


0  

This should fulfill your requirements.

ABC:\s*(\(\D+\)\s*.*?)\\n

Here it is with some tests http://www.regexplanet.com/cookbook/ahJzfnJlZ2V4cGxhbmV0LWhyZHNyDgsSBlJlY2lwZRiEjiUM/index.html
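
One thing worth checking with this pattern: the trailing \\n matches a literal backslash followed by the letter n, not a newline character. The sketch below, which assumes Python's re module and two made-up input strings, shows the difference. Which behaviour is wanted depends on whether the \n in the question is literal text or a real line break, and the question does not say.

```python
import re

# Pattern from this answer; \\n in the regex means a backslash followed by 'n'.
pattern = re.compile(r"ABC:\s*(\(\D+\)\s*.*?)\\n")

literal = "ABC: (z) jan 02 1999 \\n"  # ends with the two characters '\' and 'n'
newline = "ABC: (z) jan 02 1999 \n"   # ends with a real newline character

m = pattern.search(literal)
captured = m.group(1) if m else None
no_match = pattern.search(newline)

print(repr(captured))
print(no_match)
```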

Further reading on regular expressions: http://www.regular-expressions.info/characters.html
