Extracting numbers from a string using regex

Time: 2022-09-13 16:20:08

I have a string as below

"Temporada 2015"

and I also get strings such as

"Temporada 8"

I need to match and extract only the numbers from these strings: 2015 and 8. How do I do that using regex? I tried the following:

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*(\d+)/)[2]

But it returned only 5 for the first one instead of 2015. How do I match both and return only the numbers?

5 Solutions

#1


1  

You should add a ? to make the regex non-greedy:

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2];

Here is a sample program for verification.
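
A minimal check along these lines might look like the sketch below. It is not the program the answer referred to: the `samples` array is a stand-in for whatever `doc.text_at(...)` returns in the question, and the variable names are only illustrative.

```ruby
# Hypothetical stand-ins for the text returned by doc.text_at(...) in the question.
samples = ["Temporada 2015", "Temporada 8"]

# Non-greedy .*? gives up characters as early as possible,
# so \d+ gets to consume every digit of the number.
lazy = /(Tempo).*?(\d+)/

results = samples.map { |s| s.match(lazy)[2] }
puts results.inspect   # prints ["2015", "8"]
```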

#2


2  

The .* is "greedy". It matches as many characters as it can. So it leaves just one digit for the \d+.
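
To see where the digits go, it can help to put the wildcard in its own capture group. The snippet below is only an illustration; the extra group and the variable names are not part of the original pattern.

```ruby
s = "Temporada 2015"

# Greedy: .* first swallows the rest of the string, then backtracks
# one character at a time until \d+ can match at least one digit.
greedy_match = s.match(/(Tempo)(.*)(\d+)/)
puts greedy_match[2].inspect  # prints "rada 201"
puts greedy_match[3].inspect  # prints "5"

# Lazy: .*? grows only as far as needed, so \d+ gets the whole number.
lazy_match = s.match(/(Tempo)(.*?)(\d+)/)
puts lazy_match[2].inspect    # prints "rada "
puts lazy_match[3].inspect    # prints "2015"
```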

If your strings are known to contain no other numbers, you can just do

.scan(/\d+/).first

Otherwise, you can match only non-digits before the number:

.match(/(Tempo)[^\d]*(\d+)/)[2]
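
One thing to keep in mind is that `match` returns `nil` when nothing matches, so indexing with `[2]` would raise a `NoMethodError`. A possible nil-safe variant, assuming you want an Integer back, is sketched below; `season_number` is a hypothetical helper name, and `\D` is used as shorthand for `[^\d]`.

```ruby
# Hypothetical helper: returns the number after "Tempo..." as an Integer,
# or nil when the text contains no such number.
def season_number(text)
  # String#[] with a regex and a group index returns nil on no match.
  digits = text[/Tempo\D*(\d+)/, 1]
  digits && digits.to_i
end

puts season_number("Temporada 2015").inspect  # prints 2015
puts season_number("Temporada 8").inspect     # prints 8
puts season_number("Temporada").inspect       # prints nil
```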

#3


1  

Because .* is greedy, it matches as many characters as possible, so the match returns only the last digit; everything before it is consumed by .*. Changing the greedy .* to the non-greedy .*? makes it match as few characters as possible, which in turn gives you the whole number.

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]

#4


1  

You can scan directly for digits:

"Temporada 2015".scan(/\d+/)
# => ["2015"]
"Temporada 8".scan(/\d+/)
# => ["8"]

If you want to include Temp in regex:

"Temporada 2015".scan(/Temp.*?(\d+)/)
# => [["2015"]]

Non-regex way:

"Temporada 2015".split.detect{|e| e.to_i.to_s == e }
# => "2015"
"Temporada 8".split.detect{|e| e.to_i.to_s == e }
# => "8"
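
Note that the `to_i.to_s == e` round trip rejects tokens with leading zeros, since "007".to_i.to_s is "7". If that matters for your data, a character-by-character check is one possible alternative. This is only a sketch, and `first_numeric_token` is a hypothetical name.

```ruby
# Hypothetical helper: first whitespace-separated token made up only of
# ASCII digits, without using a regex.
def first_numeric_token(text)
  text.split.detect do |token|
    token.each_char.all? { |c| c.between?("0", "9") }
  end
end

# The round-trip check misses a token with leading zeros...
round_trip = "Temporada 007".split.detect { |e| e.to_i.to_s == e }
puts round_trip.inspect                              # prints nil

# ...while the per-character check keeps it.
puts first_numeric_token("Temporada 007").inspect    # prints "007"
puts first_numeric_token("Temporada 2015").inspect   # prints "2015"
```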

#5


0  

I'd write it thus:

r = /
    \b    # match a word-break (possibly beginning of string)
    Tempo # match these characters
    \D+   # match one or more characters other than digits
    \K    # forget everything matched so far
    \d+   # match one or more digits
   /x

"Temporada 2015"[r] #=> "2015"
"Temporada 8"[r]    #=> "8"
"Temporary followed by something else 21 then more"[r]
  #=> "21"

If 'Tempo' must be at the beginning of the string, write r = /Tempo.... or r = /\s*Tempo... if it can be preceded by whitespace. I've written \D+ rather than \D* on the assumption that there should be at least one space.
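
As a sketch of the start-of-string case, one option is to anchor the pattern with `\A`. The anchor is my assumption about how to enforce "at the beginning"; it does not appear in the patterns above, and the variable name is only illustrative.

```ruby
# \A pins the match to the very start of the string; \s* allows
# optional leading whitespace before "Tempo".
anchored = /\A\s*Tempo\D+\K\d+/

puts "Temporada 2015"[anchored].inspect    # prints "2015"
puts "  Temporada 8"[anchored].inspect     # prints "8"
puts "Nova Temporada 3"[anchored].inspect  # prints nil
```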

I don't understand why 'Tempo' is in a capture group. Have I missed something?
