How to split a string containing a numeric value into three parts in PostgreSQL?

I want to split a given string, which may contain a numeric value, using regexp_matches(). It should identify the first occurrence of a numeric value with an optional sign and optional decimal places. The non-matching parts should be returned as well, as the first and last elements of the array.

Some example input and expected output values:

'hello+111123.454545world' -> {hello,+111123.454545,world}
'he-lo+111123.454545world' -> {he-lo,+111123.454545,world}
'hel123.5lo+111123.454545world' -> {hel,123.5,lo+111123.454545world}
'1111.15' -> {"",1111.15,""}
'-.234' -> {"",-.234,""}
'hello-.234' -> {hello,-.234,""}

I'm having trouble with the first match group in the following expression, represented by 'TODO'. It is supposed to match anything that cannot be identified as a numeric value.

select regexp_matches('input', '(TODO)((?:\+|-)?(?:\d*(?:(?:\.)?\d+)))(.*)')

The match group represented by '(TODO)' needs to be the negation of the regular expression in the second match group, because that part of the string has to be returned as well. The regex for matching the numeric value works fine; what I need is a way to match the leading part of the string that is not a numeric value.

4 Answers

#1

regexp_matches(input, '(^.*?)([+-]?\d*\.?\d+)(.*$)') AS result_arr

1st match: (^.*?)
Anchored to the start of the string with ^. The non-greedy quantifier *? is crucial.
It actually doesn't have to be the negation of the regular expression in the second match group because the rest of the regular expression is greedy. So the first part is what remains, defined by the rest.

2nd match: ([+-]?\d*\.?\d+)
I simplified your expression somewhat. In particular, a character class [+-] is shorter and faster than two branches in non-capturing parentheses (?:\+|-).
Non-capturing parentheses are important. (You already had that.)
(Simplified \d* after a comment from @maraca.)

3rd match: (.*$)
Anchored to the end of the string with $. For the last match, make the quantifier greedy.
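
To see how the three groups described above share out the string, here is a minimal sketch that runs the same pattern over sample inputs from the question. The VALUES list and the table alias t are only illustrative, and it assumes standard_conforming_strings is on (the default in current PostgreSQL versions), so the backslashes need no extra escaping.

```sql
-- Apply the anchored three-group pattern to sample inputs from the question.
-- The derived table "t" and its VALUES rows are illustrative test data.
SELECT input,
       regexp_matches(input, '(^.*?)([+-]?\d*\.?\d+)(.*$)') AS result_arr
FROM  (VALUES
         ('hello+111123.454545world'),
         ('he-lo+111123.454545world'),
         ('hel123.5lo+111123.454545world'),
         ('1111.15'),
         ('-.234'),
         ('hello-.234')
      ) AS t(input);
```

Each row should come back as a three-element text array like the ones listed in the question. Note that regexp_matches() returns a set, so an input without any digit produces no row at all rather than a NULL array.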

SQL Fiddle with extended test case.

#2

I think this regex will give you what you want: /'(.*?)([+\-]?[0-9\.]+)(.*?)'/g

Example at: https://regex101.com/r/nF5qV7/1

#3

Try this:

(.*?)((?:\+|-)?(?:\d*(?:(?:\.)?\d+)))(.*)

#4

Here is the correct regex, assuming there has to be at least one digit after the dot:

(.*?)([+-]?[0-9]*\.[0-9]+)(.*)

Or with an optional dot, which matches 1., .7, +.8, -4, 0.0, 42, ...

(.*?)([+-]?(?:\.[0-9]+|[0-9]+\.?[0-9]*))(.*)
