R regex to extract a number after/before a string

Time: 2022-09-13 13:31:03

I am trying to construct a regular expression that identifies strings containing the word "pack"/"pck"/"packs"/"Set" (case-insensitive) and, if the word exists, extracts the number that precedes or follows it. Examples:

"Fregon EcoClean Multipurpose Scrubber For Pots, Pans, Kitchen, and Bathroom, Green, 3-Pack" -> 3
"... Bathroom, Green, 3 Pack" -> 3
"Franklin Sports NHL Mini Hockey Goal Set of 2" -> 2
"Make: Electronics Components Pack 2" -> 2
"Make: Electronics Components Pack of 2" -> 2

I tried using the following expression:

sub(".*pack(\\d+).*", "\\1", "inflow100 pack6 distance12")

However, it doesn't handle all of the cases above. Any ideas?
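The attempt fails on most titles because "pack" is matched case-sensitively and must be immediately followed by digits. A small base-R check (the `titles` vector is just a few of the examples above) makes the failure visible: `sub()` returns its input unchanged when the pattern does not match, so nothing errors, you just get the whole title back.

```r
# The pattern from the attempt: lowercase "pack" directly followed by digits.
pattern <- ".*pack(\\d+).*"

titles <- c(
  "inflow100 pack6 distance12",
  "Make: Electronics Components Pack 2",
  "Franklin Sports NHL Mini Hockey Goal Set of 2"
)

# sub() leaves non-matching strings untouched, so failures show up
# as the full title rather than as an error.
result <- sub(pattern, "\\1", titles)
print(result)
```

Only the first title yields `"6"`; the other two come back unchanged because "Pack" is capitalised and "Set of 2" never contains "pack".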

2 answers

#1


5  

The following regex matches all of the examples:

\b(?:(\d+)[-\s][Pp]ack|(?:[Pp]ack|[Ss]et)\s?(?:of\s)?(\d+))

See https://regex101.com/r/jZ4vE2/1

If you use it, you'll notice that the number ends up in either \1 or \2. The only thing left to do is to get rid of the preceding or following spaces.

> gsub(".*\\b(?:(\\d+)[-\\s][Pp]ack|(?:[Pp]ack|[Ss]et)\\s?(?:of\\s)?(\\d+)).*", "\\1 \\2", "inflow100 pack6 distance12", perl=TRUE)
[1] " 6"
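A sketch applying this answer's pattern to every example title at once, in base R only. Using the replacement `"\\1\\2"` instead of `"\\1 \\2"` avoids the stray space, because the unset group expands to an empty string (as the `" 6"` output above shows). The helper `extract_count()` and the `extended_pattern` variant are hypothetical additions, not part of the answer: the variant adds `(?i)` for case-insensitivity and also accepts "pck" and "packs", which the question asks for but the answer's pattern does not cover. The last title is a made-up example.

```r
# Answer #1's pattern, wrapped in .* so that sub() replaces the whole title.
# Only one of the two groups is set per match; the unset one expands to "",
# so the replacement "\\1\\2" yields just the number with no stray space.
answer_pattern <- ".*\\b(?:(\\d+)[-\\s][Pp]ack|(?:[Pp]ack|[Ss]et)\\s?(?:of\\s)?(\\d+)).*"

# Hypothetical variant (not from the answer): (?i) makes it case-insensitive
# and the keyword list also accepts "pck" and "packs".
extended_pattern <- paste0(
  "(?i).*\\b(?:(\\d+)[-\\s]?(?:packs?|pck|set)\\b",
  "|(?:packs?|pck|set)\\s*(?:of\\s+)?(\\d+)).*"
)

# Hypothetical helper: the pack size as an integer, or NA when nothing matches.
extract_count <- function(x, pattern) {
  out <- rep(NA_integer_, length(x))
  hit <- grepl(pattern, x, perl = TRUE)
  out[hit] <- as.integer(sub(pattern, "\\1\\2", x[hit], perl = TRUE))
  out
}

titles <- c(
  "Fregon EcoClean Multipurpose Scrubber For Pots, Pans, Kitchen, and Bathroom, Green, 3-Pack",
  "Franklin Sports NHL Mini Hockey Goal Set of 2",
  "Make: Electronics Components Pack 2",
  "Make: Electronics Components Pack of 2",
  "inflow100 pack6 distance12",
  "Cleaning Sponges, 12 PCK"  # made-up title to exercise the "pck" variant
)

answer_counts <- extract_count(titles, answer_pattern)
extended_counts <- extract_count(titles, extended_pattern)
print(answer_counts)
print(extended_counts)
```

With these titles, `answer_counts` comes out as 3, 2, 2, 2, 6, NA and `extended_counts` as 3, 2, 2, 2, 6, 12. The leading greedy `.*` means that if a title mentions the keyword more than once, the last qualifying match wins.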

#2


1  

Just fetch the last number.

sub(".*\\b(\\d+).*", "\\1", str)

or

gsub("(\\d+)\\D*$|.", "\\1", str, perl = TRUE)
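A quick base-R check of both "last number" expressions. The second one needs `gsub()` rather than `sub()`: every character that is not part of the final number is matched by `.` and replaced with nothing, and `sub()` would only remove the first such character. The caveat of this approach is that it returns the last number in the title whether or not it belongs to "pack"/"set"; for the question's own test string the `gsub()` form returns "12" rather than 6.

```r
titles <- c(
  "Make: Electronics Components Pack of 2",
  "Franklin Sports NHL Mini Hockey Goal Set of 2",
  "inflow100 pack6 distance12"
)

# First form: greedy .* jumps to the last number that starts a word (\b),
# so digits glued to letters, as in "pack6" or "distance12", are not candidates.
last_word_number <- sub(".*\\b(\\d+).*", "\\1", titles)

# Second form: keep the digits that are followed only by non-digits up to
# the end of the string; every other character is matched by "." and dropped.
last_number <- gsub("(\\d+)\\D*$|.", "\\1", titles, perl = TRUE)

print(last_word_number)
print(last_number)
```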
