Extract a string of words between two specific words in R

Time: 2022-09-13 11:41:42

I have the following string: "PRODUCT colgate good but not goodOKAY"

I want to extract all the words between PRODUCT and OKAY.

4 Solutions

#1


16  

This can be done with sub:

s <- "PRODUCT colgate good but not goodOKAY"
sub(".*PRODUCT *(.*?) *OKAY.*", "\\1", s)

giving:

[1] "colgate good but not good"

No packages are needed.

Here is a visualization of the regular expression:

.*PRODUCT *(.*?) *OKAY.*

Debuggex Demo

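One caveat with the sub approach: when the pattern does not match, sub returns the input unchanged rather than signalling a miss. A minimal sketch of guarding against that with grepl follows; the inputs vector and its second string are made up for illustration and are not from the question.

```r
pattern <- ".*PRODUCT *(.*?) *OKAY.*"

# Hypothetical inputs: the second string has neither marker
inputs <- c("PRODUCT colgate good but not goodOKAY", "no markers here")

# sub() leaves a non-matching string untouched
raw <- sub(pattern, "\\1", inputs)

# Keep the extraction only where the pattern matched; otherwise NA
result <- ifelse(grepl(pattern, inputs), raw, NA_character_)
print(result)
```

Returning NA for a miss is one possible convention; depending on the data, dropping non-matching rows may suit better.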

#2


12  

You can use gsub:

vec <- "PRODUCT colgate good but not goodOKAY"

gsub(".*PRODUCT\\s*|OKAY.*", "", vec)
# [1] "colgate good but not good"
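
gsub is vectorised, so the same call can process a whole character vector, including strings with extra text before PRODUCT or after OKAY. A small sketch, in which the second string is invented for illustration:

```r
vec <- c("PRODUCT colgate good but not goodOKAY",
         "xx PRODUCT pepsodent fineOKAY yy")

# Delete everything up to and including PRODUCT (plus trailing whitespace),
# and everything from OKAY onward
res <- gsub(".*PRODUCT\\s*|OKAY.*", "", vec)
print(res)
```

Because the pattern deletes text instead of capturing it, a string without either marker would pass through unchanged.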

#3


11  

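library(stringr)
x <- "PRODUCT colgate good but not goodOKAY"
# stringr's regex engine (ICU) supports lookarounds directly; the old perl() wrapper is deprecated
str_extract(string = x, pattern = "(?<=PRODUCT).*(?=OKAY)")
# the result keeps the space that follows PRODUCT; wrap the call in str_trim() to drop it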

(?<=PRODUCT) -- lookbehind: the match must be immediately preceded by PRODUCT.

.* -- match everything except newlines.

(?=OKAY) -- lookahead: the match must be immediately followed by OKAY.

I should add that you don't need the stringr package for this; the base functions sub and gsub work fine. I use stringr for its consistency of syntax: whether I'm extracting, replacing, detecting, etc., the function names are predictable and understandable, and the arguments are in a consistent order. It saves me from needing the documentation every time.

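For comparison, the same lookaround pattern can be used without stringr. The following is a sketch in base R, assuming perl = TRUE to get an engine with lookbehind support; the variable names m, between, and trimmed are illustrative.

```r
x <- "PRODUCT colgate good but not goodOKAY"

# perl = TRUE switches to PCRE, which supports lookbehind and lookahead
m <- regexpr("(?<=PRODUCT).*(?=OKAY)", x, perl = TRUE)
between <- regmatches(x, m)

# Lookarounds consume nothing, so the space after PRODUCT is part of the match
trimmed <- trimws(between)
print(trimmed)
```

The trimws step is only needed because the lookbehind stops right after PRODUCT and the following space therefore belongs to the match.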

#4


7  

You could use the rm_between function from the qdapRegex package. It takes a string and a left and right boundary as follows:

x <- "PRODUCT colgate good but not goodOKAY"

library(qdapRegex)
rm_between(x, "PRODUCT", "OKAY", extract=TRUE)

## [[1]]
## [1] "colgate good but not good"
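
If installing a package is not an option, a similar left/right boundary interface can be approximated in base R. The function below is a hypothetical helper written for illustration; it is not part of qdapRegex and makes no claim to match how rm_between works internally. It assumes the boundaries are single strings that do not themselves contain \E.

```r
# Hypothetical helper, not part of qdapRegex or base R
extract_between <- function(x, left, right) {
  # \Q...\E quotes the boundaries so regex metacharacters in them are literal
  pattern <- paste0("\\Q", left, "\\E\\s*(.*?)\\s*\\Q", right, "\\E")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Element 1 is the full match, element 2 the captured group; NA when no match
  vapply(hits, function(h) if (length(h) >= 2) h[2] else NA_character_,
         character(1))
}

x <- "PRODUCT colgate good but not goodOKAY"
print(extract_between(x, "PRODUCT", "OKAY"))
```

Unlike rm_between with extract=TRUE, this sketch returns a plain character vector with at most one extraction per input string.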
