Regular expression in R to replace strings that have no special character

Time: 2022-09-13 07:44:23

I'm practicing my regex with R on a football schedule and can't figure this out.

I'm essentially trying to change any home game to the string HOME. Here is a snippet of the schedule_team data frame that I am using:

  Team   w1   w2   w3   w4   w5   w6   w7   w8   w9  w10  w11  w12  w13  w14
1  ARI   SD @NYG   SF  BYE @DEN  WSH @OAK  PHI @DAL  STL  DET @SEA @ATL   KC
2  ATL   NO @CIN   TB @MIN @NYG  CHI @BAL  DET  BYE  @TB @CAR  CLE  ARI  @GB
3  BAL  CIN  PIT @CLE  CAR @IND  @TB  ATL @CIN @PIT  TEN  BYE  @NO   SD @MIA
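
For anyone who wants to reproduce this, the snippet above can be loaded with base R's read.table. This is only a sketch: the name schedule_teams is taken from the attempt further down, and reading the leading 1, 2, 3 as row names relies on read.table's rule that a header with one fewer field than the data rows makes the first column the row names.

```r
# The three sample rows from the question, pasted as text
txt <- "Team   w1   w2   w3   w4   w5   w6   w7   w8   w9  w10  w11  w12  w13  w14
1  ARI   SD @NYG   SF  BYE @DEN  WSH @OAK  PHI @DAL  STL  DET @SEA @ATL   KC
2  ATL   NO @CIN   TB @MIN @NYG  CHI @BAL  DET  BYE  @TB @CAR  CLE  ARI  @GB
3  BAL  CIN  PIT @CLE  CAR @IND  @TB  ATL @CIN @PIT  TEN  BYE  @NO   SD @MIA"

# Keep the columns as character vectors rather than factors
schedule_teams <- read.table(text = txt, header = TRUE,
                             stringsAsFactors = FALSE)

dim(schedule_teams)   # 3 rows, 15 columns (Team plus w1..w14)
```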

Non-home teams have an @ symbol at the start of the string; home teams do not. Using regex in Python, I believe all home teams can be selected with a pattern like ^([A-Z])\w+, which essentially says "begins with a capital letter". This doesn't work in R because of the \w, among other errors.

Here is what I tried (and failed):

str_replace_all(as.matrix(schedule_teams), "[[^([A-Z])\w+]]", "HOME")

Is there an easier way to change all home teams to HOME?

Thanks in advance.

1 solution

#1


5  

Your regular expression syntax is incorrect. You have wrapped it in nested character classes, and you are trying to use a capturing group inside the class, which causes the pattern to fail when it reaches the closing ).

In short, your regular expression currently defines a set of characters (not what you want) and then fails.

[[^([A-Z]  # any character of: '[', '^', '(', '[', 'A' to 'Z' 

To fix this, remove the character classes and the capturing group, and make sure you double-escape \w (written as \\w inside an R string). The pattern should then work for you.
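
As a minimal illustration of the escaping point, using only base R: inside an R string literal the backslash must itself be escaped, so the regex \w is typed as "\\w". The sample vector x is hypothetical and simply mimics three cells of the schedule.

```r
# The regex ^[A-Z]\w+ written as an R string literal
pattern <- "^[A-Z]\\w+"

nchar(pattern)        # 9: the doubled backslash is one character in the string
cat(pattern, "\n")    # shows the pattern as the regex engine sees it

# Hypothetical sample values: a home game, an away game, a bye week
x <- c("SD", "@NYG", "BYE")
grepl(pattern, x)     # TRUE FALSE TRUE
```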

I tested this on my console and it worked fine.

> library(stringr)
> df[,-1] <- str_replace_all(as.matrix(df[,-1]), '^[A-Z]\\w+', 'HOME')
> df
##   Team   w1   w2   w3   w4   w5   w6   w7   w8   w9  w10  w11  w12  w13  w14
## 1  ARI HOME @NYG HOME HOME @DEN HOME @OAK HOME @DAL HOME HOME @SEA @ATL HOME
## 2  ATL HOME @CIN HOME @MIN @NYG HOME @BAL HOME HOME  @TB @CAR HOME HOME  @GB
## 3  BAL HOME HOME @CLE HOME @IND  @TB HOME @CIN @PIT HOME HOME  @NO HOME @MIA

If you would rather not use the stringr library, you can do the same with base R's sub, if you insist on using a regular expression.

> df[,-1] <- sub('^[A-Z]\\w+', 'HOME', as.matrix(df[,-1]))
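
Note that this pattern also matches BYE, so bye weeks become HOME in the output shown earlier. If bye weeks should be left alone (an assumption on my part; the question does not say), one possible variation is a negative lookahead, which in base R requires perl = TRUE. The small matrix m below is a hypothetical sample taken from the first row of the schedule.

```r
# Hypothetical one-row sample: home, away, home, bye
m <- matrix(c("SD", "@NYG", "SF", "BYE"), nrow = 1,
            dimnames = list(NULL, c("w1", "w2", "w3", "w4")))

# (?!BYE$) rejects a cell that is exactly "BYE"; lookaheads need perl = TRUE
out <- sub("^(?!BYE$)[A-Z]\\w+", "HOME", m, perl = TRUE)

out   # w1 and w3 become "HOME"; "@NYG" and "BYE" are unchanged
```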

And here is an approach without using a regular expression:

> m <- as.matrix(df[-1])
> m[substr(m, 1, 1) != '@'] <- 'HOME'
> cbind(df[1], m)
##   Team   w1   w2   w3   w4   w5   w6   w7   w8   w9  w10  w11  w12  w13  w14
## 1  ARI HOME @NYG HOME HOME @DEN HOME @OAK HOME @DAL HOME HOME @SEA @ATL HOME
## 2  ATL HOME @CIN HOME @MIN @NYG HOME @BAL HOME HOME  @TB @CAR HOME HOME  @GB
## 3  BAL HOME HOME @CLE HOME @IND  @TB HOME @CIN @PIT HOME HOME  @NO HOME @MIA
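
A column-wise variant of the same idea, sketched with base R's startsWith (available since R 3.3.0), avoids the round trip through a matrix. It assumes the week columns are character vectors rather than factors; the two-row df here is a hypothetical sample, not the full schedule.

```r
# Hypothetical two-row sample of the schedule
df <- data.frame(Team = c("ARI", "ATL"),
                 w1 = c("SD", "NO"),
                 w2 = c("@NYG", "@CIN"),
                 stringsAsFactors = FALSE)

# Keep away games (leading "@") as they are, replace everything else
df[-1] <- lapply(df[-1], function(col) {
  ifelse(startsWith(col, "@"), col, "HOME")
})

df
```

Like the substr version above, this treats every cell that does not start with @ as a home game, bye weeks included.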
