Analyse entries with R and remove the suffix if it is a number

Time: 2022-12-30 22:54:47

I'm a beginner in the R programming language, and I'm using RStudio for a project. My dataframe has a column for the zone of the mall, but some zones are actually subzones of a bigger zone, so they have names like Ikea 1, Ikea 2, Ikea 3, etc. I want to create a new column holding the bigger zone for each entry.

The dataframe looks like this:

ID    ENTRY      ZONE                       
1     13:39:40   Casual Dinnerware
2     15:28:43   Van Thiel 3   
3     10:41:05   Caracole 7
4     16:37:31   Entrance

I want to add a new column that holds the "mother" zone whenever the entry is a subzone. For the given example, I want something like:

ID    ENTRY      ZONE                NEW ZONE        
1     13:39:40   Casual Dinnerware   Casual Dinnerware
2     15:28:43   Van Thiel 3         Van Thiel
3     10:41:05   Caracole 7          Caracole
4     16:37:31   Entrance            Entrance

Note that not every zone is a subzone!

My idea was to analyse each entry and, if the zone ends with a number, remove the number and write the rest into the new column. I have already read a few questions that I thought would help, related to regular expressions (like this one), but I couldn't get this to work.

Thank you for your time. If you have any questions, let me know!

1 answer

#1


As brittenb said:

df$NEW_ZONE = gsub("\\s\\d+$", "", df$ZONE) will do the trick for you. \\s matches a whitespace character, \\d+ matches one or more digits, and $ anchors the match to the end of the string, which is important to ensure that numbers which are part of the bigger zone's name aren't removed.
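
For reference, here is a minimal sketch of that one-liner applied to the sample data from the question. The data frame is rebuilt by hand, and the zone name "Studio 54 Lounge" is a made-up example (not from the original data) used only to illustrate why the $ anchor matters:

```r
# Toy data frame mirroring the example in the question.
df <- data.frame(
  ID = 1:4,
  ENTRY = c("13:39:40", "15:28:43", "10:41:05", "16:37:31"),
  ZONE = c("Casual Dinnerware", "Van Thiel 3", "Caracole 7", "Entrance"),
  stringsAsFactors = FALSE
)

# Remove one whitespace character followed by one or more digits,
# but only when that run sits at the very end of the string.
df$NEW_ZONE <- gsub("\\s\\d+$", "", df$ZONE)

print(df)

# Hypothetical zone name with a number in the middle: because of the
# $ anchor, nothing matches and the name is returned unchanged.
middle <- gsub("\\s\\d+$", "", "Studio 54 Lounge")
print(middle)
```

Since the pattern is anchored to the end of the string it can match at most once per entry, so sub() should behave the same as gsub() here.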

This solved my problem, thank you.
