How to add a column to a data frame in R

Time: 2022-11-29 09:12:02

I have imported data from a file into a data frame in R. It looks something like this:

Name      Count   Category
A         100     Cat1
C         10      Cat2
D         40      Cat1 
E         30      Cat3
H         3       Cat3
Z         20      Cat2
M         50      Cat10

So now I want to add the Category column based on the values in the Name column, e.g. if Name is A or D, Category = 'Cat1', and so on.

This is only a simple example. I have a large number of Names and Categories, so I want a compact syntax. How can I do this?

Edit: I've changed the example to better suit my needs, since Name can be any non-numeric value. Sorry for not being clearer before.

5 solutions

#1


2  

You can use a map. (UPDATED to use stringsAsFactors = FALSE)

df <- data.frame( Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M'), 
                  Count = c(100,10,40,30,3,20,50), stringsAsFactors = FALSE)
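# Each category lists the Names that belong to it
Categories <- list(Cat1 = c('A','D'), 
                   Cat2 = c('C','Z'), 
                   Cat3 = c('E','H'), 
                   Cat10 = 'M')
nams <- names( Categories )
nums <- sapply(Categories, length)
# Repeat each category label once per Name, then name the result by Name,
# giving a lookup vector such as c(A = "Cat1", D = "Cat1", ...)
CatMap <- unlist( Map( rep, nams, nums ) )
names(CatMap) <- unlist( Categories )

# Index the lookup vector by Name to get each row's Category
df <- transform( df, Category = CatMap[ Name ])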
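The same lookup vector can be built a little more compactly. This is a sketch, assuming R >= 3.2.0 for lengths(), which stands in for sapply(Categories, length); setNames() and unname() are base R. Any Name that is not listed in Categories comes back as NA.

```r
df <- data.frame(Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M'),
                 Count = c(100, 10, 40, 30, 3, 20, 50),
                 stringsAsFactors = FALSE)
Categories <- list(Cat1 = c('A', 'D'),
                   Cat2 = c('C', 'Z'),
                   Cat3 = c('E', 'H'),
                   Cat10 = 'M')

# rep() repeats each category label once per Name in that category;
# lengths() (assumes R >= 3.2.0) gives the size of each list element
CatMap <- setNames(rep(names(Categories), lengths(Categories)),
                   unlist(Categories, use.names = FALSE))

# Index the named vector by Name; unname() drops the names carried over
# from CatMap so the new column is a plain character vector
df$Category <- unname(CatMap[df$Name])
df
```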

#2


3  

You can use ifelse. This assumes Name is numeric, as in the original version of the question. If your data frame were called df, you would do:

df$cat <- ifelse(df$Name<100, "Ones", "Hundreds")
df$cat <- ifelse(df$Name<1000, df$cat, "Thousands")
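A quick run of the two ifelse() passes on a small made-up numeric data frame (the values are hypothetical, picked to hit each branch). It shows that the second pass keeps the first label below 1000 and only overwrites values of 1000 and above:

```r
# Hypothetical numeric Names, as in the original (numeric) version of the question
df <- data.frame(Name = c(5, 50, 500, 5000))

# First pass: below 100 is "Ones", everything else "Hundreds"
df$cat <- ifelse(df$Name < 100, "Ones", "Hundreds")
# Second pass: keep the first label below 1000, otherwise "Thousands"
df$cat <- ifelse(df$Name < 1000, df$cat, "Thousands")
df
```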

#3


2  

[Update following the OP's comment and altered Q]

DF <- data.frame(Name = c("A","C","D","E","H","Z","M"),
                 Count = c(100,10,40,30,3,20,50), stringsAsFactors = FALSE)
lookup <- data.frame(Name = c("A","C","D","E","H","Z","M"),
                     Category = paste("Cat", c(1,2,1,3,3,2,10), sep = ""),
                     stringsAsFactors = FALSE)

Using the above data frames, we can do a database-style merge. You need to set up lookup with the Name/Category combinations you want, which is fine as long as there isn't a very large number of Names (at least you only need to list each Name once in lookup, and you don't have to group them by category, e.g. all Cat1 Names first, then Cat2, and so on):

> merge(DF, lookup, by = "Name")
  Name Count Category
1    A   100     Cat1
2    C    10     Cat2
3    D    40     Cat1
4    E    30     Cat3
5    H     3     Cat3
6    M    50    Cat10
7    Z    20     Cat2
> merge(DF, lookup, by = "Name", sort = FALSE)
  Name Count Category
1    A   100     Cat1
2    C    10     Cat2
3    D    40     Cat1
4    E    30     Cat3
5    H     3     Cat3
6    Z    20     Cat2
7    M    50    Cat10
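merge() documents that with sort = FALSE the row order is unspecified, and by default (all = FALSE) it drops rows of DF whose Name has no match in lookup. If you need DF's original order and all of its rows, base R's match() does the same lookup without a join. This is a swapped-in alternative to merge(), sketched with the same two data frames:

```r
DF <- data.frame(Name = c("A","C","D","E","H","Z","M"),
                 Count = c(100,10,40,30,3,20,50), stringsAsFactors = FALSE)
lookup <- data.frame(Name = c("A","C","D","E","H","Z","M"),
                     Category = paste("Cat", c(1,2,1,3,3,2,10), sep = ""),
                     stringsAsFactors = FALSE)

# match() returns, for each Name in DF, its row position in lookup
# (NA if absent), so DF keeps its own row order and row count
idx <- match(DF$Name, lookup$Name)
DF$Category <- lookup$Category[idx]
DF
```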

One option is indexing:

foo <- function(x) {
    out <- character(length = length(x))
    chars <- c("Ones", "Tens", "Hundreds", "Thousands")
    out[x < 10] <- chars[1]
    out[x >= 10 & x < 100] <- chars[2]
    out[x >= 100 & x < 1000] <- chars[3]
    out[x >= 1000 & x < 10000] <- chars[4]
    return(factor(out, levels = chars))
}

An alternative that scales better is:

bar <- function(x, cats = c("Ones", "Tens", "Hundreds", "Thousands")) {
    out <- cats[floor(log10(x)) + 1]
    factor(out, levels = cats)
}
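A quick check that foo() and bar() agree on a few made-up counts. Both functions are copied from above; the only change is a comment in bar(). Note that bar() assumes every x is at least 1: for 0 < x < 1 the index becomes 0, which R silently drops, so the result comes back shorter than x.

```r
foo <- function(x) {
    out <- character(length = length(x))
    chars <- c("Ones", "Tens", "Hundreds", "Thousands")
    out[x < 10] <- chars[1]
    out[x >= 10 & x < 100] <- chars[2]
    out[x >= 100 & x < 1000] <- chars[3]
    out[x >= 1000 & x < 10000] <- chars[4]
    return(factor(out, levels = chars))
}

bar <- function(x, cats = c("Ones", "Tens", "Hundreds", "Thousands")) {
    # floor(log10(x)) is the digit count minus one; assumes x >= 1
    out <- cats[floor(log10(x)) + 1]
    factor(out, levels = cats)
}

# Hypothetical sample counts, one per category
x <- c(3, 40, 250, 2500)
foo(x)
bar(x)
identical(foo(x), bar(x))
```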

#4


0  

Check out:

  • cut()
  • recode() in the car package
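For the numeric version of the question, cut() does the binning in one call. A sketch with made-up counts and the same Ones/Tens/Hundreds/Thousands labels used in the indexing answer above; values outside the breaks come back as NA:

```r
# Hypothetical sample counts
x <- c(3, 10, 40, 250, 2500)

# right = FALSE makes the intervals [0, 10), [10, 100), ... so that
# exactly 10 counts as "Tens" rather than "Ones"
size <- cut(x,
            breaks = c(0, 10, 100, 1000, 10000),
            labels = c("Ones", "Tens", "Hundreds", "Thousands"),
            right = FALSE)
size
```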

#5


0  

Perhaps simpler and more readable is a nested ifelse with %in%:

df <- data.frame( Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M'), 
                  Count = c(100,10,40,30,3,20,50), stringsAsFactors = FALSE)

cat1 = c("A","D")
cat2 = c("C","Z")
cat3 = c("E","H")
cat10 = c("M")

df$Category = ifelse(df$Name %in% cat1, "Cat1",
              ifelse(df$Name %in% cat2, "Cat2",
              ifelse(df$Name %in% cat3, "Cat3",
              ifelse(df$Name %in% cat10, "Cat10",
              NA))))

  Name Count Category
1    A   100     Cat1
2    C    10     Cat2
3    D    40     Cat1
4    E    30     Cat3
5    H     3     Cat3
6    Z    20     Cat2
7    M    50    Cat10
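With many categories the nested ifelse() grows one level per category. A sketch of the same %in% idea driven by a named list (the list layout is borrowed from the first answer), so adding a category means adding one list entry. One difference: if a Name were listed under two categories, the nested ifelse() keeps the first match, while this loop keeps the last.

```r
df <- data.frame(Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M'),
                 Count = c(100, 10, 40, 30, 3, 20, 50),
                 stringsAsFactors = FALSE)

# One list entry per category instead of one variable and one ifelse each
Categories <- list(Cat1 = c("A", "D"),
                   Cat2 = c("C", "Z"),
                   Cat3 = c("E", "H"),
                   Cat10 = c("M"))

# Start every row as NA, then fill in each category's Names using %in%
df$Category <- NA_character_
for (category in names(Categories)) {
  df$Category[df$Name %in% Categories[[category]]] <- category
}
df
```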
