Grouping variables in an R data frame using specific lists

Time: 2022-09-20 07:39:44

I have the following lists:

  group1<-c("A", "B", "D")
  group2<-c("C", "E")
  group3<-c("F")

and a dataframe with values and corresponding names:

  df <- data.frame (name=c("A","B","C","D","E","F"),value=c(1,2,3,4,5,6))
  df
    name value
  1    A     1
  2    B     2
  3    C     3
  4    D     4
  5    E     5
  6    F     6

I'd like to group the data based on these lists, using the name column:

  df
    name value    group
  1    A     1   group1
  2    B     2   group1
  3    C     3   group2
  4    D     4   group1
  5    E     5   group2
  6    F     6   group3

and sum the values for each group.

  df
       group sum
  1   group1   7
  2   group2   8
  3   group3   6

I've searched for similar posts, but couldn't adapt them to my problem.

3 solutions

#1


Here's an approach. First, use ifelse to assign groups to each name, then use aggregate to get the sum for each group.

> df$group <- with(df, ifelse(name %in% group1, "group1",
                              ifelse(name %in% group2, "group2", "group3" )))
> aggregate(value ~ group, sum, data=df)
   group value
1 group1     7
2 group2     8
3 group3     6
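
The nested ifelse works for three groups, but any name that is in neither group1 nor group2 silently ends up in "group3". A minimal sketch of a lookup-vector variant follows, assuming the groups are first collected into a named list; the names `groups`, `lookup` and `result` are illustrative and do not come from the original answer.

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- c("F")
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value = c(1, 2, 3, 4, 5, 6))

# Hypothetical helper: a named list holding the three group vectors.
groups <- list(group1 = group1, group2 = group2, group3 = group3)

# Lookup vector: element names are the member names, values are the group labels.
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Names found in no group become NA instead of defaulting to the last group.
df$group <- unname(lookup[as.character(df$name)])

result <- aggregate(value ~ group, data = df, FUN = sum)
result
```

This scales to any number of groups without adding another level of ifelse nesting.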

#2


I would suggest storing your grouping as a data.frame, something along these lines:

# Map each name to its group number (A, B, D -> 1; C, E -> 2; F -> 3)
grouping <- data.frame(name=c("A","B","C","D","E","F"),groupno=c(1,1,2,1,2,3))
df2 <- merge(df,grouping, by = 'name')
aggregate(value ~ groupno, sum, data=df2)
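
Rather than typing the grouping table by hand, it can probably be derived from the vectors in the question. The sketch below assumes base R's stack() is acceptable for this; the column names `name` and `group` and the object `sums` are chosen here for illustration.

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- c("F")
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value = c(1, 2, 3, 4, 5, 6))

# stack() turns a named list into a two-column data frame (values, ind).
grouping <- stack(list(group1 = group1, group2 = group2, group3 = group3))
names(grouping) <- c("name", "group")

# Join the group label onto each row, then sum within each group.
df2 <- merge(df, grouping, by = "name")
sums <- aggregate(value ~ group, data = df2, FUN = sum)
sums
```

Note that merge() performs an inner join by default, so rows of df whose name appears in no group are dropped; pass all.x = TRUE to keep them.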

#3


Another idea:

df$X <- factor(df$name)
levels(df$X) <- list(group1 = group1, group2 = group2, group3 = group3)
aggregate(df$value, list(group = df$X), sum)
#   group x
#1 group1 7
#2 group2 8
#3 group3 6

EDIT

As noted by @thelatemail in the comments, you can use mget to collect, in a list, all the objects in your workspace whose names match "group" followed by digits, like this:

mget(ls(pattern="group\\d+"))

If, however, you have also loaded, say, a function called "group4", ls() will select it too. One way to avoid this is to use something like:

.ls <- ls(pattern="group\\d+")
# `mget` only non-functions. You can, of course, exclude any
# other `mode` besides "function" in the same way.
mget(.ls[!.ls %in% apropos("group", mode = "function")])

The list returned from mget can then be assigned to levels(df$X).
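
Putting the pieces together, the following is a rough end-to-end sketch rather than the answerer's exact code. It assumes an anchored pattern ("^group\\d+$") and swaps the apropos() filter for Filter(Negate(is.function), ...); the function `group4` and the names `nms`, `group_list` and `totals` are hypothetical and exist only for this demonstration.

```r
group1 <- c("A", "B", "D")
group2 <- c("C", "E")
group3 <- c("F")
group4 <- function() NULL  # hypothetical function that should be ignored
df <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                 value = c(1, 2, 3, 4, 5, 6))

# Collect every object named "group" followed by digits, then drop functions.
nms <- ls(pattern = "^group\\d+$")
group_list <- Filter(Negate(is.function), mget(nms))

# Assigning a named list to levels() merges the old levels into the new groups.
df$X <- factor(df$name)
levels(df$X) <- group_list

totals <- aggregate(df$value, list(group = df$X), sum)
totals
```

Anchoring the pattern keeps objects such as a hypothetical `mygroup1` or `group1_backup` out of the result.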
