Aggregate (count) rows that match a condition, grouped by unique values

Time: 2023-01-31 22:46:31

It seems like such a simple problem, yet I've been pulling my hair out trying to get this to work:

Given this data frame, which records the interactions that id had with contact, where each contact belongs to a contactGrp,

head(data)
   id               sesTs  contact    contactGrp   relpos   maxpos
1 6849 2012-06-25 15:58:34   peter        west    0.000000      3
2 6849 2012-06-25 18:24:49   sarah        south   0.500000      3
3 6849 2012-06-27 00:13:30   sarah        south   1.000000      3
4 1235 2012-06-29 17:49:35   peter        west    0.000000      2
5 1235 2012-06-29 23:56:35   peter        west    1.000000      2
6 5893 2012-06-30 22:21:33   carl         east    0.000000      1

how many contacts were there for each unique(data$contactGrp) with relpos == 1 and maxpos > 1?

The expected result would be:

1 west   1
2 south  1
3 east   0

A small subset of the lines I have tried:

  • aggregate(data, by=list('contactGrp'), FUN=count) yields an error, and does no filtering
  • using data.table seems to require a key, which is not unique in this data…
  • ddply(data,"contactGrp",summarise,count=???): not sure which function to use to fill the count column
  • ddply(subset(data,maxpos>1 & relpos==0), c('contactGrp'), function(df)count(df$relpos)) works, but gives me an extra column x, and it feels like I've overcomplicated it…

SQL would be easy: Select contactGrp, count(*) as cnt from data where … Group by contactGrp, but I'm trying to learn R.

4 solutions

#1


19  

I think this is the ddply version you're looking for:

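library(plyr)
# sessions is the question's data frame; use relpos == 1 for the counts the question asks for
ddply(sessions, .(contactGrp),
      summarise,
      count = length(contact[relpos == 0 & maxpos > 1]))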
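
The reason length(contact[cond]) works as a count is that the logical condition picks out exactly the matching rows within each group, so it is the same as counting the TRUE values with sum(cond). Here is a minimal base R sketch of that idea, assuming the six rows from the question are rebuilt by hand under the name sessions (sesTs is left out because it plays no part in the count). It uses tapply rather than ddply, so it is not the plyr call above, just the same counting logic without the package.

```r
# Assumption: the question's six rows, typed in by hand (sesTs omitted).
sessions <- data.frame(
  id         = c(6849, 6849, 6849, 1235, 1235, 5893),
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1)
)

# One TRUE/FALSE per row: does the row match the condition in the question?
cond <- sessions$relpos == 1 & sessions$maxpos > 1

# Summing a logical vector counts its TRUE values; tapply does that per group.
# Groups with no matching rows keep a count of 0.
counts <- tapply(cond, sessions$contactGrp, sum)
print(counts)
```

counts[["west"]] then gives the count for a single group.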

#2


22  

And here is the data.table solution:

> library(data.table)
> dt <- data.table(sessions)
> dt[, length(contact[relpos == 0 & maxpos > 1]), by = contactGrp]
     contactGrp V1
[1,]       west  2
[2,]      south  0
[3,]       east  0

> dt[, length(contact[relpos == 1 & maxpos > 1]), by = contactGrp]
     contactGrp V1
[1,]       west  1
[2,]      south  1
[3,]       east  0
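
Note that by = contactGrp does not need a key, so the non-unique values the question worried about are not a problem. Below is a small self-contained sketch, assuming the question's six rows typed in by hand and a data.table version that supports the .() alias; the column name cnt is my own choice. It sums the logical condition instead of taking length(), which gives the same numbers.

```r
library(data.table)

# Assumption: the question's six rows, typed in by hand (sesTs omitted).
dt <- data.table(
  id         = c(6849, 6849, 6849, 1235, 1235, 5893),
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1)
)

# No setkey() call anywhere: grouping with by works on an unkeyed table.
# The condition is evaluated inside each group, so groups without a match get 0.
res <- dt[, .(cnt = sum(relpos == 1 & maxpos > 1)), by = contactGrp]
print(res)
```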

#3


10  

Here is another approach:

# Random sample data, so the counts below will vary from run to run
a <- data.frame(id = 1:10,
                contact = sample(c("peter", "sarah"), 10, TRUE),
                contactGrp = sample(c("west", "east"), 10, TRUE),
                relpos = sample(0:1, 10, TRUE),
                maxpos = runif(10, 0, 10))

library(sqldf)
sqldf("Select contactGrp, count(*) as cnt from a where relpos=0 and maxpos > 1 Group by contactGrp")
  contactGrp cnt
1       east   3
2       west   1
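
One thing to watch with the WHERE version: groups with no matching rows disappear from the result, whereas the question's expected output lists east with 0. A possible variant, sketched here on the question's six rows (typed in by hand as sessions, which is my assumption about the data, not the random frame a above), moves the condition into a conditional sum so that every group keeps a row.

```r
library(sqldf)

# Assumption: the question's six rows, typed in by hand (sesTs omitted).
sessions <- data.frame(
  id         = c(6849, 6849, 6849, 1235, 1235, 5893),
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1)
)

# No WHERE clause: each row contributes 1 if it matches and 0 otherwise,
# so a group with no matches still appears, with cnt = 0.
res <- sqldf("
  SELECT contactGrp,
         SUM(CASE WHEN relpos = 1 AND maxpos > 1 THEN 1 ELSE 0 END) AS cnt
  FROM sessions
  GROUP BY contactGrp
")
print(res)
```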

#4


10  

Your first attempt with aggregate doesn't work because base R has no function named count; you meant length. All you had to do was run it on data filtered by the relpos and maxpos conditions, and pick a dummy variable to count (it doesn't matter which). That said, rather than reaching for one of the flexible aggregation commands, use the built-in table command, which is designed for exactly this.

with( data[data$relpos == 1 & data$maxpos > 1,], table(contactGrp) )
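
To make that concrete, here is a rough sketch of both routes on the question's six rows, rebuilt by hand. The names hits, agg and tab are mine. contactGrp is stored as a factor on purpose: in current R versions data.frame() no longer converts strings to factors, and table() only reports a zero for east if east is still a factor level.

```r
# Assumption: the question's six rows, typed in by hand (sesTs omitted).
data <- data.frame(
  id         = c(6849, 6849, 6849, 1235, 1235, 5893),
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = factor(c("west", "south", "south", "west", "west", "east"),
                      levels = c("west", "south", "east")),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1)
)

# Conditional data selection: keep only the rows that match.
hits <- data[data$relpos == 1 & data$maxpos > 1, ]

# aggregate with length, using contact as the dummy variable to count.
# Groups with no matching rows are not listed in this result.
agg <- aggregate(contact ~ contactGrp, data = hits, FUN = length)
print(agg)

# table on the same subset; unused factor levels are kept, so east shows 0.
tab <- with(hits, table(contactGrp))
print(tab)
```

If the zero rows matter, the table route is the simpler of the two.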
