How to group by and count unique values and occurrences of a specific value on the same column in Python pandas?

Date: 2022-01-10 22:46:06

My question is related to my previous question, but it is different, so I am asking it separately.

In that question, see the answer by @jezrael:

import pandas as pd

df = pd.DataFrame({'col1':[1,1,1],
                   'col2':[4,4,6],
                   'col3':[7,7,9],
                   'col4':[3,3,5]})

print (df)
   col1  col2  col3  col4
0     1     4     7     3
1     1     4     7     3
2     1     6     9     5

df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
           col4  col3  result_col
col1 col2                        
1    4        1     2         2.0
     6        1     1         1.0

Now I also want to count a specific value of col4. Say I want the number of rows where col4 == 3, computed in the same query.

df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})  # ...plus count(col4 == 3)

How can I do this in the same query? I tried the following, but it does not work:

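# Does not work: the duplicate 'col4' key silently overwrites 'nunique', and
# 'x: lambda x[x == 7].count()' is a string, not a function (it also tests 7, not 3)
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique','col4':'x: lambda x[x == 7].count()'})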
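For reference, the intent of the attempt above can be expressed with named aggregation (available in pandas 0.25 and later), which lets the same input column feed several output columns. This is a sketch assuming a recent pandas; the output column names (col3_size, col4_nunique, col4_count_3) are arbitrary illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

# Named aggregation: each keyword is an output column name mapped to
# (input column, aggregation). col4 can be used twice, which a plain
# dict with a duplicate 'col4' key cannot express.
out = df.groupby(['col1', 'col2']).agg(
    col3_size=('col3', 'size'),
    col4_nunique=('col4', 'nunique'),
    col4_count_3=('col4', lambda x: x.eq(3).sum()),  # rows where col4 == 3
)
print(out)
```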

2 solutions

#1

Do some preprocessing by adding col4 == 3 as a column ahead of time, then aggregate:

df.assign(result_col=df.col4.eq(3).astype(int)).groupby(
    ['col1', 'col2']
).agg(dict(col3='size', col4='nunique', result_col='sum'))
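A self-contained sketch of the same preprocessing idea that also keeps the ratio from the question (group size divided by the number of distinct col4 values). The column name count_3 is an illustrative choice, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

# Precompute a 0/1 indicator for "col4 == 3" so that a plain 'sum'
# aggregation counts the matching rows in each group.
out = (
    df.assign(count_3=df['col4'].eq(3).astype(int))
      .groupby(['col1', 'col2'])
      .agg({'col3': 'size', 'col4': 'nunique', 'count_3': 'sum'})
)
# Recreate the question's ratio: group size / distinct col4 values.
out['result_col'] = out['col3'].div(out['col4'])
print(out)
```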

           col3  result_col  col4
col1 col2                        
1    4        2           2     1
     6        1           0     1

Older answers:

g = df.groupby(['col1', 'col2'])
g.agg({'col3':'size','col4': 'nunique'}).assign(
    result_col=g.col4.apply(lambda x: x.eq(3).sum()))
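The per-group apply with a lambda can also be replaced by grouping the boolean Series directly by the key columns, which keeps the count vectorized. A minimal sketch on the same df; the variable name count_3 is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

g = df.groupby(['col1', 'col2'])
# Group the boolean Series by the same key columns; summing True values
# counts the matches per group without calling a Python lambda per group.
count_3 = df['col4'].eq(3).groupby([df['col1'], df['col2']]).sum()

# assign aligns count_3 on the shared (col1, col2) MultiIndex.
out = g.agg({'col3': 'size', 'col4': 'nunique'}).assign(result_col=count_3)
print(out)
```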

           col3  col4  result_col
col1 col2                        
1    4        2     1           2
     6        1     1           0

Slightly rearranged, with result_col inserted as the second column:

g = df.groupby(['col1', 'col2'])
final_df = g.agg({'col3':'size','col4': 'nunique'})
final_df.insert(1, 'result_col', g.col4.apply(lambda x: x.eq(3).sum()))
print(final_df)

           col3  result_col  col4
col1 col2                        
1    4        2           2     1
     6        1           0     1

#2

I think you need to aggregate col4 with a list of functions in the dict.

To count the values equal to 3, the simplest way is to sum the True values of x == 3:

df1 = (df.groupby(['col1','col2'])
         .agg({'col3':'size','col4': ['nunique', lambda x: (x == 3).sum()]}))
df1 = df1.rename(columns={'<lambda>':'count_3'})
df1.columns = ['{}_{}'.format(x[0], x[1]) for x in df1.columns]
print (df1)
           col4_nunique  col4_count_3  col3_size
col1 col2                                       
1    4                1             2          2
     6                1             0          1
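If several specific values need counting, one option is a small factory that returns named functions: pandas labels list aggregations with each function's __name__, so no '<lambda>' rename is needed. count_eq is a hypothetical helper written for this sketch, not a pandas API:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

def count_eq(value):
    """Hypothetical helper: build an aggregation counting values equal to `value`."""
    def counter(x):
        return x.eq(value).sum()
    # The function name becomes the second-level column label.
    counter.__name__ = f'count_{value}'
    return counter

df1 = df.groupby(['col1', 'col2']).agg(
    {'col3': 'size', 'col4': ['nunique', count_eq(3), count_eq(5)]}
)
# Flatten the two-level columns, e.g. ('col4', 'count_3') -> 'col4_count_3'.
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
print(df1)
```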