Calculate group averages and assign them to subgroups using Pandas groupby

Time: 2023-02-01 22:54:19

I have a DataFrame with the populations of each city. I want to calculate the average population in each state using the populations from each city within that state.

Here's a sample of the data:

State     City         Population     State Ave
CA        San Diego    10000          ??
CA        Palo Alto    8000           ??
CA        Marin        5000           ??
SC        Columbia     4000           ??
SC        Charleston   3000           ??
SC        Greenville   4000           ??

I can retrieve the averages with:

import pandas as pd

data = pd.read_csv('/Downloads/test.csv')

grouped = data.groupby("State")
print(grouped["Population"].mean())

State       Population
CA          7666.66666667
SC          3666.66666667
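The same averages can be reproduced without the CSV. A minimal sketch, assuming the six sample rows above are the whole dataset (the inline DataFrame stands in for test.csv). Selecting the Population column before calling mean() keeps the string City column out of the aggregation; recent pandas versions may raise an error if it is included.

```python
import pandas as pd

# Assumption: the sample rows from the question, built inline instead of read from test.csv
data = pd.DataFrame({
    "State": ["CA", "CA", "CA", "SC", "SC", "SC"],
    "City": ["San Diego", "Palo Alto", "Marin", "Columbia", "Charleston", "Greenville"],
    "Population": [10000, 8000, 5000, 4000, 3000, 4000],
})

# Group by State and average only the numeric Population column;
# the result is a Series indexed by State (one value per state)
state_avg = data.groupby("State")["Population"].mean()
print(state_avg)
```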

But how do I assign the state average to each city?

Note: I simplified a much bigger problem into this smaller example, so the data above is obviously not real.

2 solutions

#1


You could use transform, which returns one value per original row, and assign the result to df['Avg']:

In [216]: df['Avg'] = df.groupby('State')['Population'].transform('mean')

In [217]: df
Out[217]:
  State        City  Population          Avg
0    CA    SanDiego       10000  7666.666667
1    CA    PaloAlto        8000  7666.666667
2    CA       Marin        5000  7666.666667
3    SC    Columbia        4000  3666.666667
4    SC  Charleston        3000  3666.666667
5    SC  Greenville        4000  3666.666667
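A self-contained version of the transform approach, assuming the sample rows from the question (built inline here rather than read from a file):

```python
import pandas as pd

# Assumption: the question's sample rows, built inline
df = pd.DataFrame({
    "State": ["CA", "CA", "CA", "SC", "SC", "SC"],
    "City": ["San Diego", "Palo Alto", "Marin", "Columbia", "Charleston", "Greenville"],
    "Population": [10000, 8000, 5000, 4000, 3000, 4000],
})

# transform("mean") computes one mean per State group and broadcasts it back
# to every row of that group, so the result shares df's index
df["Avg"] = df.groupby("State")["Population"].transform("mean")
print(df)
```

Because transform keeps the original index, the assignment lines up row by row without a merge or lookup step.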

#2


Compute the mean population per state once, then look up each row's State in it with map:

mean = df.groupby('State')['Population'].mean()

df['mean'] = df['State'].map(mean)
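A self-contained sketch of the map approach on the same sample rows (built inline as an assumption):

```python
import pandas as pd

# Assumption: the question's sample rows, built inline
df = pd.DataFrame({
    "State": ["CA", "CA", "CA", "SC", "SC", "SC"],
    "City": ["San Diego", "Palo Alto", "Marin", "Columbia", "Charleston", "Greenville"],
    "Population": [10000, 8000, 5000, 4000, 3000, 4000],
})

# One mean per state, indexed by State
mean = df.groupby("State")["Population"].mean()

# Series.map treats `mean` as a lookup table: each row's State value
# is matched against mean's index and replaced by the corresponding average
df["mean"] = df["State"].map(mean)
print(df)
```

This works because the grouped means are indexed by the same key as the column being mapped; transform reaches the same result without the intermediate Series.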
