I'm trying to create a new column that holds the mean of the values in an existing column of the same DataFrame. However, the mean should be computed per group, with the groups defined by three other columns.
Out[184]:
YEAR daytype hourtype scenario option_value
0 2015 SAT of_h 0 0.134499
1 2015 SUN of_h 1 63.019250
2 2015 WD of_h 2 52.113516
3 2015 WD pk_h 3 43.126513
4 2015 SAT of_h 4 56.431392
I basically would like to have a new column 'mean' which computes the mean of 'option_value' over the rows where 'YEAR', 'daytype', and 'hourtype' are the same.
I tried the following approach, but without success:
In [185]: o2['premium']=o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_cf'].mean()
TypeError: incompatible index of inserted column with frame index
2 Solutions
#1
8
Here's one way to do it
In [19]: def cust_mean(grp):
....: grp['mean'] = grp['option_value'].mean()
....: return grp
....:
In [20]: o2.groupby(['YEAR', 'daytype', 'hourtype']).apply(cust_mean)
Out[20]:
YEAR daytype hourtype scenario option_value mean
0 2015 SAT of_h 0 0.134499 28.282946
1 2015 SUN of_h 1 63.019250 63.019250
2 2015 WD of_h 2 52.113516 52.113516
3 2015 WD pk_h 3 43.126513 43.126513
4 2015 SAT of_h 4 56.431392 28.282946
So, what was going wrong with your attempt?
It returns an aggregate with a different shape from the original DataFrame: one row per group, indexed by the grouping keys, instead of one row per original row.
In [21]: o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
Out[21]:
YEAR daytype hourtype
2015 SAT of_h 28.282946
SUN of_h 63.019250
WD of_h 52.113516
pk_h 43.126513
Name: option_value, dtype: float64
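To see the mismatch directly, here is a minimal sketch that rebuilds the sample frame from the question and compares the two objects. The variable name agg is introduced here only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

# Aggregating collapses each group to a single row.
agg = o2.groupby(["YEAR", "daytype", "hourtype"])["option_value"].mean()

# The frame has one row per observation and a single-level index;
# the aggregate has one row per group and a three-level index.
print(len(o2), len(agg))                     # 5 4
print(o2.index.nlevels, agg.index.nlevels)   # 1 3
```

Because the labels of agg are (YEAR, daytype, hourtype) tuples while the frame is labelled 0 to 4, pandas has nothing to align on when the aggregate is assigned as a column.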
Or use transform, which returns a result with the same index as the original DataFrame:
In [1461]: o2['premium'] = (o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value']
.transform('mean'))
In [1462]: o2
Out[1462]:
YEAR daytype hourtype scenario option_value premium
0 2015 SAT of_h 0 0.134499 28.282946
1 2015 SUN of_h 1 63.019250 63.019250
2 2015 WD of_h 2 52.113516 52.113516
3 2015 WD pk_h 3 43.126513 43.126513
4 2015 SAT of_h 4 56.431392 28.282946
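For anyone who wants to run the transform approach outside an IPython session, a self-contained sketch follows. It assumes only the sample data from the question; the keys variable is a name chosen here for readability.

```python
import pandas as pd

# Sample data copied from the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# transform broadcasts each group's mean back to every row of that group,
# so the result has the same length and index as o2.
o2["premium"] = o2.groupby(keys)["option_value"].transform("mean")

print(o2)
```

Rows 0 and 4 share the group (2015, SAT, of_h), so both receive the same value in premium.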
#2
1
You can do it the way you intended by tweaking your code in the following way:
o2 = o2.set_index(['YEAR', 'daytype', 'hourtype'])
o2['premium'] = o2.groupby(level=['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
Why the original error? As explained by John Galt, the data coming out of groupby().mean() is not the same shape (length) as the original DataFrame.
Pandas can handle this cleverly if you first put the 'grouping columns' in the index. The assignment then aligns on the index labels, so it knows how to propagate the mean data to every row of each group correctly.
John's solution follows the same logic, because groupby naturally puts the grouping columns in the index during execution.
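A self-contained sketch of the index-based approach is shown below. The final reset_index step and the names keys, indexed and result are not part of the answer above; they are added here on the assumption that you want the grouping columns back as ordinary columns afterwards.

```python
import pandas as pd

# Sample data copied from the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# Move the grouping columns into the index so the frame and the
# aggregate are labelled by the same (YEAR, daytype, hourtype) tuples.
indexed = o2.set_index(keys)

# The aggregate has one row per group; assignment aligns it on the
# index labels, repeating each mean for every row of its group.
indexed["premium"] = indexed.groupby(level=keys)["option_value"].mean()

# Optional: turn the index levels back into regular columns.
result = indexed.reset_index()
print(result)
```

Compared with transform, this version changes the index of the frame, which is why the sketch works on a copy (indexed) and restores the columns at the end.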