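As described in "NumPy arrays (5): axes of two-dimensional arrays", a pandas DataFrame also has the concept of an axis, which determines whether a method is applied along the rows or along the columns.
The following data is used as an example.
It contains ridership figures for 5 stations over 10 days: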
import pandas as pd

# Rows are dates, columns are station IDs
ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)
          R003  R004  R005  R006  R007
05-01-11     0     0     2     5     0
05-02-11  1478  3877  3674  2328  2539
05-03-11  1613  4088  3991  6461  2691
05-04-11  1560  3392  3826  4787  2613
05-05-11  1608  4802  3932  4477  2705
05-06-11  1576  3933  3909  4979  2685
05-07-11    95   229   255   496   201
05-08-11     2     0     1    27     0
05-09-11  1438  3785  3589  4174  2215
05-10-11  1342  4043  4009  4665  3033
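In this data, each row holds the ridership of every station on a single day, and each column holds the ridership of a single station across all days.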
To compute the average ridership across all stations for each day:
print(ridership_df.mean(axis=1))
or, equivalently:
print(ridership_df.mean(axis='columns'))
05-01-11 1.4
05-02-11 2779.2
05-03-11 3768.8
05-04-11 3235.6
05-05-11 3504.8
05-06-11 3416.4
05-07-11 255.2
05-08-11 6.0
05-09-11 3040.2
05-10-11 3418.4
dtype: float64
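The equivalence of the integer and string forms can be checked on a smaller table. The sketch below uses a made-up 3-day, 2-station DataFrame (`small_df`, with invented labels and values, not part of the ridership data) and assumes a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical miniature table: 3 days (rows) x 2 stations (columns).
small_df = pd.DataFrame(
    data=[[10, 20],
          [30, 50],
          [0, 4]],
    index=['day1', 'day2', 'day3'],
    columns=['A', 'B'],
)

by_number = small_df.mean(axis=1)        # integer form of the axis
by_name = small_df.mean(axis='columns')  # string alias for the same axis

# Each call returns one mean per row, labelled by the row index.
print(by_number)
print(by_name.equals(by_number))
```

The result is a Series whose index is the row labels, which is why the output above is keyed by date.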
To compute the average ridership across all days for each station:
print(ridership_df.mean(axis=0))
or, equivalently:
print(ridership_df.mean(axis='index'))
R003 1071.2
R004 2814.9
R005 2718.8
R006 3239.9
R007 1868.2
dtype: float64
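*Summary:
axis=0 or axis='index': computes one result per column (values are aggregated down the rows)
axis=1 or axis='columns': computes one result per row (values are aggregated across the columns)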
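One way to remember the rule is that the axis argument names the dimension that gets collapsed, just as in NumPy. The following is a minimal sketch on a made-up table (`small_df` and its values are illustrative only) that compares the pandas results with the NumPy results on the underlying array; `sum` is used alongside `mean` on the assumption that reductions share the same axis convention.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table: 3 days (rows) x 2 stations (columns).
small_df = pd.DataFrame(
    data=[[10, 20], [30, 50], [0, 4]],
    index=['day1', 'day2', 'day3'],
    columns=['A', 'B'],
)
values = small_df.to_numpy()  # plain 2D NumPy array with the same numbers

# axis=0 collapses the index (rows): one value per column.
col_means = small_df.mean(axis=0)
# axis=1 collapses the columns: one value per row.
row_sums = small_df.sum(axis=1)

# The pandas results match NumPy reductions along the same axis.
print(np.allclose(col_means.to_numpy(), values.mean(axis=0)))
print(np.allclose(row_sums.to_numpy(), values.sum(axis=1)))
print(col_means.shape, row_sums.shape)
```

With 3 rows and 2 columns, the axis=0 result has 2 entries and the axis=1 result has 3, which makes it easy to see which dimension was reduced.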