Pandas透视表和交叉表

透视表

参数名	说明
values	待聚合的列的名称。默认聚合所有数值列
index	用于分组的列名或其他分组键,出现在结果透视表的行
columns	用于分组的列表或其他分组键，出现在结果透视表的列
aggfunc	聚合函数或函数列表,默认为'mean',可以是任何对groupby有效的函数
fill_value	用于替换结果表中的缺失值
margins	添加行/列小计和总计,默认为False

# pivot_table默认聚合分组平均数

tips = pd.read_csv('C:/Users/1/Desktop/tips.csv')

tips.pivot_table(index=['sex','smoker'])

                    size	  tip	    total_bill

sex	    smoker

Female	    No	2.592593	2.773519	18.105185

           Yes	2.242424	2.931515	17.977879

Male	    No	2.711340	3.113402	19.791237

           Yes	2.500000	3.051167	22.284500

# 聚合小费的比例和人数的多少，根据天数进行分组

tips['tip_pct'] = tips.tip / tips.total_bill

tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker')

                size	                tip_pct

smoker	          No	   Yes	        No	      Yes

sex	    day

Female	Fri	  2.500000	2.000000	0.165296	0.209129

        Sat	  2.307692	2.200000	0.147993	0.163817

        Sun	  3.071429	2.500000	0.165710	0.237075

        Thur  2.480000	2.428571	0.155971	0.163073

Male	Fri	  2.000000	2.125000	0.138005	0.144730

        Sat	  2.656250	2.629630	0.162132	0.139067

        Sun	  2.883721	2.600000	0.158291	0.173964

        Thur  2.500000	2.300000	0.165706	0.164417

# margins=True，会开启一列ALL的行和列，其值对应于单个等级中所有数据的分组统计。

tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker',margins=True)

                size	                                  tip_pct

    smoker	  No	       Yes	      All	       No	      Yes	       All

sex	    day

Female	Fri	2.500000	2.000000	2.111111	0.165296	0.209129	0.199388

        Sat	2.307692	2.200000	2.250000	0.147993	0.163817	0.156470

        Sun	3.071429	2.500000	2.944444	0.165710	0.237075	0.181569

        Thur2.480000	2.428571	2.468750	0.155971	0.163073	0.157525

Male	Fri	2.000000	2.125000	2.100000	0.138005	0.144730	0.143385

        Sat	2.656250	2.629630	2.644068	0.162132	0.139067	0.151577

        Sun	2.883721	2.600000	2.810345	0.158291	0.173964	0.162344

        Thur2.500000	2.300000	2.433333	0.165706	0.164417	0.165276

All		    2.668874	2.408602	2.569672	0.159328	0.163196	0.160803

# aggfunc可以直接接受聚合函数，len可以统计分组大小的交叉表

# 根据性别，是否吸烟对每天的人数进行统计

tips.pivot_table('tip_pct',index=['sex','smoker'],columns='day',aggfunc=len,margins=True)

day	        Fri	     Sat	Sun	    Thur	All

sex	  smoker

Female	No	 2.0	13.0	14.0	25.0	54.0

        Yes	 7.0	15.0	4.0	    7.0	    33.0

Male	No	 2.0	32.0	43.0	20.0	97.0

        Yes	 8.0	27.0	15.0	10.0	60.0

All		     19.0	87.0	76.0	62.0	244.0

# 如果存在空的组合，可以设置空值fill_value

tips.pivot_table('size',index=['time','sex','smoker'],columns='day',aggfunc='sum',fill_value=0)

         day	     Fri	Sat	 Sun	Thur

time	sex	   smoker

Dinner	Female	No	  2	     30	 43	     2

               Yes	  8	     33	 10	     0

Male	        No	  4	     85	 124	 0

               Yes	  12	 71	 39	     0

Lunch	Female	No	  3	      0	  0	    60

                Yes	  6	      0	  0	    17

        Male	No	  0	      0	  0	    50

                Yes	  5	      0	  0	    23

交叉表: crosstab

是一种用于计算分组频率的特殊透视表

data = [[1,'Female','Right-handed'],

        [2,'Male','Left-handed'],

        [3,'Female','Right-handed'],

        [4,'Male','Right-handed'],

        [5,'Male','Left-handed'],

        [6,'Male','Right-handed'],

        [7,'Female','Right-handed'],

        [8,'Female','Left-handed'],

        [9,'Male','Right-handed'],

        [10,'Female','Right-handed']]

data = pd.DataFrame(data,columns=['Sample','Gender','Handedness'])

# 前两个参数可以是数组、Series或数组列表

# 根据性别和用手习惯对这段数据进行统计汇总

pd.crosstab(data.Gender,data.Handedness,margins=True)

Handedness	Left-handed	Right-handed	All

 Gender

 Female	          1	          4	         5

   Male	          2	          3	         5

   All	          3	          7	        10

# 这是传统的透视表,比较复杂点比交叉表

pd.pivot_table(data,index=['Gender'],columns=['Handedness'],aggfunc=len,margins=True)

             Sample

Handedness	Left-handed	Right-handed	All

Gender

Female	         1	         4	         5

Male	         2	         3 	         5

All	             3	         7	         10

# 根据时间，星期几进行了对每天的是否吸烟人数分组统计

pd.crosstab([tips.time,tips.day],tips.smoker,margins=True)

      smoker	No	  Yes	All

time	day

Dinner	Fri	    3	   9	12

        Sat	    45	  42	87

        Sun	    57	  19	76

        Thur	1	   0	1

Lunch	Fri	    1	   6	7

        Thur	44	  17	61

        All		151	  93	244

秒客网

Pandas透视表和交叉表

透视表

交叉表: crosstab

是一种用于计算分组频率的特殊透视表

相关文章