Machine Learning: Model Evaluation Metrics

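In a binary classification problem, every sample is either positive or negative. Among the test samples, the number of positive samples that the classifier labels positive is denoted TP (true positive), and the number it labels negative is denoted FN (false negative). The number of negative samples that the classifier labels negative is denoted TN (true negative), and the number it labels positive is denoted FP (false positive). As shown in the figure, groups A and B each contain 100 samples in total.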


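Precision: TP/(TP+FP)

Recall: TP/(TP+FN)

False alarm rate (as used here, the complement of precision, also known as the false discovery rate): 1 - TP/(TP+FP) = FP/(TP+FP)

True positive rate: TPR = TP/(TP+FN)

False positive rate: FPR = FP/(FP+TN)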
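As a quick check of the definitions above, the following minimal sketch computes each rate from the four confusion-matrix counts. The function name `classification_rates` and the counts passed to it are hypothetical, chosen only for illustration.

```python
def classification_rates(tp, fn, fp, tn):
    """Compute the rates defined above from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # identical to the true positive rate (TPR)
    false_alarm = 1 - precision    # equals fp / (tp + fp)
    fpr = fp / (fp + tn)           # false positive rate
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm": false_alarm,
        "fpr": fpr,
    }


# Hypothetical counts: 100 positive and 100 negative test samples
rates = classification_rates(tp=80, fn=20, fp=10, tn=90)
print(rates)
```

Note that recall and the true positive rate are the same quantity; the two names come from different communities.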

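The ROC curve plots the false positive rate on the horizontal axis and the true positive rate on the vertical axis.

A good classifier's curve should keep the false positive rate low and the true positive rate high. Ideally it stays close to the line y = 1, so that the area under the curve (AUC) is as large as possible.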
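To make the curve and its area concrete, here is a small sketch that builds an empirical ROC curve by sweeping a threshold over a handful of scores and then integrates it with the trapezoid rule. The helper names `roc_points` and `auc_trapezoid` and the six scores and labels are made up for illustration; a sample is assumed to be predicted positive when its score is at or above the threshold.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold from high to low."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Start above every score (nothing predicted positive), then lower the threshold
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    tpr = np.array([np.sum((scores >= t) & (labels == 1)) / n_pos for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (labels == 0)) / n_neg for t in thresholds])
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Hypothetical scores and labels (1 = positive, 0 = negative)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
auc = auc_trapezoid(fpr, tpr)
print("AUC:", auc)
```

A perfect classifier would reach TPR = 1 while FPR is still 0, giving an area of 1; random guessing gives an area of about 0.5.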

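Example:

Generate two groups of normally distributed samples, with labels marking one group as positive samples and the other as negative samples. The data file is available at the following link:

Link: https://pan.baidu.com/s/1X4hHygzSQHB3f8_kepxE8A
Extraction code: 6uvg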
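If the download is not available, a data file with the same two-column layout (value, label) can be generated locally. This is only a sketch: the means, standard deviation, sample count, random seed, and the file name `data_synthetic.txt` are all assumptions and are not taken from the original data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the file is reproducible

n_per_class = 100  # assumed number of samples per class
# Assumed distribution parameters, not those of the original data file
positive = rng.normal(loc=15.0, scale=3.0, size=n_per_class)
negative = rng.normal(loc=9.0, scale=3.0, size=n_per_class)

values = np.concatenate([positive, negative])
labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

# Column 0: sample value, column 1: label (1 = positive, 0 = negative)
data = np.column_stack([values, labels])
np.savetxt('data_synthetic.txt', data)
```

To use it with the script, pass `'data_synthetic.txt'` to `np.loadtxt` in place of `'data.txt'`.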

# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

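def floatrange(start,stop,steps):
    # Return `steps` evenly spaced values from start to stop, both ends included
    return [start+float(i)*(stop-start)/(float(steps)-1) for i in range(steps)]

# Load the data: column 0 holds the sample value, column 1 the label (1 = positive, 0 = negative)
data = np.loadtxt('data.txt')

# Estimate the normal-distribution parameters of each class
totalCount = len(data[:,0])
positiveCount =np.sum(data[:,1])
negativeCount = totalCount - positiveCount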

# Positive samples: mean and standard deviation
positiveIndex= np.where(data[:,1] ==1)
positiveSum = np.sum(data[positiveIndex,0])
positive_u =positiveSum / positiveCount
positive_derta =np.sqrt(np.sum(np.square(data[positiveIndex,0] - positive_u )) / positiveCount)

# Negative samples: mean and standard deviation
negativeIndex= np.where(data[:,1] ==0)
negativeSum = np.sum(data[negativeIndex,0])
negative_u =negativeSum / negativeCount
negative_derta =np.sqrt(np.sum(np.square(data[negativeIndex,0] - negative_u )) / negativeCount)

# Plot the probability density curves of the two fitted normal distributions
x = np.array(floatrange(2,25,1000))
print(positive_u,positive_derta)
pd = np.exp(-1.0*np.square(x-positive_u) / (2*np.square(positive_derta))) /(positive_derta*np.sqrt(2*np.pi))
nd = np.exp(-1.0*np.square(x-negative_u) / (2*np.square(negative_derta))) /(negative_derta*np.sqrt(2*np.pi))
plt.figure(1)
plt.plot(x,pd,'r')   # positive class
plt.plot(x,nd,'b')   # negative class
    

# Build the fitted normal distributions
positiveFun = stats.norm(positive_u,positive_derta)
negativeFun = stats.norm(negative_u,negative_derta)

positiveValue = positiveFun.cdf(x)
negativeValue = negativeFun.cdf(x)


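# True positive rate and false positive rate at each candidate threshold x
# (a sample is predicted positive when its value exceeds the threshold)
positiveRate = 1 -positiveFun.cdf(x)
negativeRate = 1 -negativeFun.cdf(x)

# Threshold: choose the x that minimizes false negative rate + false positive rate
disvalue =positiveFun.cdf(x) +1 -negativeFun.cdf(x)
indexvalue = int(np.argmin(disvalue))

xvalue = x[indexvalue]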

# Confusion matrix at the chosen threshold (expected counts under the fitted normals)
positivevalue = 1 -positiveFun.cdf(xvalue)
negativevalue = 1 -negativeFun.cdf(xvalue)
v00= int(positivevalue * positiveCount)   # TP
v01= positiveCount -v00                   # FN
v10 =int(negativevalue* negativeCount)    # FP
v11 =negativeCount -v10                   # TN
print("disvalue:",xvalue)
print("positiverate:",positivevalue,"negativerate:",negativevalue)
print(v00,",",v01)
print(v10,",",v11)


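# Mark the chosen threshold on the density plot
xdis = [xvalue,xvalue]
ydis = [0,0.2]
plt.plot(xdis,ydis,'g')
# ROC curve: false positive rate on the horizontal axis, true positive rate on the vertical axis
plt.figure(2)
plt.plot(negativeRate,positiveRate,'r')
plt.show()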
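The threshold step in the script minimizes the sum of the false negative rate and the false positive rate. Since FNR = 1 - TPR, this is the same as maximizing TPR - FPR, a quantity often called Youden's J statistic. The sketch below checks that equivalence numerically; the two normal distributions use assumed parameters rather than values fitted from `data.txt`.

```python
import numpy as np
from scipy import stats

# Assumed class distributions (hypothetical parameters)
pos = stats.norm(15.0, 3.0)
neg = stats.norm(9.0, 3.0)

x = np.linspace(2, 25, 1000)   # candidate thresholds
tpr = 1 - pos.cdf(x)           # positives above the threshold
fpr = 1 - neg.cdf(x)           # negatives above the threshold

# Criterion used in the script: false negative rate + false positive rate
fnr_plus_fpr = pos.cdf(x) + fpr
i_min = int(np.argmin(fnr_plus_fpr))

# Equivalent criterion: maximize TPR - FPR
i_max = int(np.argmax(tpr - fpr))

print("threshold (min FNR+FPR):", x[i_min])
print("threshold (max TPR-FPR):", x[i_max])
```

With equal standard deviations the selected threshold falls near the midpoint of the two means, which is where the two density curves cross.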

The results of running the script are shown below:

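[Figures: Figure 1, the probability density curves of the two classes with the chosen threshold marked; Figure 2, the ROC curve]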
