《机器学习实战》学习笔记第五章 —— Logistic回归

时间:2023-03-09 07:18:51
《机器学习实战》学习笔记第五章 —— Logistic回归

一.有关笔记:

1..吴恩达机器学习笔记(二) —— Logistic回归

2.吴恩达机器学习笔记(十一) —— Large Scale Machine Learning

二.Python源码(不带正则项):

 # coding:utf-8

 '''
Created on Oct 27, 2010
Logistic Regression Working Module
@author: Peter
'''
from numpy import * def sigmoid(inX):
return 1.0 / (1 + exp(-inX)) def gradAscent(dataMatIn, classLabels):
dataMatrix = mat(dataMatIn) # convert to NumPy matrix
labelMat = mat(classLabels).transpose() # convert to NumPy matrix
m, n = shape(dataMatrix)
alpha = 0.001
maxCycles = 500
weights = ones((n, 1))
for k in range(maxCycles): # heavy on matrix operations
h = sigmoid(dataMatrix * weights) # matrix mult
error = (labelMat - h) # vector subtraction
weights = weights + alpha * dataMatrix.transpose() * error # matrix mult
return weights def stocGradAscent0(dataMatrix, classLabels,numIter=150):
m, n = shape(dataMatrix)
alpha = 0.01
weights = ones(n) # initialize to all ones
for j in range(numIter):
for i in range(m):
h = sigmoid(sum(dataMatrix[i] * weights))
error = classLabels[i] - h
weights = weights + alpha * error * dataMatrix[i]
return weights def stocGradAscent1(dataMatrix, classLabels, numIter=150):
m, n = shape(dataMatrix)
weights = ones(n) # initialize to all ones
for j in range(numIter):
dataIndex = range(m)
for i in range(m):
alpha = 4 / (1.0 + j + i) + 0.0001 # apha decreases with iteration, does not
randIndex = int(random.uniform(0, len(dataIndex))) # go to 0 because of the constant
h = sigmoid(sum(dataMatrix[randIndex] * weights))
error = classLabels[randIndex] - h
weights = weights + alpha * error * dataMatrix[randIndex]
del (dataIndex[randIndex])
return weights def classifyVector(inX, weights):
prob = sigmoid(sum(inX * weights))
if prob > 0.5:
return 1.0
else:
return 0.0 def colicTest():
frTrain = open('horseColicTraining.txt')
frTest = open('horseColicTest.txt')
trainingSet = []
trainingLabels = []
for line in frTrain.readlines():
currLine = line.strip().split('\t')
lineArr = []
for i in range(21):
lineArr.append(float(currLine[i]))
trainingSet.append(lineArr)
trainingLabels.append(float(currLine[21]))
trainWeights = stocGradAscent1(array(trainingSet), trainingLabels,500)
errorCount = 0; numTestVec = 0.0
for line in frTest.readlines():
numTestVec += 1.0
currLine = line.strip().split('\t')
lineArr = []
for i in range(21):
lineArr.append(float(currLine[i]))
if int(classifyVector(array(lineArr), trainWeights)) != int(currLine[21]):
errorCount += 1
errorRate = (float(errorCount) / numTestVec)
print "the error rate of this test is: %f" % errorRate
return errorRate def multiTest():
numTests = 10; errorSum = 0.0
for k in range(numTests):
errorSum += colicTest()
print "after %d iterations the average error rate is: %f" % (numTests, errorSum / float(numTests)) if __name__=="__main__":
multiTest()

三.Batch gradient descent、Stochastic gradient descent、Mini-batch gradient descent 的性能比较

1.Batch gradient descent

 def gradAscent(dataMatIn, classLabels):
dataMatrix = mat(dataMatIn) # convert to NumPy matrix
labelMat = mat(classLabels).transpose() # convert to NumPy matrix
m, n = shape(dataMatrix)
alpha = 0.001
maxCycles = 500
weights = ones((n, 1))
for k in range(maxCycles): # heavy on matrix operations
h = sigmoid(dataMatrix * weights) # matrix mult
error = (labelMat - h) # vector subtraction
weights = weights + alpha * dataMatrix.transpose() * error # matrix mult
return weights

其运行结果:

《机器学习实战》学习笔记第五章 —— Logistic回归

错误率为:28.4%

2.Stochastic gradient descent

 def stocGradAscent0(dataMatrix, classLabels,numIter=150):
m, n = shape(dataMatrix)
alpha = 0.01
weights = ones(n) # initialize to all ones
for j in range(numIter):
for i in range(m):
h = sigmoid(sum(dataMatrix[i] * weights))
error = classLabels[i] - h
weights = weights + alpha * error * dataMatrix[i]
return weights

迭代次数为150时,错误率为:46.3%

迭代次数为500时,错误率为:32.8%

迭代次数为800时,错误率为:38.8%

3.Mini-batch gradient descent

 def stocGradAscent1(dataMatrix, classLabels, numIter=150):
m, n = shape(dataMatrix)
weights = ones(n) # initialize to all ones
for j in range(numIter):
dataIndex = range(m)
for i in range(m):
alpha = 4 / (1.0 + j + i) + 0.0001 # apha decreases with iteration, does not
randIndex = int(random.uniform(0, len(dataIndex))) # go to 0 because of the constant
h = sigmoid(sum(dataMatrix[randIndex] * weights))
error = classLabels[randIndex] - h
weights = weights + alpha * error * dataMatrix[randIndex]
del (dataIndex[randIndex])
return weights

迭代次数为150时,错误率为:37.8%

迭代次数为500时,错误率为:35.2%

迭代次数为800时,错误率为:37.3%

4.综上:

1.在训练数据集较小且特征较少的时候,使用Batch gradient descent的效果是最好的。但如果不能满足这个条件,则可使用Mini-batch gradient descent,并设置合适的迭代次数。

2.对于Stochastic gradient descent 和 Mini-batch gradient descent 而言,并非迭代次数越多效果越好。不知为何?