Theano 3.3 - Exercise: Logistic Regression

Date: 2022-01-22 10:23:39

This is a walkthrough of the official Theano logistic regression exercise (http://deeplearning.net/tutorial/logreg.html#logreg).

Classifying MNIST digits using Logistic Regression

note: this section assumes the reader is already familiar with these Theano concepts: shared variables, basic arithmetic ops, T.grad, and floatX. If you want to run the code on a GPU, also read the GPU section.

note: the code for this section is available for download here.

In this section we show how to use Theano to implement the most basic classifier: logistic regression. We start with a quick primer on the model, which serves both as a refresher and as a way to illustrate how mathematical expressions are mapped onto Theano graphs. In the deepest of machine learning traditions, this tutorial tackles the exciting problem of MNIST digit classification.

1. The model

Logistic regression is a probabilistic, linear classifier. It is parametrized by a weight matrix $W$ and a bias vector $b$. Classification is done by projecting input vectors onto a set of hyperplanes, each of which corresponds to a class; the distance from the input to a hyperplane reflects the probability that the input is a member of the corresponding class.

Mathematically, the probability that an input vector $x$ is a member of class $i$, a value of the stochastic variable $Y$, can be written as:

$$P(Y=i \mid x, W, b) = \mathrm{softmax}_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}$$

The model's prediction $y_{pred}$ is the class whose probability is maximal, namely:

$$y_{pred} = \operatorname{argmax}_i P(Y=i \mid x, W, b)$$

The corresponding Theano code is as follows:

        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represents the separation hyperplane
        # for class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represents the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

Since the parameters of the model must maintain a persistent state throughout training, we allocate shared variables for $W$ and $b$. This declares them as symbolic Theano variables and also initializes their contents. The dot product and softmax operators are then used to compute the vector $P(Y \mid x, W, b)$. The result p_y_given_x is a symbolic variable of vector type. To get the actual model prediction we use the T.argmax operator, which returns the index of the maximum value of p_y_given_x (i.e., the class with the largest probability). For now the model is useless, since its parameters are still at their initial values; the following sections explain how to learn optimal parameters.
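To make the mapping from formulas to Theano graphs concrete, here is a minimal, self-contained sketch (not part of the original tutorial; the toy dimensions and the all-zero test input are made up for illustration) that builds the same W, b, softmax and argmax expressions and compiles a prediction function:

    import numpy
    import theano
    import theano.tensor as T

    n_in, n_out = 784, 10          # MNIST: 28*28 inputs, 10 classes

    x = T.matrix('x')              # a minibatch of rasterized images
    W = theano.shared(numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
                      name='W', borrow=True)
    b = theano.shared(numpy.zeros((n_out,), dtype=theano.config.floatX),
                      name='b', borrow=True)

    p_y_given_x = T.nnet.softmax(T.dot(x, W) + b)   # class-membership probabilities
    y_pred = T.argmax(p_y_given_x, axis=1)          # most probable class per example

    predict = theano.function(inputs=[x], outputs=y_pred)

    # with W and b still zero, every class is equally likely,
    # so argmax returns the first index for every example
    sample = numpy.zeros((3, n_in), dtype=theano.config.floatX)
    print predict(sample)          # -> [0 0 0]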

note: a complete list of Theano ops can be found here: list of ops

2. Defining a loss function

Learning optimal model parameters involves minimizing a loss function. For multi-class logistic regression, it is very common to use the negative log-likelihood as the loss. This is equivalent to maximizing the likelihood of the data set $\mathcal{D}$ under the model parametrized by $\theta = \{W, b\}$. Let us first define the likelihood $\mathcal{L}$ and the loss $\ell$:

$$\mathcal{L}(\theta=\{W,b\}, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log P(Y=y^{(i)} \mid x^{(i)}, W, b), \qquad \ell(\theta=\{W,b\}, \mathcal{D}) = -\mathcal{L}(\theta=\{W,b\}, \mathcal{D})$$

While entire books are devoted to the topic of minimization, gradient descent is by far the simplest method for minimizing arbitrary non-linear functions. This tutorial uses minibatch stochastic gradient descent (MSGD); see Stochastic Gradient Descent for more details.

The following Theano code defines the (symbolic) loss for a given minibatch:

        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1]
        # T.log(self.p_y_given_x) is a matrix of Log-Probabilities (call it LP)
        # with one row per example and one column per class
        # LP[T.arange(y.shape[0]),y] is a vector v containing
        # [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ..., LP[n-1,y[n-1]]]
        # and T.mean(LP[T.arange(y.shape[0]),y]) is the mean (across minibatch
        # examples) of the elements in v, i.e., the mean log-likelihood across
        # the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

note: even though the loss is formally defined as the sum of individual error terms over the data set, in practice the code uses the mean (T.mean). This makes the choice of learning rate less dependent on the minibatch size.
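The indexing expression LP[T.arange(y.shape[0]), y] can look cryptic at first. The following small NumPy sketch (an illustration added here, not part of the tutorial; the probabilities and labels are made up) shows the same trick on concrete numbers:

    import numpy

    # fake log-probabilities for a minibatch of 3 examples and 4 classes
    LP = numpy.log(numpy.array([[0.10, 0.20, 0.30, 0.40],
                                [0.70, 0.10, 0.10, 0.10],
                                [0.25, 0.25, 0.25, 0.25]]))
    y = numpy.array([3, 0, 1])                 # correct class of each example

    picked = LP[numpy.arange(y.shape[0]), y]   # [LP[0,3], LP[1,0], LP[2,1]]
    nll = -picked.mean()                       # mean negative log-likelihood

    print picked    # log(0.4), log(0.7), log(0.25)
    print nll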

3. Creating a LogisticRegression class

We now have all the tools we need to define a LogisticRegression class, which encapsulates the basic behaviour of logistic regression. The code is very similar to what we have seen so far and should be fairly self-explanatory:

class LogisticRegression(object):
    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie
        """
        # start-snippet-1
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represents the separation hyperplane
        # for class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represents the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # end-snippet-1

        # parameters of the model
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        """Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
              the learning rate is less dependent on the batch size
        """
        # start-snippet-2
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1]
        # T.log(self.p_y_given_x) is a matrix of Log-Probabilities (call it LP)
        # with one row per example and one column per class
        # LP[T.arange(y.shape[0]),y] is a vector v containing
        # [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ..., LP[n-1,y[n-1]]]
        # and T.mean(LP[T.arange(y.shape[0]),y]) is the mean (across minibatch
        # examples) of the elements in v, i.e., the mean log-likelihood across
        # the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
        # end-snippet-2

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()

The class can be instantiated as follows:

    # generate symbolic variables for input (x and y represent a
    # minibatch)
    x = T.matrix('x')   # data, presented as rasterized images
    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class
    # Each MNIST image has size 28*28
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

We begin by allocating symbolic variables for the training inputs $x$ and their corresponding class labels $y$. Note that x and y are defined outside the scope of the LogisticRegression object. Since the class needs the input to build its graph, it is passed as a parameter to the __init__ function. This is useful when you want to connect instances of such classes to form a deep network: the output of one layer can be fed as the input of the layer above it (this tutorial does not build a multi-layer network, but the code will be reused in later tutorials that do). Finally, we define a (symbolic) cost variable to minimize, using the instance method classifier.negative_log_likelihood:

    # the cost we minimize during training is the negative log likelihood of
    # the model in symbolic format
    cost = classifier.negative_log_likelihood(y)

Note that x is an implicit symbolic input in the definition of cost, because the symbolic variable classifier was defined in terms of x at construction time.
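As a quick check that cost really depends on both x and y even though only y appears explicitly, here is a small illustrative snippet (not from the tutorial; the dummy data is made up) that compiles the cost expression and evaluates it on a random minibatch:

    # a minimal sketch: compile the symbolic cost into a callable function
    compute_cost = theano.function(inputs=[x, y], outputs=cost)

    # dummy minibatch of 5 random "images" and random labels in 0..9
    dummy_x = numpy.random.rand(5, 28 * 28).astype(theano.config.floatX)
    dummy_y = numpy.random.randint(0, 10, size=(5,)).astype('int32')

    # with W and b still at zero, every class has probability 0.1,
    # so the cost should be close to -log(0.1) ~= 2.3
    print compute_cost(dummy_x, dummy_y)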

4. Learning the model

To implement MSGD in most programming languages (C/C++, Matlab, Python), one would manually derive the expressions for the gradient of the loss with respect to the parameters: in this case $\partial\ell/\partial W$ and $\partial\ell/\partial b$. This can get pretty tricky for complex models, since expressions like $\partial\ell/\partial\theta$ can become fairly involved, especially when numerical stability has to be taken into account. With Theano this work is greatly simplified: it performs automatic differentiation and applies certain mathematical transforms to improve numerical stability. To obtain the gradients $\partial\ell/\partial W$ and $\partial\ell/\partial b$ in Theano, simply do the following:

    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)
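For reference (this derivation is not spelled out in the original tutorial, but it is the standard result for softmax regression with the mean negative log-likelihood loss), the gradients that Theano derives symbolically are equivalent to:

$$\frac{\partial \ell}{\partial W_{jk}} = \frac{1}{n}\sum_{i=1}^{n} x_j^{(i)}\left(P(Y=k \mid x^{(i)}, W, b) - \mathbf{1}\{y^{(i)}=k\}\right), \qquad \frac{\partial \ell}{\partial b_k} = \frac{1}{n}\sum_{i=1}^{n} \left(P(Y=k \mid x^{(i)}, W, b) - \mathbf{1}\{y^{(i)}=k\}\right)$$

where $n$ is the minibatch size and $\mathbf{1}\{\cdot\}$ is the indicator function.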

g_W and g_b are symbolic variables that can be used as part of a computation graph. The function train_model, which performs one step of gradient descent, can then be defined as follows:

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, but in
    # the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

updates is a list of pairs. In each pair, the first element is the symbolic variable to be updated in the step and the second element is the symbolic expression that computes its new value. Similarly, givens is a dictionary whose keys are symbolic variables and whose values are the expressions to substitute for them during the step. The function train_model is then defined such that:

  • its input is the minibatch index index, which together with the batch size (this is fixed, and therefore not an input) defines the examples $x$ and their corresponding labels $y$;
  • its return value is the cost/loss associated with the x and y defined by that index;
  • on every call, it first replaces x and y with the slices of the training set defined by index, then evaluates the cost associated with that minibatch and applies the operations defined in the updates list.

Each time train_model(index) is called, it computes and returns the cost of one minibatch while also performing a step of MSGD. The whole learning algorithm therefore consists of looping over all examples in the dataset, considering the examples of one minibatch at a time, and repeatedly calling the train_model function.
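Concretely, the outer training loop (shown in full, with early stopping, in the complete listing later on) reduces in its simplest form to the following sketch; n_epochs and n_train_batches are assumed to have been computed as in the full code:

    for epoch in xrange(n_epochs):
        for minibatch_index in xrange(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
        print 'epoch %i, last minibatch cost %f' % (epoch, minibatch_avg_cost)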

5. Testing the model

As explained in Learning a Classifier, when testing the model we are interested in the number of misclassified examples (and not only in the likelihood). The LogisticRegression class therefore has an extra instance method, which builds the symbolic graph for retrieving the number of misclassified examples in each minibatch.

The code is as follows:

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()

We then create the functions test_model and validate_model. As you will see shortly, validate_model is the key to our early-stopping mechanism (see Early-Stopping). Both functions take a minibatch index as input and compute, for the corresponding minibatch, the fraction of examples that the model misclassifies. The only difference between them is that test_model draws its minibatches from the test set, while validate_model draws them from the validation set:

    # compiling a Theano function that computes the mistakes that are made by
    # the model on a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
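As an illustration of how these two functions are used (this mirrors the validation step in the complete listing below), the average zero-one loss on the validation set is simply the mean of the per-minibatch errors:

    validation_losses = [validate_model(i) for i in xrange(n_valid_batches)]
    this_validation_loss = numpy.mean(validation_losses)
    print 'validation error %f %%' % (this_validation_loss * 100.)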

6. Putting it all together

The finished product is the following code:

"""
This tutorial introduces logistic regression using Theano and stochastic
gradient descent. Logistic regression is a probabilistic, linear classifier. It is parametrized
by a weight matrix :math:`W` and a bias vector :math:`b`. Classification is
done by projecting data points onto a set of hyperplanes, the distance to
which is used to determine a class membership probability. Mathematically, this can be written as: .. math::
P(Y=i|x, W,b) &= softmax_i(W x + b) \\
&= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}} The output of the model or prediction is then done by taking the argmax of
the vector whose i'th element is P(Y=i|x). .. math:: y_{pred} = argmax_i P(Y=i|x,W,b) This tutorial presents a stochastic gradient descent optimization method
suitable for large datasets. References: - textbooks: "Pattern Recognition and Machine Learning" -
Christopher M. Bishop, section 4.3.2 """
__docformat__ = 'restructedtext en' import cPickle
import gzip
import os
import sys
import time import numpy import theano
import theano.tensor as T class LogisticRegression(object):
"""Multi-class Logistic Regression Class The logistic regression is fully described by a weight matrix :math:`W`
and bias vector :math:`b`. Classification is done by projecting data
points onto a set of hyperplanes, the distance to which is used to
determine a class membership probability.
""" def __init__(self, input, n_in, n_out):
""" Initialize the parameters of the logistic regression :type input: theano.tensor.TensorType
:param input: symbolic variable that describes the input of the
architecture (one minibatch) :type n_in: int
:param n_in: number of input units, the dimension of the space in
which the datapoints lie :type n_out: int
:param n_out: number of output units, the dimension of the space in
which the labels lie """
# start-snippet-1
# initialize with 0 the weights W as a matrix of shape (n_in, n_out)
self.W = theano.shared(
value=numpy.zeros(
(n_in, n_out),
dtype=theano.config.floatX
),
name='W',
borrow=True
)
# initialize the baises b as a vector of n_out 0s
self.b = theano.shared(
value=numpy.zeros(
(n_out,),
dtype=theano.config.floatX
),
name='b',
borrow=True
) # symbolic expression for computing the matrix of class-membership
# probabilities
# Where:
# W is a matrix where column-k represent the separation hyper plain for
# class-k
# x is a matrix where row-j represents input training sample-j
# b is a vector where element-k represent the free parameter of hyper
# plain-k
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b) # symbolic description of how to compute prediction as class whose
# probability is maximal
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
# end-snippet-1 # parameters of the model
self.params = [self.W, self.b] def negative_log_likelihood(self, y):
"""Return the mean of the negative log-likelihood of the prediction
of this model under a given target distribution. .. math:: \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
\frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
\log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
\ell (\theta=\{W,b\}, \mathcal{D}) :type y: theano.tensor.TensorType
:param y: corresponds to a vector that gives for each example the
correct label Note: we use the mean instead of the sum so that
the learning rate is less dependent on the batch size
"""
# start-snippet-2
# y.shape[0] is (symbolically) the number of rows in y, i.e.,
# number of examples (call it n) in the minibatch
# T.arange(y.shape[0]) is a symbolic vector which will contain
# [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
# Log-Probabilities (call it LP) with one row per example and
# one column per class LP[T.arange(y.shape[0]),y] is a vector
# v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
# LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
# the mean (across minibatch examples) of the elements in v,
# i.e., the mean log-likelihood across the minibatch.
return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
# end-snippet-2 def errors(self, y):
"""Return a float representing the number of errors in the minibatch
over the total number of examples of the minibatch ; zero one
loss over the size of the minibatch :type y: theano.tensor.TensorType
:param y: corresponds to a vector that gives for each example the
correct label
""" # check if y has same dimension of y_pred
if y.ndim != self.y_pred.ndim:
raise TypeError(
'y should have the same shape as self.y_pred',
('y', y.type, 'y_pred', self.y_pred.type)
)
# check if y is of the correct datatype
if y.dtype.startswith('int'):
# the T.neq operator returns a vector of 0s and 1s, where 1
# represents a mistake in prediction
return T.mean(T.neq(self.y_pred, y))
else:
raise NotImplementedError() def load_data(dataset):
''' Loads the dataset :type dataset: string
:param dataset: the path to the dataset (here MNIST)
''' #############
# LOAD DATA #
############# # Download the MNIST dataset if it is not present
data_dir, data_file = os.path.split(dataset)
if data_dir == "" and not os.path.isfile(dataset):
# Check if dataset is in the data directory.
new_path = os.path.join(
os.path.split(__file__)[0],
"..",
"data",
dataset
)
if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
dataset = new_path if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
import urllib
origin = (
'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
)
print 'Downloading data from %s' % origin
urllib.urlretrieve(origin, dataset) print '... loading data' # Load the dataset
f = gzip.open(dataset, 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
#train_set, valid_set, test_set format: tuple(input, target)
#input is an numpy.ndarray of 2 dimensions (a matrix)
#witch row's correspond to an example. target is a
#numpy.ndarray of 1 dimensions (vector)) that have the same length as
#the number of rows in the input. It should give the target
#target to the example with the same index in the input. def shared_dataset(data_xy, borrow=True):
""" Function that loads the dataset into shared variables The reason we store our dataset in shared variables is to allow
Theano to copy it into the GPU memory (when code is run on GPU).
Since copying data into the GPU is slow, copying a minibatch everytime
is needed (the default behaviour if the data is not in a shared
variable) would lead to a large decrease in performance.
"""
data_x, data_y = data_xy
shared_x = theano.shared(numpy.asarray(data_x,
dtype=theano.config.floatX),
borrow=borrow)
shared_y = theano.shared(numpy.asarray(data_y,
dtype=theano.config.floatX),
borrow=borrow)
# When storing data on the GPU it has to be stored as floats
# therefore we will store the labels as ``floatX`` as well
# (``shared_y`` does exactly that). But during our computations
# we need them as ints (we use labels as index, and if they are
# floats it doesn't make sense) therefore instead of returning
# ``shared_y`` we will have to cast it to int. This little hack
# lets ous get around this issue
return shared_x, T.cast(shared_y, 'int32') test_set_x, test_set_y = shared_dataset(test_set)
valid_set_x, valid_set_y = shared_dataset(valid_set)
train_set_x, train_set_y = shared_dataset(train_set) rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
(test_set_x, test_set_y)]
return rval def sgd_optimization_mnist(learning_rate=0.13, n_epochs=1000,
dataset='mnist.pkl.gz',
batch_size=600):
"""
Demonstrate stochastic gradient descent optimization of a log-linear
model This is demonstrated on MNIST. :type learning_rate: float
:param learning_rate: learning rate used (factor for the stochastic
gradient) :type n_epochs: int
:param n_epochs: maximal number of epochs to run the optimizer :type dataset: string
:param dataset: the path of the MNIST dataset file from
http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz """
datasets = load_data(dataset) train_set_x, train_set_y = datasets[0]
valid_set_x, valid_set_y = datasets[1]
test_set_x, test_set_y = datasets[2] # compute number of minibatches for training, validation and testing
n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size ######################
# BUILD ACTUAL MODEL #
######################
print '... building the model' # allocate symbolic variables for the data
index = T.lscalar() # index to a [mini]batch # generate symbolic variables for input (x and y represent a
# minibatch)
x = T.matrix('x') # data, presented as rasterized images
y = T.ivector('y') # labels, presented as 1D vector of [int] labels # construct the logistic regression class
# Each MNIST image has size 28*28
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10) # the cost we minimize during training is the negative log likelihood of
# the model in symbolic format
cost = classifier.negative_log_likelihood(y) # compiling a Theano function that computes the mistakes that are made by
# the model on a minibatch
test_model = theano.function(
inputs=[index],
outputs=classifier.errors(y),
givens={
x: test_set_x[index * batch_size: (index + 1) * batch_size],
y: test_set_y[index * batch_size: (index + 1) * batch_size]
}
) validate_model = theano.function(
inputs=[index],
outputs=classifier.errors(y),
givens={
x: valid_set_x[index * batch_size: (index + 1) * batch_size],
y: valid_set_y[index * batch_size: (index + 1) * batch_size]
}
) # compute the gradient of cost with respect to theta = (W,b)
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b) # start-snippet-3
# specify how to update the parameters of the model as a list of
# (variable, update expression) pairs.
updates = [(classifier.W, classifier.W - learning_rate * g_W),
(classifier.b, classifier.b - learning_rate * g_b)] # compiling a Theano function `train_model` that returns the cost, but in
# the same time updates the parameter of the model based on the rules
# defined in `updates`
train_model = theano.function(
inputs=[index],
outputs=cost,
updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]
}
)
# end-snippet-3 ###############
# TRAIN MODEL #
###############
print '... training the model'
# early-stopping parameters
patience = 5000 # look as this many examples regardless
patience_increase = 2 # wait this much longer when a new best is
# found
improvement_threshold = 0.995 # a relative improvement of this much is
# considered significant
validation_frequency = min(n_train_batches, patience / 2)
# go through this many
# minibatche before checking the network
# on the validation set; in this case we
# check every epoch best_validation_loss = numpy.inf
test_score = 0.
start_time = time.clock() done_looping = False
epoch = 0
while (epoch < n_epochs) and (not done_looping):
epoch = epoch + 1
for minibatch_index in xrange(n_train_batches): minibatch_avg_cost = train_model(minibatch_index)
# iteration number
iter = (epoch - 1) * n_train_batches + minibatch_index if (iter + 1) % validation_frequency == 0:
# compute zero-one loss on validation set
validation_losses = [validate_model(i)
for i in xrange(n_valid_batches)]
this_validation_loss = numpy.mean(validation_losses) print(
'epoch %i, minibatch %i/%i, validation error %f %%' %
(
epoch,
minibatch_index + 1,
n_train_batches,
this_validation_loss * 100.
)
) # if we got the best validation score until now
if this_validation_loss < best_validation_loss:
#improve patience if loss improvement is good enough
if this_validation_loss < best_validation_loss * \
improvement_threshold:
patience = max(patience, iter * patience_increase) best_validation_loss = this_validation_loss
# test it on the test set test_losses = [test_model(i)
for i in xrange(n_test_batches)]
test_score = numpy.mean(test_losses) print(
(
' epoch %i, minibatch %i/%i, test error of'
' best model %f %%'
) %
(
epoch,
minibatch_index + 1,
n_train_batches,
test_score * 100.
)
) if patience <= iter:
done_looping = True
break end_time = time.clock()
print(
(
'Optimization complete with best validation score of %f %%,'
'with test performance %f %%'
)
% (best_validation_loss * 100., test_score * 100.)
)
print 'The code run for %d epochs, with %f epochs/sec' % (
epoch, 1. * epoch / (end_time - start_time))
print >> sys.stderr, ('The code for file ' +
os.path.split(__file__)[1] +
' ran for %.1fs' % ((end_time - start_time))) if __name__ == '__main__':
sgd_optimization_mnist()

The reader can learn to classify MNIST digits with SGD logistic regression by running the following command from the DeepLearning Tutorials folder:

python code/logistic_sgd.py

The output will look something like this:

...
epoch 72, minibatch 83/83, validation error 7.510417 %
epoch 72, minibatch 83/83, test error of best model 7.510417 %
epoch 73, minibatch 83/83, validation error 7.500000 %
epoch 73, minibatch 83/83, test error of best model 7.489583 %
Optimization complete with best validation score of 7.500000 %,with test performance 7.489583 %
The code run for 74 epochs, with 1.936983 epochs/sec

On an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00 GHz, the code runs at roughly 1.936 epochs/sec and reaches a test error of 7.489 % after 75 epochs. On a GPU it runs at roughly 10.0 epochs/sec. The batch size used in this example is 600.

Footnote:

[1] For smaller datasets and simpler models, more sophisticated descent algorithms can be more effective. The example code logistic_cg.py demonstrates how to use SciPy's conjugate gradient solver with Theano to solve the logistic regression problem.

Below is the result of running logistic_cg.py under win7_64bit + cuda6.5 + anaconda_2.1.0 + theano_0.7.0:

(screenshot of the logistic_cg.py run omitted)

References:

[1] Official tutorial: http://deeplearning.net/tutorial/logreg.html#logreg

[2] Classifying MNIST digits using Logistic Regression: http://blog.sina.com.cn/s/blog_6caa9fa10101m33n.html

[3] DeepLearning tutorial (1): introduction to softmax regression with annotated code: http://blog.csdn.net/u012162613/article/details/43157801
