使用python和numpy的梯度下降。

时间:2022-02-01 01:48:23
def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*(  np.sum( (h-y)*X_norm[:,j][np.newaxis,:] )  )
        temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
        temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
        theta=temp
    return theta



X_norm,mean,std=featureScale(X)
#length of X (number of rows)
m=len(X)
X_norm=np.array([np.ones(m),X_norm])
n,m=np.shape(X_norm)
num_it=1500
alpha=0.01
theta=np.zeros(n,float)[:,np.newaxis]
X_norm=X_norm.transpose()
theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
print theta

My theta from the above code is 100.2 100.2, but it should be 100.2 61.09 in matlab which is correct.

上面的代码是100.2 100.2,但在matlab中应该是100.2 61.09,这是正确的。

4 个解决方案

#1


109  

I think your code is a bit too complicated and it needs more structure, because otherwise you'll be lost in all equations and operations. In the end this regression boils down to four operations:

我认为你的代码有点太复杂了,需要更多的结构,否则你会在所有的方程式和操作中迷失。最后,这个回归归结为四个操作:

  1. Calculate the hypothesis h = X * theta
  2. 计算假设h = X *。
  3. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
  4. 计算损失= h - y,也许平方成本(损失2)/2m。
  5. Calculate the gradient = X' * loss / m
  6. 计算梯度= X' *损失/ m。
  7. Update the parameters theta = theta - alpha * gradient
  8. 更新参数= - alpha *梯度。

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

在你的例子中,我想你把m和n混淆了。这里m表示训练集的例子个数,而不是特征数。

Let's have a look at my variation of your code:

让我们来看看我的代码变化:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

At first I create a small random dataset which should look like this:

首先,我创建一个小的随机数据集,它应该是这样的:

使用python和numpy的梯度下降。

As you can see I also added the generated regression line and formula that was calculated by excel.

您可以看到,我还添加了由excel计算的生成的回归线和公式。

You need to take care about the intuition of the regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m-losses of every example to a single weight update. In this case, this is the average of the sum over the gradients, thus the division by m.

你需要注意使用梯度下降的回归直觉。当您对数据X进行完整的批处理时,您需要将每个示例的m损失减少到一个单一的权重更新。在这种情况下,这是梯度上的和的平均值,因此除以m。

The next thing you need to take care about is to track the convergence and adjust the learning rate. For that matter you should always track your cost every iteration, maybe even plot it.

接下来需要注意的是跟踪并调整学习速率。对于这个问题,你应该总是跟踪你的成本,每一次迭代,甚至可能绘制它。

If you run my example, the theta returned will look like this:

如果你运行我的例子,返回的是这样的:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

Which is actually quite close to the equation that was calculated by excel (y = x + 30). Note that as we passed the bias into the first column, the first theta value denotes the bias weight.

这与excel计算的方程很接近(y = x + 30)注意,当我们将偏差传递到第一列时,第一个值表示偏置的权重。

#2


8  

Below you can find my implementation of gradient descent for linear regression problem.

下面你可以找到我的线性回归问题梯度下降的实现。

At first, you calculate gradient like X.T * (X * w - y) / N and update your current theta with this gradient simultaneously.

首先,计算梯度,比如X。T * (X * w - y) / N,同时用这个梯度来更新你的电流。

  • X: feature matrix
  • X:特征矩阵
  • y: target values
  • y:目标值
  • w: weights/values
  • w:重量/值
  • N: size of training set
  • N:训练集的大小。

Here is the python code:

下面是python代码:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)


test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)

使用python和numpy的梯度下降。 使用python和numpy的梯度下降。 使用python和numpy的梯度下降。

#3


1  

I know this question already have been answer but I have made some update to the GD function :

我知道这个问题已经回答了,但是我已经对GD的功能做了一些更新:

  ### COST FUNCTION

def cost(theta,X,y):
     ### Evaluate half MSE (Mean square error)
     m = len(y)
     error = np.dot(X,theta) - y
     J = np.sum(error ** 2)/(2*m)
     return J

 cost(theta,X,y)



def GD(X,y,theta,alpha):

    cost_histo = [0]
    theta_histo = [0]

    # an arbitrary gradient, to pass the initial while() check
    delta = [np.repeat(1,len(X))]
    # Initial theta
    old_cost = cost(theta,X,y)

    while (np.max(np.abs(delta)) > 1e-6):
        error = np.dot(X,theta) - y
        delta = np.dot(np.transpose(X),error)/len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta,X,y)
        while (trial_cost >= old_cost):
            trial_theta = (theta +trial_theta)/2
            trial_cost = cost(trial_theta,X,y)
            cost_histo = cost_histo + trial_cost
            theta_histo = theta_histo +  trial_theta
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0] 
    Slope = theta[1]  
    return [Intercept,Slope]

res = GD(X,y,theta,alpha)

This function reduce the alpha over the iteration making the function too converge faster see Estimating linear regression with Gradient Descent (Steepest Descent) for an example in R. I apply the same logic but in Python.

这个函数在迭代过程中减少了alpha值,使得函数的收敛速度更快,在r中使用梯度下降(最陡峭的下降)来估计线性回归。

#4


0  

Following @thomas-jungblut implementation in python, i did the same for Octave. If you find something wrong please let me know and i will fix+update.

在python中的@thomas-jungblut实现之后,我对Octave做了同样的操作。如果你发现有问题,请告诉我,我会修正+更新。

Data comes from a txt file with the following rows:

数据来自一个txt文件,包含以下行:

1 10 1000
2 20 2500
3 25 3500
4 40 5500
5 60 6200

think about it as a very rough sample for features [number of bedrooms] [mts2] and last column [rent price] which is what we want to predict.

你可以把它看作是一个非常粗略的样本,它的特征是(卧室的数量)[mts2]和最后一列[房租价格],这是我们想要预测的。

Here is the Octave implementation:

这里是Octave的实现:

%
% Linear Regression with multiple variables
%

% Alpha for learning curve
alphaNum = 0.0005;

% Number of features
n = 2;

% Number of iterations for Gradient Descent algorithm
iterations = 10000

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% No need to update after here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

DATA = load('CHANGE_WITH_DATA_FILE_PATH');

% Initial theta values
theta = ones(n + 1, 1);

% Number of training samples
m = length(DATA(:, 1));

% X with one mor column (x0 filled with '1's)
X = ones(m, 1);
for i = 1:n
  X = [X, DATA(:,i)];
endfor

% Expected data must go always in the last column  
y = DATA(:, n + 1)

function gradientDescent(x, y, theta, alphaNum, iterations)
  iterations = [];
  costs = [];

  m = length(y);

  for iteration = 1:10000
    hypothesis = x * theta;

    loss = hypothesis - y;

    % J(theta)    
    cost = sum(loss.^2) / (2 * m);

    % Save for the graphic to see if the algorithm did work
    iterations = [iterations, iteration];
    costs = [costs, cost];

    gradient = (x' * loss) / m; % /m is for the average

    theta = theta - (alphaNum * gradient);
  endfor    

  % Show final theta values
  display(theta)

  % Show J(theta) graphic evolution to check it worked, tendency must be zero
  plot(iterations, costs);

endfunction

% Execute gradient descent
gradientDescent(X, y, theta, alphaNum, iterations);

#1


109  

I think your code is a bit too complicated and it needs more structure, because otherwise you'll be lost in all equations and operations. In the end this regression boils down to four operations:

我认为你的代码有点太复杂了,需要更多的结构,否则你会在所有的方程式和操作中迷失。最后,这个回归归结为四个操作:

  1. Calculate the hypothesis h = X * theta
  2. 计算假设h = X *。
  3. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
  4. 计算损失= h - y,也许平方成本(损失2)/2m。
  5. Calculate the gradient = X' * loss / m
  6. 计算梯度= X' *损失/ m。
  7. Update the parameters theta = theta - alpha * gradient
  8. 更新参数= - alpha *梯度。

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

在你的例子中,我想你把m和n混淆了。这里m表示训练集的例子个数,而不是特征数。

Let's have a look at my variation of your code:

让我们来看看我的代码变化:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

At first I create a small random dataset which should look like this:

首先,我创建一个小的随机数据集,它应该是这样的:

使用python和numpy的梯度下降。

As you can see I also added the generated regression line and formula that was calculated by excel.

您可以看到,我还添加了由excel计算的生成的回归线和公式。

You need to take care about the intuition of the regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m-losses of every example to a single weight update. In this case, this is the average of the sum over the gradients, thus the division by m.

你需要注意使用梯度下降的回归直觉。当您对数据X进行完整的批处理时,您需要将每个示例的m损失减少到一个单一的权重更新。在这种情况下,这是梯度上的和的平均值,因此除以m。

The next thing you need to take care about is to track the convergence and adjust the learning rate. For that matter you should always track your cost every iteration, maybe even plot it.

接下来需要注意的是跟踪并调整学习速率。对于这个问题,你应该总是跟踪你的成本,每一次迭代,甚至可能绘制它。

If you run my example, the theta returned will look like this:

如果你运行我的例子,返回的是这样的:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

Which is actually quite close to the equation that was calculated by excel (y = x + 30). Note that as we passed the bias into the first column, the first theta value denotes the bias weight.

这与excel计算的方程很接近(y = x + 30)注意,当我们将偏差传递到第一列时,第一个值表示偏置的权重。

#2


8  

Below you can find my implementation of gradient descent for linear regression problem.

下面你可以找到我的线性回归问题梯度下降的实现。

At first, you calculate gradient like X.T * (X * w - y) / N and update your current theta with this gradient simultaneously.

首先,计算梯度,比如X。T * (X * w - y) / N,同时用这个梯度来更新你的电流。

  • X: feature matrix
  • X:特征矩阵
  • y: target values
  • y:目标值
  • w: weights/values
  • w:重量/值
  • N: size of training set
  • N:训练集的大小。

Here is the python code:

下面是python代码:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)


test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)

使用python和numpy的梯度下降。 使用python和numpy的梯度下降。 使用python和numpy的梯度下降。

#3


1  

I know this question already have been answer but I have made some update to the GD function :

我知道这个问题已经回答了,但是我已经对GD的功能做了一些更新:

  ### COST FUNCTION

def cost(theta,X,y):
     ### Evaluate half MSE (Mean square error)
     m = len(y)
     error = np.dot(X,theta) - y
     J = np.sum(error ** 2)/(2*m)
     return J

 cost(theta,X,y)



def GD(X,y,theta,alpha):

    cost_histo = [0]
    theta_histo = [0]

    # an arbitrary gradient, to pass the initial while() check
    delta = [np.repeat(1,len(X))]
    # Initial theta
    old_cost = cost(theta,X,y)

    while (np.max(np.abs(delta)) > 1e-6):
        error = np.dot(X,theta) - y
        delta = np.dot(np.transpose(X),error)/len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta,X,y)
        while (trial_cost >= old_cost):
            trial_theta = (theta +trial_theta)/2
            trial_cost = cost(trial_theta,X,y)
            cost_histo = cost_histo + trial_cost
            theta_histo = theta_histo +  trial_theta
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0] 
    Slope = theta[1]  
    return [Intercept,Slope]

res = GD(X,y,theta,alpha)

This function reduce the alpha over the iteration making the function too converge faster see Estimating linear regression with Gradient Descent (Steepest Descent) for an example in R. I apply the same logic but in Python.

这个函数在迭代过程中减少了alpha值,使得函数的收敛速度更快,在r中使用梯度下降(最陡峭的下降)来估计线性回归。

#4


0  

Following @thomas-jungblut implementation in python, i did the same for Octave. If you find something wrong please let me know and i will fix+update.

在python中的@thomas-jungblut实现之后,我对Octave做了同样的操作。如果你发现有问题,请告诉我,我会修正+更新。

Data comes from a txt file with the following rows:

数据来自一个txt文件,包含以下行:

1 10 1000
2 20 2500
3 25 3500
4 40 5500
5 60 6200

think about it as a very rough sample for features [number of bedrooms] [mts2] and last column [rent price] which is what we want to predict.

你可以把它看作是一个非常粗略的样本,它的特征是(卧室的数量)[mts2]和最后一列[房租价格],这是我们想要预测的。

Here is the Octave implementation:

这里是Octave的实现:

%
% Linear Regression with multiple variables
%

% Alpha for learning curve
alphaNum = 0.0005;

% Number of features
n = 2;

% Number of iterations for Gradient Descent algorithm
iterations = 10000

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% No need to update after here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

DATA = load('CHANGE_WITH_DATA_FILE_PATH');

% Initial theta values
theta = ones(n + 1, 1);

% Number of training samples
m = length(DATA(:, 1));

% X with one mor column (x0 filled with '1's)
X = ones(m, 1);
for i = 1:n
  X = [X, DATA(:,i)];
endfor

% Expected data must go always in the last column  
y = DATA(:, n + 1)

function gradientDescent(x, y, theta, alphaNum, iterations)
  iterations = [];
  costs = [];

  m = length(y);

  for iteration = 1:10000
    hypothesis = x * theta;

    loss = hypothesis - y;

    % J(theta)    
    cost = sum(loss.^2) / (2 * m);

    % Save for the graphic to see if the algorithm did work
    iterations = [iterations, iteration];
    costs = [costs, cost];

    gradient = (x' * loss) / m; % /m is for the average

    theta = theta - (alphaNum * gradient);
  endfor    

  % Show final theta values
  display(theta)

  % Show J(theta) graphic evolution to check it worked, tendency must be zero
  plot(iterations, costs);

endfunction

% Execute gradient descent
gradientDescent(X, y, theta, alphaNum, iterations);