
| Field | Meaning | Type |
| --- | --- | --- |
| age | Age | Continuous |
| workclass | Work class | Categorical, coded 0-7: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked |
| fnlwgt | Final weight (sampling weight) | Continuous |
| education | Education level | Categorical, coded 0-15: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool |
| education_num | Years of education | Continuous |
| maritial_status | Marital status | Categorical, coded 0-6: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse |
| occupation | Occupation | Categorical, coded 0-13: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces |
| relationship | Relationship | Categorical, coded 0-5: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried |
| race | Race | Categorical, coded 0-4: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black |
| sex | Sex | Categorical, coded 0-1: Female, Male |
| capital_gain | Capital gain | Continuous |
| capital_loss | Capital loss | Continuous |
| hours_per_week | Hours worked per week | Continuous |
| native_country | Country of origin | Categorical, coded 0-39: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, *, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands |
| income | Income | Categorical, coded 0/1: <=50K, >50K |
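
The fields above mix small integer codes with continuous quantities, and the script that follows standardizes every feature column before fitting. As a rough sketch of what that step computes, assuming MATLAB's default `std` (which divides by m - 1), for feature column j over m samples:

```latex
\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad
\sigma_j = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\bigl(x_j^{(i)}-\mu_j\bigr)^2}, \qquad
x_j^{(i)} \leftarrow \frac{x_j^{(i)}-\mu_j}{\sigma_j}
```

The intercept column x0 = 1 is not rescaled; its standard deviation is zero, so the formula does not apply to it.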


clear all; close all; clc
data = load('adult_train.txt');
x = data(:,1:14); % the 14 input features
y = data(:,15); % income label (0 or 1)
m = length(y); % number of samples
x = [ones(m, 1), x]; % prepend the intercept column, x0 = 1
meanx = mean(x); % column means
sigmax = std(x); % column standard deviations
% Standardize feature columns 2 to 15; column 1 (the intercept) is left unchanged
x(:,2:end) = bsxfun(@rdivide, bsxfun(@minus, x(:,2:end), meanx(2:end)), sigmax(2:end));
theta = zeros(size(x(1,:)))'; % initialize theta
g = @(z) 1.0 ./ (1.0 + exp(-z)); % logistic (sigmoid) function

% Newton's method
MAX_ITR = 7;
J = zeros(MAX_ITR, 1);
for i = 1:MAX_ITR
    % Calculate the hypothesis function
    z = x * theta;
    h = g(z); % apply the logistic function
    % Calculate gradient and hessian.
    % The formulas below are equivalent to the summation formulas
    % given in the lecture videos.
    grad = (1/m) .* x' * (h - y); % gradient in vectorized form
    % Hessian in vectorized form; scaling the rows of x avoids building m-by-m diag matrices
    H = (1/m) .* x' * bsxfun(@times, h .* (1 - h), x);
    % Calculate J (for testing convergence)
    J(i) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)); % cost function in vectorized form
    theta = theta - H \ grad; % Newton step; H\grad solves the linear system instead of inverting H
end
% Display theta
disp(theta);
data1 = load('verify.txt');
x1 = data1(:,1:14);
y1 = data1(:,15);
m1 = length(y1);
x1 = [ones(m1, 1), x1];
% Standardize the verification set with the training-set statistics (meanx, sigmax),
% so that it shares the scaling on which theta was fitted
x1(:,2:end) = bsxfun(@rdivide, bsxfun(@minus, x1(:,2:end), meanx(2:end)), sigmax(2:end));
y2 = g(x1*theta);
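
For reference, the vectorized expressions in the training loop correspond to the summation forms below. This is a sketch in the usual logistic-regression notation, which is assumed here rather than taken from the original: x^{(i)} is the i-th row of `x` written as a column vector (intercept included) and y^{(i)} is the 0/1 label of that sample.

```latex
\begin{aligned}
h_\theta(x) &= g(\theta^{T}x), \qquad g(z) = \frac{1}{1+e^{-z}} \\
J(\theta) &= \frac{1}{m}\sum_{i=1}^{m}\Bigl[-y^{(i)}\log h_\theta(x^{(i)}) - \bigl(1-y^{(i)}\bigr)\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr] \\
\nabla_\theta J &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)\,x^{(i)} \\
H &= \frac{1}{m}\sum_{i=1}^{m} h_\theta(x^{(i)})\bigl(1-h_\theta(x^{(i)})\bigr)\,x^{(i)}\,x^{(i)T} \\
\theta &:= \theta - H^{-1}\,\nabla_\theta J
\end{aligned}
```

In the script, `H\grad` evaluates the last line by solving the linear system H s = grad for the step s, so the inverse of H is never formed explicitly.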