A Brief Review of Supervised Learning

Date: 2022-12-14 11:51:16

There are a number of algorithms that are typically used for system identification, adaptive control, adaptive signal processing, and machine learning. These algorithms all have particular similarities and differences. However, they all need to process some type of experimental data. How we collect the data and process it determines the most suitable algorithm to use. In adaptive control, there is a device referred to as the self-tuning regulator. In this case, the algorithm measures the states as outputs, estimates the model parameters, and outputs the control signals. In reinforcement learning, the algorithms process rewards, estimate value functions, and output actions. Although one may refer to the recursive least squares (RLS) algorithm in the self-tuning regulator as a supervised learning algorithm and reinforcement learning as an unsupervised learning algorithm, they are both very similar.

1.1 Least Squares Estimates

The least squares (LS) algorithm is a well-known and robust algorithm for fitting experimental data to a model. The first step is for the user to define a mathematical structure or model that he/she believes will fit the data. The second step is to design an experiment to collect data under suitable conditions. “Suitable conditions” usually means the operating conditions under which the system will typically operate. The next step is to run the estimation algorithm, which can take several forms, and, finally, validate the identified or “learned” model. The LS algorithm is often used to fit the data. Let us look at the case of the classical two-dimensional linear regression fit that we are all familiar with:

    y_n = a x_n + b                                                        (1.1)

In this simple linear regression model, the input is the sampled signal x_n and the output is y_n. The model structure defined is a straight line. Therefore, we are assuming that the data collected will fit a straight line. This can be written in the form

    y_n = \varphi_n^T \theta                                               (1.2)

where \varphi_n = [x_n \; 1]^T and \theta = [a \; b]^T. How one chooses \varphi_n determines the model structure, and this reflects how one believes the data should behave. This is the essence of machine learning, and virtually all university students will at some point learn the basic statistics of linear regression. Behind the computations of the linear regression algorithm is the scalar cost function, given by

    V(\hat{\theta}) = \frac{1}{2} \sum_{n=1}^{N} \left( y_n - \varphi_n^T \hat{\theta} \right)^2        (1.3)

The term \hat{\theta} is the estimate of the LS parameter \theta. The goal is for the estimate \hat{\theta} to minimize the cost function V(\hat{\theta}). To find the "optimal" value of the parameter estimate \hat{\theta}, one takes the partial derivative of the cost function with respect to \hat{\theta} and sets this derivative to zero.

Therefore, one gets

    \frac{\partial V(\hat{\theta})}{\partial \hat{\theta}} = -\sum_{n=1}^{N} \varphi_n \left( y_n - \varphi_n^T \hat{\theta} \right)        (1.4)

Setting \partial V(\hat{\theta}) / \partial \hat{\theta} = 0, we get

    \sum_{n=1}^{N} \varphi_n \varphi_n^T \hat{\theta} = \sum_{n=1}^{N} \varphi_n y_n        (1.5)

Solving for \hat{\theta}, we get the LS solution

    \hat{\theta} = \left( \sum_{n=1}^{N} \varphi_n \varphi_n^T \right)^{-1} \sum_{n=1}^{N} \varphi_n y_n        (1.6)

where the inverse, \left( \sum_{n=1}^{N} \varphi_n \varphi_n^T \right)^{-1}, exists. If the inverse does not exist, then the system is not identifiable. For example, if in the straight-line case one only had a single point, then the data would not span the two-dimensional parameter space and the inverse would not exist. The same holds if one had exactly the same point over and over again. One needs at least two independent points to draw a straight line. The matrix \sum_{n=1}^{N} \varphi_n \varphi_n^T is referred to as the information matrix and is related to how well one can estimate the parameters. The inverse of the information matrix is the covariance matrix, and it is proportional to the variance of the parameter estimates. Both these matrices are positive definite and symmetric. These are very important properties which are used extensively in analyzing the behavior of the algorithm. In the literature, one will often see the covariance matrix referred to as P. We can rewrite Eq. (1.5) in the form

    \sum_{n=1}^{N} \varphi_n \left( y_n - \varphi_n^T \hat{\theta} \right) = 0        (1.7)

and one can define the prediction errors as

    \epsilon_n = y_n - \varphi_n^T \hat{\theta}        (1.8)

The term within brackets in Eq. (1.7) is known as the prediction error or, as some people will refer to it, the innovations. The term \epsilon_n represents the error in predicting the output of the system. In this case, the output term y_n is the correct answer, which is what we want to estimate. Since we know the correct answer, this is referred to as supervised learning. Notice that the value of the prediction error times the data vector is equal to zero. We then say that the prediction errors are orthogonal to the data, or that the data sits in the null space of the prediction errors. In simplistic terms, this means that, if one has chosen a good model structure \varphi_n, then the prediction errors should appear as white noise. Always plot the prediction errors as a quick check to see how good your predictor is. If the errors appear to be correlated (i.e., not white noise), then you can improve your model and get a better prediction.
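As a quick numerical illustration, the following sketch (in Python with NumPy; the data and all variable names are our own made-up example, not from the text) fits a straight line with the closed-form LS solution and then verifies the orthogonality of the prediction errors to the data:

```python
import numpy as np

# Made-up data for illustration: noisy samples from the line y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=200)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(200)

# Stack the regressors phi_n = [x_n, 1]^T row-wise
Phi = np.column_stack([x, np.ones_like(x)])

# Closed-form LS estimate: solve (sum phi phi^T) theta_hat = sum phi y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Prediction errors; their product with the data should be numerically zero
eps = y - Phi @ theta_hat
print(theta_hat)                   # close to [2.0, 1.0]
print(np.abs(Phi.T @ eps).max())   # orthogonality check: ~0
```

Plotting `eps` against `x` (or its autocorrelation) is the quick whiteness check described above.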

One does not typically write the linear regression in the form of Eq. (1.2), but will typically add a white noise term, and then the linear regression takes the form

    y_n = \varphi_n^T \theta + e_n        (1.9)

where e_n is a white noise term. Equation (1.9) can represent an infinite number of possible model structures. For example, let us assume that we want to learn the dynamics of a second-order linear system or the parameters of a second-order infinite impulse response (IIR) filter. Then we could choose the second-order model structure given by

    y_n = a_1 y_{n-1} + a_2 y_{n-2} + b_1 u_{n-1} + b_2 u_{n-2} + e_n        (1.10)

Then the model structure would be defined in \varphi_n as

    \varphi_n = [y_{n-1} \; y_{n-2} \; u_{n-1} \; u_{n-2}]^T, \quad \theta = [a_1 \; a_2 \; b_1 \; b_2]^T        (1.11)

In general, one can write an arbitrary p-th order autoregressive exogenous (ARX) model structure as

    y_n = a_1 y_{n-1} + \cdots + a_p y_{n-p} + b_1 u_{n-1} + \cdots + b_p u_{n-p} + e_n        (1.12)

and \varphi_n takes the form

    \varphi_n = [y_{n-1} \; \cdots \; y_{n-p} \; u_{n-1} \; \cdots \; u_{n-p}]^T        (1.13)
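To make the ARX case concrete, here is a minimal sketch (our own example: the coefficients, noise level, and input signal are assumptions chosen for illustration) that simulates a second-order ARX system and recovers its parameters with the LS solution:

```python
import numpy as np

# Assumed second-order ARX system (stable poles):
# y_n = a1*y_{n-1} + a2*y_{n-2} + b1*u_{n-1} + b2*u_{n-2} + e_n
a1, a2, b1, b2 = 1.5, -0.7, 1.0, 0.5
rng = np.random.default_rng(1)
N = 1000
u = rng.standard_normal(N)           # persistently exciting input
e = 0.01 * rng.standard_normal(N)    # small white-noise disturbance
y = np.zeros(N)
for n in range(2, N):
    y[n] = a1 * y[n-1] + a2 * y[n-2] + b1 * u[n-1] + b2 * u[n-2] + e[n]

# Build phi_n = [y_{n-1}, y_{n-2}, u_{n-1}, u_{n-2}]^T for n = 2..N-1
Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1], u[:-2]])
Y = y[2:]

theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
print(theta_hat)  # close to [1.5, -0.7, 1.0, 0.5]
```

Note that the white-noise input plays the role of the "suitable experiment": a constant input would not excite the dynamics, and the information matrix would be nearly singular.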

One then collects the data from a suitable experiment (easier said than done!), and then computes the parameters using Eq. (1.6). The vector \varphi_n can take many different forms; in fact, it can contain nonlinear functions of the data, for example, logarithmic terms or square terms, and it can have different delay terms. To a large degree, one can use one's professional judgment as to what to put into \varphi_n. One will often write the data in matrix form, in which case the matrix \Phi is defined as

    \Phi = [\varphi_1 \; \varphi_2 \; \cdots \; \varphi_N]^T        (1.14)

and the output vector as

    Y = [y_1 \; y_2 \; \cdots \; y_N]^T        (1.15)

Then one can write the LS estimate as

    \hat{\theta} = \left( \Phi^T \Phi \right)^{-1} \Phi^T Y        (1.16)

Furthermore, one can write the prediction errors as

    E = Y - \Phi \hat{\theta}        (1.17)

We can also write the orthogonality condition as

    \Phi^T E = 0        (1.18)
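The matrix form maps directly onto standard linear-algebra routines. A minimal sketch (again with made-up data; in practice a solver such as NumPy's `lstsq` is preferred over forming the inverse explicitly, for numerical stability):

```python
import numpy as np

# Assumed illustrative regression: Phi is N x p, Y is N x 1
rng = np.random.default_rng(2)
N, p = 100, 3
Phi = rng.standard_normal((N, p))
theta_true = np.array([0.5, -1.0, 2.0])
Y = Phi @ theta_true + 0.05 * rng.standard_normal(N)

# LS estimate via a least-squares solver (equivalent to (Phi^T Phi)^{-1} Phi^T Y)
theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

E = Y - Phi @ theta_hat        # prediction errors
print(theta_hat)               # close to theta_true
print(np.abs(Phi.T @ E).max()) # orthogonality condition: ~0
```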

The LS
method of parameter identification or machine learning is very well
developed and there are many properties associated with the
technique. In fact, much of the work in statistical inference is
derived from the few equations described in this section. This is the
beginning of many scientific investigations including work in the
social sciences.
