Notes : Chapter 1

时间:2021-11-19 03:28:25

<Hands-on ML with Sklearn & TF>  Chapter 1

  1. what is ml
    1. from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  2. what problems to solve
    1. exist solution but a lot of hand-tuning/rules
    2. no good solutions using a traditional approach
    3. fluctuating environment
    4. get insight about conplex problem and large data
  3. type
    1. whether or not trained with human supervision(supervised, unsupervised, semisupervised, reinforcement)
    2. whether or not learn incrementally on the fly(online, batch)
    3. whether or not work by simply comparing new data point vs know data point,or instance detect pattern in training data and build a prediction model(instace-based, model-based)
  4. (un)supervision learning
    1. supervision : include the desired solution called labels
      • classification,K-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural network
    2. unsupervision : without labels
      • Clustering : k-means, HCA, ecpectation maximization
      • Viualization and dimensionality reducation : PCA, kernal PCA, LLE, t-SNE
      • Association rule learning : Apriori, Eclat
    3. semisupervision
      • unsupervision --> supervision
    4. reinforcement : an agent in context
      1. observe the environment
      2. select and perform action
      3. get rewards in return
  5. batch/online learning
    1. batch : offline, to known new data need to train a new version from scratch one the full dataset
    2. online : incremental learning : challenge is bad data
  6. instance-based/model-based
    1. instance-based : the system learns the examples by heart, then the generalizes to the new cases using a similarity measure
    2. model-based : studied the data; select a model; train it on the training data; applied the model to make predictions on new cases
  7. Challenge
    1. insufficient quantity of training data
    2. nonrepresentative training data
    3. poor-quality data
    4. irrelevant features : feature selection; feature extraction; creating new feature by gathering new data
    5. overfitting : regularization -> hyperparameter
    6. underfitting : powerful model; better feature; reduce construct
  8. Testing and Validating
    1. 80% of data for training 20% for testing
    2. validating : best model and hyperparameter for training set unliking perform as well on new data
      1. train multiple models with various hyperparameters using training data
      2. to get generatlization error , select the model and hyperparamaters that perform best on the validation set
    3. cross-validating : the training set is split into complementary subsets, ans each model is trained against a different conbination of thse subsets and validated against the remain parts.

   Example 1-1:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model #load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv",thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv",thousands=',',delimiter='\t',encoding='latin1',na_values='n/a') #prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
#get the pandas dataframe of GDP per capita and Life satisfaction
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
gdp_per_capita.rename(columns={"": "GDP per capita"}, inplace=True)
gdp_per_capita.set_index("Country", inplace=True)
full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
return full_country_stats[["GDP per capita", 'Life satisfaction']] country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
#regularization remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv',encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]] #Visualize the data
country_stats.plot(kind='scatter',x='GDP per capita',y='Life satisfaction') #Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression() #Train the model
lin_reg_model.fit(X, Y) #plot Regression model
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X = np.linspace(0, 110000, 1000)
plt.plot(X, t0 + t1 * X, "k")
plt.show() #Make a prediction for Cyprus
X_new=[[22587]]
print(lin_reg_model.predict(X_new))

      Notes : <Hands-on ML with Sklearn & TF> Chapter 1

课后练习挺好的

Notes : <Hands-on ML with Sklearn & TF> Chapter 1的更多相关文章

  1. Notes : &lt&semi;Hands-on ML with Sklearn &amp&semi; TF&gt&semi; Chapter 5

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  2. Notes : &lt&semi;Hands-on ML with Sklearn &amp&semi; TF&gt&semi; Chapter 7

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  3. Notes : &lt&semi;Hands-on ML with Sklearn &amp&semi; TF&gt&semi; Chapter 6

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  4. Notes : &lt&semi;Hands-on ML with Sklearn &amp&semi; TF&gt&semi; Chapter 4

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  5. Notes : &lt&semi;Hands-on ML with Sklearn &amp&semi; TF&gt&semi; Chapter 3

    Chapter 3-Classification .caret, .dropup > .btn > .caret { border-top-color: #000 !important; ...

  6. Book : &lt&semi;Hands-on ML with Sklearn &amp&semi; TF&gt&semi; pdf&sol;epub

    非常好的书,最近发现了pdf版本,链接:http://www.finelybook.com/hands-on-machine-learning-with-scikit-learn-and-tensor ...

  7. H5 Notes:PostMessage Cross-Origin Communication

    Javascript中要实现跨域通信,主要有window.name,jsonp,document.domain,cors等方法.不过在H5中有一种新的方法postMessage可以安全实现跨域通信,并 ...

  8. H5 Notes:Navigator Geolocation

    H5的地理位置API可以帮助我们来获取用户的地理位置,经纬度.海拔等,因此我们可以利用该API做天气应用.地图服务等. Geolocation对象是我们获取地理位置用到的对象. 首先判断浏览器是否支持 ...

  9. notes:spm多重比较校正

    SPM做完统计后,statistical table中的FDRc实际上是在该p-uncorrected下,可以令FDR-correcred p<=0.05的最小cluster中的voxel数目: ...

随机推荐

  1. mac osx 安装redis扩展

    1 php -v查看php版本 2 brew search php|grep redis 搜索对应的redis   ps:如果没有brew 就根据http://brew.sh安装 3 brew ins ...

  2. nginx basic auth 登陆验证模块

    #1. 新建一个pw.pl文件专门用来生成密码 #!/usr/bin/perl use strict; my $pw=$ARGV[0]; print crypt($pw,$pw)."\n&q ...

  3. NSArray 所有基础点示例

    #import <Foundation/Foundation.h> //排序算法,应用于 NSArray *arr=[arrs1 sortedArrayUsingFunction:sort ...

  4. mysql DDL语句

    sql语言分为三个级别. 1.ddl 语句 ,数据定义语句,定义了数据库.表.索引等对象的定义.常用语句包含:create.drop.alter. 2.dml 语句 ,数据操纵语句,用于添加.删除.更 ...

  5. 移动存储卡仍然用FAT32文件系统的真相

    微软在2001年就为自家的XP系统的本地磁盘默认使用了NTFS文件系统,但是12年之后,市面上的USB可移动设备和SD卡等外置存储器仍然在用着FAT32文件格式,这是什么理由让硬件厂商选择过时的文件系 ...

  6. 【高斯消元】兼 【期望dp】例题

    [总览] 高斯消元基本思想是将方程式的系数和常数化为矩阵,通过将矩阵通过行变换成为阶梯状(三角形),然后从小往上逐一求解. 如:$3X_1 + 2X_2 + 1X_3 = 3$ $           ...

  7. Spring MVC核心技术

    目录 异常处理 类型转换器 数据验证 文件上传与下载 拦截器 异常处理 Spring MVC中, 系统的DAO, Service, Controller层出现异常, 均通过throw Exceptio ...

  8. PVS桌面主镜像配置后,实际用户登录,配置未生效

    1.打开系统属性——高级——用户配置文件下的[设置] 2.打开用户配置文件,可以看到[复制]项灰化 3.使用windwows enable 工具启动上述灰化项,运行附件的exe文件后,任务栏出现下图标 ...

  9. &period;Net Core&lpar;三&rpar;MVC Core

    MVC Core的改动感觉挺大的,需要的功能大多从Nuget安装,还内置了IOC,支持SelfHost方式运行等等. 一.项目结构的变化创建的新MVC项目的结构发生了变化,比如:静态文件需要统一放置到 ...

  10. SQL Server 日期函数大全

    一.统计语句 1.--统计当前[>当天00点以后的数据] SELECT * FROM 表 WHERE CONVERT(Nvarchar, dateandtime, 111) = CONVERT( ...