Comparing randomized search and grid search for hyperparameter estimation

Compare randomized search and grid search for optimizing hyperparameters of a random forest. All parameters that influence the learning are searched simultaneously (except for the number of estimators, which poses a time / quality tradeoff).

The randomized search and the grid search explore exactly the same space of parameters. The result in parameter settings is quite similar, while the run time for randomized search is drastically lower.

The performance is slightly worse for the randomized search, though this is most likely a noise effect and would not carry over to a held-out test set.

Note that in practice, one would not search over this many different parameters simultaneously using grid search, but pick only the ones deemed most important.

Python source code: randomized_search.py

print(__doc__)

import numpy as np

from time import time

from operator import itemgetter

from scipy.stats import randint as sp_randint

from sklearn.grid_search import GridSearchCV, RandomizedSearchCV

from sklearn.datasets import load_digits

from sklearn.ensemble import RandomForestClassifier

# get some data

iris = load_digits()

X, y = iris.data, iris.target

# build a classifier

clf = RandomForestClassifier(n_estimators=20)

# Utility function to report best scores

def report(grid_scores, n_top=3):

    top_scores = sorted(grid_scores, key=itemgetter(1), reverse=True)[:n_top]

    for i, score in enumerate(top_scores):

        print("Model with rank: {0}".format(i + 1))

        print("Mean validation score: {0:.3f} (std: {1:.3f})".format(

              score.mean_validation_score,

              np.std(score.cv_validation_scores)))

        print("Parameters: {0}".format(score.parameters))

        print("")

# specify parameters and distributions to sample from

param_dist = {"max_depth": [3, None],

              "max_features": sp_randint(1, 11),

              "min_samples_split": sp_randint(1, 11),

              "min_samples_leaf": sp_randint(1, 11),

              "bootstrap": [True, False],

              "criterion": ["gini", "entropy"]}

# run randomized search

n_iter_search = 20

random_search = RandomizedSearchCV(clf, param_distributions=param_dist,

                                   n_iter=n_iter_search)

start = time()

random_search.fit(X, y)

print("RandomizedSearchCV took %.2f seconds for %d candidates"

      " parameter settings." % ((time() - start), n_iter_search))

report(random_search.grid_scores_)

# use a full grid over all parameters

param_grid = {"max_depth": [3, None],

              "max_features": [1, 3, 10],

              "min_samples_split": [1, 3, 10],

              "min_samples_leaf": [1, 3, 10],

              "bootstrap": [True, False],

              "criterion": ["gini", "entropy"]}

# run grid search

grid_search = GridSearchCV(clf, param_grid=param_grid)

start = time()

grid_search.fit(X, y)

print("GridSearchCV took %.2f seconds for %d candidate parameter settings."

      % (time() - start, len(grid_search.grid_scores_)))

report(grid_search.grid_scores_)

Comparing randomized search and grid search for hyperparameter estimation的更多相关文章

3&period;2&period; Grid Search&colon; Searching for estimator parameters
3.2. Grid Search: Searching for estimator parameters Parameters that are not directly learnt within ...
scikit-learn：3&period;2&period; Grid Search&colon; Searching for estimator parameters
參考:http://scikit-learn.org/stable/modules/grid_search.html GridSearchCV通过(蛮力)搜索參数空间(參数的全部可能组合).寻找最好的 ...
Grid search in the tidyverse
@drsimonj here to share a tidyverse method of grid search for optimizing a model's hyperparameters. ...
How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras
Hyperparameter optimization is a big part of deep learning. The reason is that neural networks are n ...
Grid Search学习
转自:https://www.cnblogs.com/ysugyl/p/8711205.html Grid Search:一种调参手段:穷举搜索:在所有候选的参数选择中,通过循环遍历,尝试每一种可能性 ...
grid search 超参数寻优
http://scikit-learn.org/stable/modules/grid_search.html 1. 超参数寻优方法 gridsearchCV 和 RandomizedSearchC ...
[转载]Grid Search
[转载]Grid Search 初学机器学习,之前的模型都是手动调参的,效果一般.同学和我说他用了一个叫grid search的方法.可以实现自动调参,顿时感觉非常高级.吃饭的时候想调参的话最差不过也 ...
【起航计划 032】2015 起航计划 Android APIDemo的魔鬼步伐 31 App-&gt&semi;Search-&gt&semi;Invoke Search 搜索功能 Search Dialog SearchView SearchRecentSuggestions
Search (搜索)是Android平台的一个核心功能之一,用户可以在手机搜索在线的或是本地的信息.Android平台为所有需要提供搜索或是查询功能的应用提供了一个统一的Search Framew ...
grid search
sklearn.metrics.make_scorer(score_func, greater_is_better=True, needs_proba=False, needs_threshold=F ...

随机推荐

Java调试
线上load高的问题排查步骤是: 先用top找到耗资源的进程 ps+grep找到对应的java进程/线程 jstack分析哪些线程阻塞了,阻塞在哪里 jstat看看FullGC频率 jmap看看有没有 ...
x264源代码概述框架分析架构分析
函数背景色函数在图中以方框的形式表现出来.不同的背景色标志了该函数不同的作用: 白色背景的函数:不加区分的普通内部函数. 浅红背景的函数:libx264类库的接口函数(API). 粉红色背景函数:滤 ...
mybatis中oracle in&gt&semi;1000的处理
oracle数据库中,如果你使用in,然后括号对应的是一个子查询,当查询出来的结果>1000的时候就会报错. 这个是数据库的规定,我们无法改变它. 如何解决这个问题呢? 现在我看到了三种解决方式 ...
A better SHOW TABLE STATUS
From command line we have the entire MySQL server on hands (if we have privileges too of course) but ...
hdu 2275 Kiki &amp&semi; Little Kiki 1
原题链接:http://acm.hdu.edu.cn/showproblem.php?pid=2275 题意:n个操作 Push 入容器 Pop弹出一个满足<=该数的最大的数(若没有输出No ...
用python3破解wingIDE
值得注意的是,python2的整除/在python3中变成了//,sha方法细化成了sha1和sha256,所以破解文件需要更改加密方式和整除部分的编码方式,经过修改后,这个文件可以完美演算出破解码, ...
Silverlight&&num;160&semi;中&&num;160&semi;TreeView&&num;160&semi;的数据绑定
方法一:Silverlight使用XAML标记语言来编写,如果不使用XAML强大的绑定功能,实在是罪过.通过使用绑定,可以将UI与视图模型层分离,有利于系统的维护.作为Silverlight中比较有代 ...
卸载重装Mysql
卸载重装前请备份数据库卸载 sudo apt autoremove --purge mysql-server-core-5.7 清理残留 sudo rm -r /var/lib/mysql* sud ...
centOS中mysql一些常用操作
安装mysqlyum -y install mysql-server 修改mysql配置vi /etc/my.cnf 这里会有很多需要注意的配置项,后面会有专门的笔记暂时修改一下编码(添加在密码下 ...
jQuery之jQuery扩展和事件
一.jQuery事件常用事件 blur([[data],fn]) 失去焦点 focus([[data],fn]) 获取焦点( 搜索框例子) change([[data],fn]) 当select下拉 ...