[Udacity Intro to Machine Learning] Lesson 5: Choose Your Own Algorithm

Date: 2023-02-13 00:17:35

KNN (classic, simple, easy to understand)

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier()
clf.fit(features_train, labels_train)
pred = clf.predict(features_test)   # keep the predictions instead of discarding them
acc = clf.score(features_test, labels_test)


n_neighbors : int, optional (default=5)
Number of neighbors to use by default for kneighbors queries.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}
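
The two parameters above can be tried out on any small dataset; the course's `features_train`/`labels_train` are not available here, so this sketch uses scikit-learn's built-in iris data instead (an assumption, not the course setup):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Use 3 neighbors and force a kd-tree index instead of letting 'auto' decide.
clf = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree")
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

A smaller `n_neighbors` gives a more flexible (lower-bias, higher-variance) decision boundary; `'auto'` is usually fine unless you know the data's dimensionality favors one index structure.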



AdaBoost (ensemble method)

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1))
clf.fit(features_train, labels_train)
pred = clf.predict(features_test)
accuracy = clf.score(features_test, labels_test)


base_estimator : object, optional (default=DecisionTreeClassifier with max_depth=1)

n_estimators : integer, optional (default=50)
The maximum number of estimators at which boosting is terminated. In case of a perfect fit, the learning procedure is stopped early.

algorithm : {‘SAMME’, ‘SAMME.R’}, optional (default=’SAMME.R’)

If ‘SAMME.R’ then use the SAMME.R real boosting algorithm. base_estimator must support calculation of class probabilities. If ‘SAMME’ then use the SAMME discrete boosting algorithm. The SAMME.R algorithm typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations.
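
A quick sketch of how `n_estimators` affects AdaBoost with decision-stump weak learners, on synthetic data (`make_classification` is an assumption standing in for the course dataset; note that recent scikit-learn releases deprecate `SAMME.R`, so the default algorithm is used here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# More boosting rounds generally reduce training error; test error can
# plateau (or worsen on noisy data) once the ensemble is large enough.
for n in (10, 50, 100):
    clf = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1),  # decision stumps as weak learners
        n_estimators=n,
    )
    clf.fit(X_train, y_train)
    print(n, clf.score(X_test, y_test))
```

Each round reweights the training points so the next stump focuses on the examples the current ensemble gets wrong.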



Random Forest

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(features_train, labels_train)
pred = clf.predict(features_test)
accuracy = clf.score(features_test, labels_test)


n_estimators : integer, optional (default=10)
The number of trees in the forest.
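
To see `n_estimators` in action, a minimal sketch on synthetic data (`make_classification` is a stand-in for the course dataset; newer scikit-learn versions default to 100 trees rather than 10):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# More trees average out the variance of individual trees; accuracy
# typically improves then flattens, at the cost of fit/predict time.
for n in (10, 100):
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    clf.fit(X_train, y_train)
    print(n, clf.score(X_test, y_test))
```

Unlike boosting, the trees are trained independently on bootstrap samples, so raising `n_estimators` cannot overfit in the same way more AdaBoost rounds can.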
