ROC curve from training data in caret

Time: 2022-12-07 11:05:50

Using the R package caret, how can I generate a ROC curve based on the cross-validation results of the train() function?

Say, I do the following:

library(caret)
library(mlbench)  # provides the Sonar data set
data(Sonar)
ctrl <- trainControl(method="cv", 
  summaryFunction=twoClassSummary, 
  classProbs=TRUE)
rfFit <- train(Class ~ ., data=Sonar, 
  method="rf", preProc=c("center", "scale"), 
  trControl=ctrl)

The training function iterates over a range of values for the mtry parameter and calculates the ROC AUC for each. I would like to see the associated ROC curve. How do I do that?

Note: if the method used for sampling is LOOCV, then rfFit will contain a non-null data frame in the rfFit$pred slot, which seems to be exactly what I need. However, I need that for the "cv" method (k-fold validation) rather than LOO.

Also: no, the roc function that used to be included in earlier versions of caret is not an answer. It is a low-level function, and you can't use it if you don't have the predicted probabilities for each cross-validated sample.

2 Answers

#1


25  

There is just the savePredictions = TRUE argument missing from ctrl (this also works for other resampling methods):

library(caret)
library(mlbench)
data(Sonar)
ctrl <- trainControl(method="cv", 
                     summaryFunction=twoClassSummary, 
                     classProbs=T,
                     savePredictions = T)
rfFit <- train(Class ~ ., data=Sonar, 
               method="rf", preProc=c("center", "scale"), 
               trControl=ctrl)
library(pROC)
# Select a parameter setting
selectedIndices <- rfFit$pred$mtry == 2
# Plot:
plot.roc(rfFit$pred$obs[selectedIndices],
         rfFit$pred$M[selectedIndices])
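
If only the curve for the finally selected model is needed, one possible variation (a sketch, not part of the original answer) is to pass savePredictions = "final". This asks caret to keep the held-out predictions only for the best tuning setting, so no filtering on mtry is required. The seed and the explicit metric = "ROC" are assumptions added here for reproducibility, and the randomForest package must be installed for method = "rf".

```r
library(caret)
library(mlbench)
library(pROC)

data(Sonar)
set.seed(42)  # assumption: fixed seed so the folds are reproducible

ctrl <- trainControl(method = "cv",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE,
                     savePredictions = "final")  # keep only the best tune

rfFit <- train(Class ~ ., data = Sonar,
               method = "rf", preProc = c("center", "scale"),
               metric = "ROC",
               trControl = ctrl)

# rfFit$pred now holds one held-out prediction per row of Sonar
best <- rfFit$pred

# "R" is treated as the control level and "M" as the case level
rocObj <- roc(response = best$obs, predictor = best$M,
              levels = c("R", "M"), direction = "<")
plot(rocObj)
auc(rocObj)
```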

Maybe I am missing something, but a small concern is that train always reports slightly different AUC values than plot.roc and pROC::auc (absolute difference < 0.005), although twoClassSummary uses pROC::auc to estimate the AUC. Edit: I assume this occurs because the ROC value from train is the average of the AUCs computed on the separate CV folds, whereas here we calculate a single AUC over the held-out predictions from all resamples pooled together.
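
The difference between the two quantities can be illustrated with a small self-contained sketch. It uses simulated data shaped like rfFit$pred (columns obs, M and Resample) instead of a real fit, and a rank-based AUC helper named auc_rank, which is hypothetical and not part of caret or pROC.

```r
# Rank-based (Mann-Whitney) AUC. auc_rank is a hypothetical helper
# written for this illustration, not a caret or pROC function.
auc_rank <- function(obs, score, positive = "M") {
  pos <- obs == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(score)  # average ranks, so ties count as one half
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated stand-in for rfFit$pred[selectedIndices, ]
set.seed(1)
n <- 200
pred <- data.frame(
  obs = factor(sample(c("M", "R"), n, replace = TRUE), levels = c("M", "R")),
  Resample = sprintf("Fold%02d", rep(1:10, length.out = n))
)
pred$M <- ifelse(pred$obs == "M", rnorm(n, 0.65, 0.2), rnorm(n, 0.35, 0.2))

# One AUC per fold, then averaged (the kind of value train reports)
fold_auc <- sapply(split(pred, pred$Resample),
                   function(d) auc_rank(d$obs, d$M))
mean_fold_auc <- mean(fold_auc)

# A single AUC over all held-out predictions pooled together
pooled_auc <- auc_rank(pred$obs, pred$M)

c(mean_fold = mean_fold_auc, pooled = pooled_auc)
```

With a real fit, the same comparison could be made by replacing pred with rfFit$pred[selectedIndices, ]. The two numbers are generally close but not identical.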

Update: Since this is getting a bit of attention, here's a solution using plotROC::geom_roc() for ggplot2:

library(ggplot2)
library(plotROC)
ggplot(rfFit$pred[selectedIndices, ], 
       aes(m = M, d = factor(obs, levels = c("R", "M")))) + 
    geom_roc(hjust = -0.4, vjust = 1.5) + coord_equal()

#2


8  

Here, I'm modifying the plot of @thei1e which others may find helpful.

Train model and make predictions

library(caret)
library(ggplot2)
library(mlbench)
library(plotROC)

data(Sonar)

ctrl <- trainControl(method="cv", summaryFunction=twoClassSummary, classProbs=T,
                     savePredictions = T)

rfFit <- train(Class ~ ., data=Sonar, method="rf", preProc=c("center", "scale"), 
               trControl=ctrl)

# Select a parameter setting
selectedIndices <- rfFit$pred$mtry == 2

Updated ROC curve plot

g <- ggplot(rfFit$pred[selectedIndices, ], aes(m=M, d=factor(obs, levels = c("R", "M")))) + 
  geom_roc(n.cuts=0) + 
  coord_equal() +
  style_roc()

g + annotate("text", x=0.75, y=0.25, label=paste("AUC =", round((calc_auc(g))$AUC, 4)))
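
Since train evaluates several mtry values, it can also be useful to compare their curves in one plot. The following is a minimal sketch that maps mtry to the color aesthetic. It uses a simulated data frame (sim, with arbitrary mtry values and scores) rather than a real fit, so every value in it is an assumption made for illustration.

```r
library(ggplot2)
library(plotROC)

# Simulated stand-in for rfFit$pred: one block of rows per mtry value.
# obs is coded 0/1 here; M plays the role of the predicted probability.
set.seed(1)
n <- 150
sim <- do.call(rbind, lapply(c(2, 31, 60), function(m) {
  obs <- rbinom(n, 1, 0.5)
  data.frame(mtry = m, obs = obs,
             M = rnorm(n, mean = obs * (1 + 1 / m), sd = 1))
}))

# One ROC curve per mtry value, distinguished by color
g_all <- ggplot(sim, aes(m = M, d = obs, color = factor(mtry))) +
  geom_roc(n.cuts = 0) +
  coord_equal() +
  style_roc() +
  labs(color = "mtry")

g_all
calc_auc(g_all)  # one AUC row per curve
```

With the fitted model from above, the analogous call would presumably be ggplot(rfFit$pred, aes(m = M, d = factor(obs, levels = c("R", "M")), color = factor(mtry))), without subsetting by selectedIndices.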
