Plotting a ROC curve from cross-validation (training) data in R

Time: 2022-12-07 11:29:53

Is there a way to plot the average ROC curve from the cross-validation data of an SVM-RFE model generated with the caret package?

My results are:

Recursive feature selection

Outer resampling method: Cross-Validated (10 fold, repeated 5 times) 

Resampling performance over subset size:

 Variables    ROC   Sens   Spec Accuracy  Kappa  ROCSD SensSD SpecSD AccuracySD KappaSD Selected
         1 0.6911 0.0000 1.0000   0.5900 0.0000 0.2186 0.0000 0.0000     0.0303  0.0000         
         2 0.7600 0.3700 0.8067   0.6280 0.1807 0.1883 0.3182 0.2139     0.1464  0.3295         
         3 0.7267 0.4233 0.8667   0.6873 0.3012 0.2020 0.3216 0.1905     0.1516  0.3447         
         4 0.6989 0.3867 0.8600   0.6680 0.2551 0.2130 0.3184 0.1793     0.1458  0.3336         
         5 0.7000 0.3367 0.8600   0.6473 0.2006 0.2073 0.3359 0.1793     0.1588  0.3672         
         6 0.7167 0.3833 0.8200   0.6427 0.2105 0.1909 0.3338 0.2539     0.1682  0.3639         
         7 0.7122 0.3767 0.8333   0.6487 0.2169 0.1784 0.3226 0.2048     0.1642  0.3702         
         8 0.7144 0.4233 0.7933   0.6440 0.2218 0.2017 0.3454 0.2599     0.1766  0.3770         
         9 0.8356 0.6533 0.7867   0.7300 0.4363 0.1706 0.3415 0.2498     0.1997  0.4209         
        10 0.8811 0.6867 0.8200   0.7647 0.5065 0.1650 0.3134 0.2152     0.1949  0.4053        *
        11 0.8700 0.6933 0.8133   0.7627 0.5046 0.1697 0.3183 0.2147     0.1971  0.4091         
        12 0.8678 0.6967 0.7733   0.7407 0.4682 0.1579 0.3153 0.2559     

...
The top 5 variables (out of 10):
   SumAverage_GLCM_R1SC4NG2, Variance_GLCM_R1SC4NG2, HGZE_GLSZM_R1SC4NG2, LGZE_GLSZM_R1SC4NG2, SZLGE_GLSZM_R1SC4NG2

I tried the solution from the question "ROC curve from training data in caret":

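library(pROC)  # provides plot.roc()

# Keep only the held-out predictions made at the optimal subset size
optSize <- svmRFE_NG2$optsize
selectedIndices <- svmRFE_NG2$pred$Variables == optSize
plot.roc(svmRFE_NG2$pred$obs[selectedIndices],
         svmRFE_NG2$pred$LUNG[selectedIndices])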

But this solution does not seem to work (the resulting AUC value is quite different). I have split the results of the training process into the 50 cross-validation sets, as mentioned in that answer, but I do not know what to do next.

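# Split the saved predictions by subset size, then by resample.
# Index by name: positional indexing only works if the sizes are 1, 2, 3, ...
resamples <- split(svmRFE_NG2$pred, svmRFE_NG2$pred$Variables)
resamplesFOLD <- split(resamples[[as.character(optSize)]], resamples[[as.character(optSize)]]$Resample)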

Any ideas?

1 solution

#1


As you have already done, you can a) enable savePredictions = TRUE in the trainControl argument of caret::train, then b) use the pred element of the trained model object, which contains all predictions over all partitions and resamples, to compute whichever ROC curve you want to look at. There are several options for which ROC curve this can be, e.g.:

You could look at all predictions over all partitions and resamples at once:

library(pROC)  # provides roc(); CLASSNAME is the probability column of the class of interest
plot(roc(predictor = modelObject$pred$CLASSNAME, response = modelObject$pred$obs))

Or you could do this over individual partitions and/or resamples (which is what you tried above). The following example computes one ROC curve per partition and resample, so 10 partitions and 5 repeats yield 50 ROC curves:

library(plyr)
library(pROC)
# Draw one ROC plot per resample (each call to plot() starts a new plot)
l_ply(split(modelObject$pred, modelObject$pred$Resample), function(d) {
    plot(roc(predictor = d$CLASSNAME, response = d$obs))
})

Depending on your data and model, the latter will show some variance in the resulting ROC curves and AUC values. The same variance shows up in the AUC and SD values that caret calculated across your individual partitions and resamples, so it reflects your data and model rather than an error.
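
One likely reason for the AUC mismatch mentioned in the question: the ROC column that caret reports is the mean of the per-resample AUCs, whereas pooling all held-out predictions into a single curve gives a different number. The sketch below illustrates both calculations in base R on synthetic data. The `pred` data frame, its values, and the helper `auc_rank` are made up for illustration; only the column names (`obs`, `LUNG`, `Resample`) are chosen to mimic the layout of `modelObject$pred`.

```r
# Synthetic stand-in for modelObject$pred (hypothetical data, illustration only).
# obs = true class, LUNG = predicted probability-like score for class "LUNG",
# Resample = fold/repeat label as caret would name it.
set.seed(1)
folds <- sprintf("Fold%02d.Rep%d", rep(1:10, times = 5), rep(1:5, each = 10))
pred <- do.call(rbind, lapply(folds, function(f) {
  obs <- rep(c("LUNG", "OTHER"), each = 10)
  data.frame(
    obs = obs,
    LUNG = rnorm(20, mean = ifelse(obs == "LUNG", 1, 0)),
    Resample = f,
    stringsAsFactors = FALSE
  )
}))

# AUC via the rank-sum (Mann-Whitney) identity; hypothetical helper name.
auc_rank <- function(score, is_positive) {
  r <- rank(score)  # average ranks, so ties are handled
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One AUC per resample, then their mean and SD (the analogue of ROC / ROCSD).
fold_auc <- sapply(split(pred, pred$Resample), function(d) {
  auc_rank(d$LUNG, d$obs == "LUNG")
})
mean_auc <- mean(fold_auc)
sd_auc <- sd(fold_auc)

# A single AUC from all held-out predictions pooled together.
pooled_auc <- auc_rank(pred$LUNG, pred$obs == "LUNG")

cat(sprintf("mean of per-resample AUCs: %.4f (SD %.4f)\n", mean_auc, sd_auc))
cat(sprintf("AUC of pooled predictions: %.4f\n", pooled_auc))
```

With a real rfe object you would first keep only the rows whose Variables value equals the optimal subset size, as in the question, and `pROC::auc` could replace the hand-written helper.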

BTW: I used the pROC::roc function for the examples above, but any suitable function would work here. Also, with caret::train the procedure for obtaining the ROC curve is the same regardless of the model type.
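
For the single averaged curve the question asks about, one common option is vertical averaging: evaluate each resample's ROC curve on a shared grid of false positive rates and average the true positive rates. The following is a minimal base-R sketch of that idea, not something built into caret. The synthetic `pred` data frame and the helper `roc_points` are hypothetical; the column names mimic `modelObject$pred`.

```r
# Synthetic stand-in for modelObject$pred (hypothetical data, illustration only).
set.seed(1)
folds <- sprintf("Fold%02d.Rep%d", rep(1:10, times = 5), rep(1:5, each = 10))
pred <- do.call(rbind, lapply(folds, function(f) {
  obs <- rep(c("LUNG", "OTHER"), each = 10)
  data.frame(
    obs = obs,
    LUNG = rnorm(20, mean = ifelse(obs == "LUNG", 1, 0)),
    Resample = f,
    stringsAsFactors = FALSE
  )
}))

# ROC points (FPR, TPR) for one resample, sweeping the threshold downward.
# Hypothetical helper; the curve starts at (0, 0) and ends at (1, 1).
roc_points <- function(score, is_positive) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(score[is_positive] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[!is_positive] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Interpolate every resample's curve on a common FPR grid.
# ties = max keeps the top of each vertical step of the ROC curve.
fpr_grid <- seq(0, 1, length.out = 101)
tpr_by_fold <- sapply(split(pred, pred$Resample), function(d) {
  pts <- roc_points(d$LUNG, d$obs == "LUNG")
  approx(pts$fpr, pts$tpr, xout = fpr_grid, ties = max)$y
})

# Average across resamples (rows = grid points, columns = resamples).
mean_tpr <- rowMeans(tpr_by_fold)

plot(fpr_grid, mean_tpr, type = "l",
     xlab = "False positive rate", ylab = "Mean true positive rate",
     main = "Vertically averaged ROC curve (synthetic data)")
abline(0, 1, lty = 2)  # chance line
```

Averaging at fixed false positive rates is a design choice; averaging over thresholds, or simply plotting the pooled curve, gives a somewhat different picture, so it is worth stating which one a figure shows.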
