How to get the same thresholds from precision_recall_curve and roc_curve in sklearn.metrics

Time: 2022-12-08 18:15:09

I need to make a table with the TPR and FPR values, as well as precision and recall. I am using the roc_curve and precision_recall_curve functions from the sklearn.metrics package in Python. My problem is that each function gives me a different vector of thresholds, and I need a single one so that I can merge the values as columns in one table. Could anyone help me?


Thanks in advance


1 solution

#1


The threshold values have two major differences.


  1. The orders are different. roc_curve returns its thresholds in decreasing order, while precision_recall_curve returns them in increasing order.

  2. The counts are different. In roc_curve, n_thresholds = len(np.unique(probas_pred)), while in precision_recall_curve n_thresholds = len(np.unique(probas_pred)) - 1, because the smallest threshold value from roc_curve is not included. At the same time, the last precision and recall values are 1. and 0. respectively, with no corresponding threshold, so tpr, fpr, precision and recall end up with the same number of items. Treat these counts as a guide rather than a guarantee: roc_curve keeps every distinct score only when called with drop_intermediate=False, it also prepends one extra threshold for the point where nothing is predicted positive, and the exact lengths have changed between scikit-learn releases. Check the array shapes in your own environment.
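Both differences are easy to check directly. The snippet below is a small sketch on made-up data (the labels and scores are illustrative assumptions, not from the question). It prints the array lengths instead of asserting specific counts, since those depend on the installed scikit-learn version.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve

# Hypothetical toy data: binary labels and classifier scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, roc_thr = roc_curve(y_true, y_score)
precision, recall, pr_thr = precision_recall_curve(y_true, y_score)

# Difference 1: opposite ordering of the two threshold vectors.
roc_decreasing = bool(np.all(np.diff(roc_thr) < 0))
pr_increasing = bool(np.all(np.diff(pr_thr) > 0))
print("roc_curve thresholds decreasing:", roc_decreasing)
print("precision_recall_curve thresholds increasing:", pr_increasing)

# Difference 2: the lengths do not line up.
print("len(roc_thr) =", len(roc_thr), "| len(pr_thr) =", len(pr_thr))

# precision and recall carry one extra trailing point (1.0 and 0.0)
# that has no threshold attached to it.
print("len(precision) =", len(precision), "| last point:", precision[-1], recall[-1])
```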

So, back to your question: how do you build a table that includes tpr, fpr, precision and recall with their corresponding thresholds? Here are the steps:


  1. Discard the last precision and recall values, which have no corresponding threshold.

  2. Reverse the remaining precision and recall values so that they follow the decreasing threshold order of roc_curve.

  3. Compute the precision and recall values for the lowest roc_curve threshold, which precision_recall_curve does not return.

  4. Put all the values into the same table.
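The steps above line the arrays up by position, which only works if the lengths match as described. Here is a minimal sketch of a variation that matches rows by threshold value instead, using np.intersect1d, so it does not rely on exact array lengths. The helper name `metrics_table` and the sample data are hypothetical, and the sketch assumes binary 0/1 labels.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve


def metrics_table(y_true, y_score):
    """Return rows of (threshold, tpr, fpr, precision, recall), highest threshold first."""
    # drop_intermediate=False keeps one ROC point per distinct score.
    fpr, tpr, roc_thr = roc_curve(y_true, y_score, drop_intermediate=False)
    precision, recall, pr_thr = precision_recall_curve(y_true, y_score)

    # Step 1: drop the trailing (precision=1, recall=0) point with no threshold.
    precision, recall = precision[:-1], recall[:-1]

    # Keep only thresholds that both functions returned. intersect1d gives the
    # common values in ascending order plus their positions in each input.
    common, roc_idx, pr_idx = np.intersect1d(roc_thr, pr_thr, return_indices=True)

    table = np.column_stack(
        [common, tpr[roc_idx], fpr[roc_idx], precision[pr_idx], recall[pr_idx]]
    )
    # Step 2 equivalent: flip to the decreasing order used by roc_curve.
    return table[::-1]


# Hypothetical toy data.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

table = metrics_table(y_true, y_score)
print("threshold  tpr  fpr  precision  recall")
print(table)
```

A threshold that only one of the two functions returns is left out of this table. If you need that row, compute its precision and recall by hand as in step 3 and append it. As a sanity check, the tpr and recall columns should be identical, since they are the same quantity.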
