A simple way to calculate precision, recall and F1-score in R

Date: 2021-07-16 09:05:27

I am using an rpart classifier in R. I want to test the trained classifier on test data. This is fine: I can use the predict.rpart function.

But I also want to calculate precision, recall and the F1 score.

My question is: do I have to write functions for those myself, or is there a function for that in R or in any of the CRAN libraries?

6 Answers

#1


16  

The ROCR library calculates all of these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):

library(ROCR)
...

y <- ...            # logical array of positive / negative cases
predictions <- ...  # array of predictions

pred <- prediction(predictions, y)

# Recall-Precision curve
RP.perf <- performance(pred, "prec", "rec")
plot(RP.perf)

# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr")
plot(ROC.perf)

# ROC area under the curve
auc.tmp <- performance(pred, "auc")
auc <- as.numeric(auc.tmp@y.values)

...
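If you want a single F score from ROCR rather than a full curve, the "f" measure reports one value per score cutoff. A minimal sketch, with made-up scores and labels and a hypothetical 0.5 cutoff (none of this comes from the answer above):

library(ROCR)

# toy labels and scores, purely illustrative
y <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
predictions <- c(0.9, 0.4, 0.8, 0.35, 0.2, 0.7)

pred <- prediction(predictions, y)

f.perf   <- performance(pred, "f")   # F measure at every cutoff
cutoffs  <- f.perf@x.values[[1]]
f.scores <- f.perf@y.values[[1]]

# F1 at the cutoff closest to 0.5
f.scores[which.min(abs(cutoffs - 0.5))]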

#2


15  

Using the caret package:

library(caret)

y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions

precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")

F1 <- (2 * precision * recall) / (precision + recall)
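As a quick sanity check, here is a minimal sketch with made-up factors; treating "1" as the positive class is an assumption, not something from the question:

library(caret)

# toy data, purely illustrative; "1" is the positive class
y           <- factor(c(1, 0, 1, 1, 0, 1), levels = c(0, 1))
predictions <- factor(c(1, 0, 0, 1, 0, 1), levels = c(0, 1))

precision <- posPredValue(predictions, y, positive = "1")
recall    <- sensitivity(predictions, y, positive = "1")

F1 <- (2 * precision * recall) / (precision + recall)
F1   # about 0.857 for this toy data: precision = 1, recall = 0.75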

A generic function that works for binary and multi-class classification without using any package is:

f1_score <- function(predicted, expected, positive.class="1") {
    predicted <- factor(as.character(predicted), levels=unique(as.character(expected)))
    expected  <- as.factor(expected)
    cm <- as.matrix(table(expected, predicted))

    precision <- diag(cm) / colSums(cm)
    recall    <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

    # Assume that F1 is zero when it is not possible to compute it
    f1[is.na(f1)] <- 0

    # Binary F1 or multi-class macro-averaged F1
    ifelse(nlevels(expected) == 2, f1[positive.class], mean(f1))
}

Some comments about the function:

  • An F1 of NA is assumed to be zero.
  • positive.class is used only for the binary F1.
  • For multi-class problems, the macro-averaged F1 is computed.
  • If predicted and expected have different levels, predicted receives the levels of expected.
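A quick check of f1_score on made-up vectors (the values below are purely illustrative):

# Binary case: for class "1", TP = 3, FP = 1, FN = 1, so the F1 should be 0.75
f1_score(predicted = c(0, 1, 1, 1, 0, 1),
         expected  = c(0, 1, 0, 1, 1, 1),
         positive.class = "1")

# Multi-class case: returns the macro-averaged F1 over classes a, b and c
f1_score(predicted = c("a", "b", "b", "b", "c", "a"),
         expected  = c("a", "a", "b", "b", "c", "c"))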

#3


2  

I noticed the comment about the F1 score being needed for binary classes. I suspect that it usually is. But a while ago I wrote this for doing classification into several groups denoted by numbers. It may be of use to you...

calcF1Scores <- function(act, prd) {
  # Treats the vectors as class labels; act and prd must be whole numbers
  df <- data.frame(act = act, prd = prd)
  scores <- list()
  for (i in seq(min(act), max(act))) {
    tp <- nrow(df[df$prd == i & df$act == i, ])  # true positives for class i
    fp <- nrow(df[df$prd == i & df$act != i, ])  # false positives for class i
    fn <- nrow(df[df$prd != i & df$act == i, ])  # false negatives for class i
    f1 <- (2 * tp) / (2 * tp + fp + fn)
    scores[[i]] <- f1
  }
  print(scores)
  return(scores)
}

print(mean(unlist(calcF1Scores(c(1,1,3,4,5),c(1,2,3,4,5)))))
print(mean(unlist(calcF1Scores(c(1,2,3,4,5),c(1,2,3,4,5)))))

#4


2  

We can simply get the F1 value from caret's confusionMatrix function:

result <- confusionMatrix(Prediction, Label)

# View the overall confusion matrix
result

# F1 value
result$byClass[7]
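Accessing the statistic by name is a little more robust than relying on its position in byClass. A minimal sketch with made-up binary factors (treating "1" as the positive class is an assumption):

library(caret)

# toy factors, purely illustrative
Prediction <- factor(c(1, 0, 1, 1, 0, 1))
Label      <- factor(c(1, 0, 0, 1, 0, 1))

result <- confusionMatrix(Prediction, Label, positive = "1")
result$byClass["F1"]   # same value as result$byClass[7]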

#5


1  

You can also use the confusionMatrix() provided by the caret package. The output includes, among other things, Sensitivity (also known as recall) and Pos Pred Value (also known as precision). F1 can then easily be computed, as stated above, as: F1 <- (2 * precision * recall) / (precision + recall)
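A minimal sketch of that computation, again with made-up factors and "1" assumed to be the positive class:

library(caret)

Prediction <- factor(c(1, 0, 1, 1, 0, 1))   # toy predictions (illustrative)
Label      <- factor(c(1, 0, 0, 1, 0, 1))   # toy reference labels (illustrative)

cm <- confusionMatrix(Prediction, Label, positive = "1")

precision <- cm$byClass["Pos Pred Value"]
recall    <- cm$byClass["Sensitivity"]
F1        <- (2 * precision * recall) / (precision + recall)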

#6


1  

confusionMatrix() from the caret package can be used along with the optional positive argument, which specifies which factor level should be taken as the positive class.

confusionMatrix(predicted, Funded, mode = "prec_recall", positive="1")

This call also reports additional values such as the F1 score, Accuracy, etc.
