Using the Confusion Matrix in the PRTools Pattern Recognition Toolbox for MATLAB

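Note: the code used in this article comes from the PRTools pattern recognition toolbox (http://www.prtools.org), and the experiments were run in MATLAB.

The confusion matrix is a standard tool in pattern recognition, and PRTools provides it directly through the function confmat. Its usage is as follows: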

  [C,NE,LABLIST] = CONFMAT(LAB1,LAB2,METHOD,FID)

  INPUT
   LAB1        Set of labels
   LAB2        Set of labels
   METHOD      'count' (default) to count number of co-occurrences in
               LAB1 and LAB2, 'disagreement' to count relative
               non-co-occurrence.
   FID         Write text result to file

  OUTPUT
   C           Confusion matrix
   NE          Total number of errors (empty labels are neglected)
   LABLIST     Unique labels in LAB1 and LAB2

First, a few terms:


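TP (True Positive): positive tuples that the classifier labels correctly.

TN (True Negative): negative tuples that the classifier labels correctly.

FP (False Positive): negative tuples that are wrongly labeled as positive.

FN (False Negative): positive tuples that are wrongly labeled as negative.

TP and TN tell us when the classifier gets things right; FP and FN tell us when it gets things wrong.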
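To make the four counts concrete, here is a minimal sketch in plain MATLAB (no toolbox needed). The two label vectors are made up for illustration, with true standing for the positive class.

```matlab
% Hypothetical two-class example: true = positive, false = negative.
actual    = logical([1 1 1 0 0 0 0 1]);   % ground-truth classes
predicted = logical([1 0 1 0 1 0 0 1]);   % classifier output

TP = sum( actual &  predicted);   % positives labeled positive
TN = sum(~actual & ~predicted);   % negatives labeled negative
FP = sum(~actual &  predicted);   % negatives labeled positive
FN = sum( actual & ~predicted);   % positives labeled negative

fprintf('TP=%d TN=%d FP=%d FN=%d\n', TP, TN, FP, FN);
```

The four counts always add up to the number of tuples, which is a handy sanity check.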

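For a data set with M classes, the confusion matrix is a table of size at least M×M whose entry in row i, column j is the number of tuples of class i that were labeled as class j.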
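The definition translates almost word for word into code. The sketch below builds the M×M table for a small made-up label set, assuming the classes are coded as the integers 1..M; the variable names are illustrative only.

```matlab
% Hypothetical labels for an M = 3 class problem, coded 1..M.
truth = [1 1 2 2 2 3 3 3 3 1];   % true class of each tuple
pred  = [1 2 2 2 3 3 3 1 3 1];   % class assigned by the classifier
M = 3;

C = zeros(M);                    % M-by-M table of counts
for i = 1:M
    for j = 1:M
        % number of class-i tuples that were labeled as class j
        C(i,j) = sum(truth == i & pred == j);
    end
end
disp(C);
```

The diagonal holds the correctly classified tuples, so sum(diag(C)) / sum(C(:)) gives the overall accuracy.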

As an example, take the Ionosphere data set from the UCI repository and call the PRTools confusion-matrix function:

(1) First prepare the ionosphere data set:

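% Assumes ionosphere.txt is purely numeric, with the class label in the last column.
data=load('ionosphere.txt');
[m,k]=size(data);
data1=ones(m,k-1);
% Min-max scale every feature column to [0,1].
for i=1:k-1
    lo=min(data(:,i));
    span=max(data(:,i))-lo;
    if span>0
        data1(:,i)=(data(:,i)-lo)/span;
    else
        data1(:,i)=0;   % constant column: avoid 0/0 = NaN
    end
end
label=data(:,k);
[Y,I]=min(label);
% Assumption: if the labels start at 0, shift them so that they start at 1.
if Y(1)==0
    label=label+1;
end
a=dataset(data1,label);   % newer PRTools releases name this constructor prdataset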
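The per-column loop above rescales each feature to [0, 1]. The same idea can be written without a loop. The sketch below uses a small made-up matrix rather than the real data, and adds a guard for constant columns, which would otherwise give 0/0 = NaN.

```matlab
% Hypothetical 4-by-3 feature matrix; the third column is constant.
X = [1 10 5; 2 20 5; 3 30 5; 5 50 5];

lo   = min(X, [], 1);            % column minima (1-by-3)
span = max(X, [], 1) - lo;       % column ranges
span(span == 0) = 1;             % constant columns: divide by 1, result 0

% bsxfun applies the row vectors lo and span to every row of X
Xn = bsxfun(@rdivide, bsxfun(@minus, X, lo), span);
disp(Xn);
```

bsxfun is used here so that the sketch also runs on older MATLAB releases that lack implicit expansion.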

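(2) Then call the confmat.m function:

[train,test]=gendat(a,0.5);   % random split: half for training, the rest for testing
w=treec(train);               % train a decision-tree classifier
conf=confmat(test*w)          % confusion matrix of the test-set predictions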

Running this returns the confusion matrix of the test set in conf; its entries have the meaning described above.

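If you would rather not use the PRTools function, you can write the confusion-matrix code yourself, as shown below, and call it directly. Note that, according to its own help text, cfmatrix puts the predicted classes in the rows and the actual classes in the columns, which is the transpose of the layout described above.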

function [confmatrix] = cfmatrix(actual, predict, classlist, per)
% CFMATRIX calculates the confusion matrix for any prediction
% algorithm that generates a list of classes to which the test
% feature vectors are assigned
%
% Outputs: confusion matrix
%
%                 Actual Classes
%                   p       n
%              ___|_____|______|
%    Predicted  p'|     |      |
%      Classes  n'|     |      |
%
% Inputs:
% 1. actual / 2. predict
% The inputs provided are the 'actual' classes vector
% and the 'predict'ed classes vector. The actual classes are the classes
% to which the input feature vectors belong. The predicted classes are the
% class to which the input feature vectors are predicted to belong to,
% based on a prediction algorithm.
% The length of actual class vector and the predicted class vector need to
% be the same. If they are not the same, an error message is displayed.
% 3. classlist
% The third input provides the list of all the classes {p,n,...} for which
% the classification is being done. All classes are numbers.
% 4. per = 1/0 (default = 0)
% This parameter when set to 1 provides the values in the confusion matrix
% as percentages. The default provides the values in numbers.
%
% Example:
% >> a = [ ... ];   % vector of actual class labels
% >> b = [ ... ];   % vector of predicted class labels, same length as a
% >> Cf = cfmatrix(a, b);
%
% [Avinash Uppuluri: avinash_uv@yahoo.com: Last modified: //]

% If classlist not entered: make classlist equal to all
% unique elements of actual
if (nargin < 2)
   error('Not enough input arguments.');
elseif (nargin == 2)
    classlist = unique(actual); % default values from actual
    per = 0; % default is numbers and input 1 for percentage
elseif (nargin == 3)
    per = 0; % default is numbers and input 1 for percentage
end

if (length(actual) ~= length(predict))
    error('First two inputs need to be vectors with equal size.');
elseif ((size(actual,1) ~= 1) && (size(actual,2) ~= 1))
    error('First input needs to be a vector and not a matrix');
elseif ((size(predict,1) ~= 1) && (size(predict,2) ~= 1))
    error('Second input needs to be a vector and not a matrix');
end

format short g;
n_class = length(classlist);
confmatrix = zeros(n_class);   % rows: predicted class, columns: actual class
line_two = '----------';
line_three = '_________|';

for i = 1:n_class
    obind_class_i = find(actual == classlist(i));    % indices observed as class i
    prind_class_i = find(predict == classlist(i));   % indices predicted as class i
    confmatrix(i,i) = length(intersect(obind_class_i,prind_class_i));
    for j = 1:n_class
        %if (j ~= i)
        if (j < i)
        % observed j predicted i
        confmatrix(i,j) = length(find(actual(prind_class_i) == classlist(j)));
        % observed i predicted j
        confmatrix(j,i) = length(find(predict(obind_class_i) == classlist(j)));
        end
    end
    line_two = strcat(line_two,'---',num2str(classlist(i)),'-----');
    line_three = strcat(line_three,'__________');
end

% Convert counts to percentages of all samples if requested
if (per == 1)
    confmatrix = (confmatrix ./ length(actual)).*100;
end

% output to screen
disp('------------------------------------------');
disp('             Actual Classes');
disp(line_two);
disp('Predicted|                     ');
disp('  Classes|                     ');
disp(line_three);

for i = 1:n_class
    temps = sprintf('       %d             ',i);
    for j = 1:n_class
    temps = strcat(temps,sprintf(' |    %2.1f    ',confmatrix(i,j)));
    end
    disp(temps);
    clear temps
end

disp('------------------------------------------');
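For comparison, the counting part of such a function can be written in a few lines with accumarray. This is only a sketch, not the cfmatrix code: the label vectors are made up, the variable names are illustrative, and it assumes every predicted label also appears in classlist (otherwise ismember returns 0 and accumarray raises an error).

```matlab
% Hypothetical class labels (arbitrary numbers, not necessarily 1..M).
actual    = [10 10 20 20 30 30 30 10];
predicted = [10 20 20 20 30 10 30 10];
classlist = unique(actual);               % [10 20 30]
n = numel(classlist);

% Map each label to its position in classlist.
[~, a_idx] = ismember(actual,    classlist);
[~, p_idx] = ismember(predicted, classlist);

% Rows = predicted class, columns = actual class (the cfmatrix layout).
counts  = accumarray([p_idx(:) a_idx(:)], 1, [n n]);
percent = 100 * counts / numel(actual);   % same scaling as per = 1
disp(counts);
```

Swapping the two index columns in the accumarray call gives the transposed layout, with actual classes in the rows.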

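The confusion matrix itself is easy to understand. A few equally simple measures follow from it (P: number of positive tuples, N: number of negative tuples):

Accuracy: (TP + TN) / (P + N)

Error rate: (FP + FN) / (P + N)

Sensitivity (recall): TP / P

Precision: TP / (TP + FP)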
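As a worked check of these measures, the sketch below evaluates them for one set of made-up counts, with numerators and denominators grouped explicitly. P and N are derived from the counts as P = TP + FN and N = TN + FP.

```matlab
% Hypothetical counts for a two-class problem.
TP = 40; FN = 10;    % positive tuples: P = TP + FN
TN = 45; FP = 5;     % negative tuples: N = TN + FP
P = TP + FN;
N = TN + FP;

accuracy   = (TP + TN) / (P + N);
error_rate = (FP + FN) / (P + N);
recall     = TP / P;             % sensitivity
precision  = TP / (TP + FP);

fprintf('accuracy=%.3f error=%.3f recall=%.3f precision=%.3f\n', ...
        accuracy, error_rate, recall, precision);
```

Accuracy and error rate always sum to 1, since every tuple is either classified correctly or not.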

This article introduced the confusion matrix mainly through the use of the confusion-matrix function in PRTools. Corrections are welcome.
