Understanding SVC decision_function and predict

Date: 2021-07-04 23:56:38

I'm trying to understand the relationship between decision_function and predict, which are instance methods of SVC (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). So far I've gathered that decision_function returns pairwise scores between classes. I was under the impression that predict chooses the class that maximizes its pairwise score, but I tested this and got different results. Here's the code I was using to try to understand the relationship between the two. First I generated the pairwise score matrix, then I printed out the class with the maximal pairwise score, which was different from the class predicted by clf.predict.


        result = clf.decision_function(vector)[0]
        counter = 0
        num_classes = len(clf.classes_)
        pairwise_scores = np.zeros((num_classes, num_classes))
        for r in range(num_classes):
            for j in range(r + 1, num_classes):
                pairwise_scores[r][j] = result[counter]
                pairwise_scores[j][r] = -result[counter]
                counter += 1

        # flat argmax over the matrix; integer-divide to recover the row (class) index
        index = np.argmax(pairwise_scores)
        class_index = index // num_classes
        print(class_index)
        print(clf.predict(vector)[0])

Does anyone know the relationship between predict and decision_function?


5 Answers

#1


18  

I don't fully understand your code, but let's go through the example from the documentation page you referenced:


import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.svm import SVC
clf = SVC()
clf.fit(X, y) 

Now let's apply both the decision function and predict to the samples:


clf.decision_function(X)
clf.predict(X)

The output we get is:


array([[-1.00052254],
       [-1.00006594],
       [ 1.00029424],
       [ 1.00029424]])
array([1, 1, 2, 2])

And that is easy to interpret: the decision function tells us on which side of the hyperplane generated by the classifier we are (and how far away from it). Based on that information, the estimator then labels the examples with the corresponding label.

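In other words, for a binary SVC, predict is just a sign check on the decision function. A minimal sketch (assuming the clf fitted above; scikit-learn maps positive scores to clf.classes_[1] and negative scores to clf.classes_[0]):

import numpy as np

scores = clf.decision_function(X).ravel()  # ravel() handles the (n, 1) shape
labels = clf.classes_[(scores > 0).astype(int)]
print(labels)          # [1 1 2 2]
print(clf.predict(X))  # identical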

#2


13  

When you call decision_function(), you get the output from each of the pairwise classifiers (n*(n-1)/2 numbers total). See pages 127 and 128 of "Support Vector Machines for Pattern Classification".


Each classifier puts in a vote as to what the correct answer is (based on the sign of the output of that classifier); predict() returns the class with the most votes.

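As a rough sketch of that voting (assuming a multiclass SVC fitted with decision_function_shape='ovo', so that decision_function returns the raw pairwise scores; this ignores scikit-learn's tie-breaking, which also weighs the score magnitudes):

import numpy as np
from itertools import combinations

def ovo_predict(clf, X):
    # decision_function orders the class pairs (0,1), (0,2), ..., (n-2, n-1);
    # a positive score is a vote for the first class of the pair.
    n = len(clf.classes_)
    preds = []
    for row in clf.decision_function(X):
        votes = np.zeros(n, dtype=int)
        for score, (i, j) in zip(row, combinations(range(n), 2)):
            votes[i if score > 0 else j] += 1
        preds.append(clf.classes_[np.argmax(votes)])
    return np.array(preds)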

#3


10  

For those interested, I'll post a quick example of the predict function translated from C++ (here) to Python:


import math
import numpy as np

# I've only implemented the linear and rbf kernels
def kernel(params, sv, X):
    if params['kernel'] == 'linear':
        return [np.dot(vi, X) for vi in sv]
    elif params['kernel'] == 'rbf':
        return [math.exp(-params['gamma'] * np.dot(vi - X, vi - X)) for vi in sv]

# This replicates clf.decision_function(X)
def decision_function(params, sv, nv, a, b, X):
    # calculate the kernels
    k = kernel(params, sv, X)

    # define the start and end index of the support vectors for each class
    start = [sum(nv[:i]) for i in range(len(nv))]
    end = [start[i] + nv[i] for i in range(len(nv))]

    # calculate: sum(a_p * k(x_p, x)) between every 2 classes
    c = [ sum(a[i][p] * k[p] for p in range(start[j], end[j])) +
          sum(a[j-1][p] * k[p] for p in range(start[i], end[i]))
                for i in range(len(nv)) for j in range(i+1,len(nv))]

    # add the intercept
    return [sum(x) for x in zip(c, b)]

# This replicates clf.predict(X)
def predict(params, sv, nv, a, b, cs, X):
    ''' params = model parameters (dict from clf.get_params())
        sv = support vectors
        nv = # of support vectors per class
        a  = dual coefficients
        b  = intercepts
        cs = list of class names
        X  = feature vector to predict
    '''
    decision = decision_function(params, sv, nv, a, b, X)
    votes = [(i if decision[p] > 0 else j) for p,(i,j) in enumerate((i,j)
                                           for i in range(len(cs))
                                           for j in range(i+1,len(cs)))]

    return cs[max(set(votes), key=votes.count)]

There are a lot of input arguments to predict and decision_function, but note that these are all used internally by the model when you call predict(X). In fact, all of the arguments are accessible inside the model after fitting:


from sklearn import svm

# Create model
clf = svm.SVC(gamma=0.001, C=100.)

# Fit model using features, X, and labels, y.
clf.fit(X, y)

# Get parameters from model
params = clf.get_params()
sv = clf.support_vectors_
nv = clf.n_support_
a  = clf.dual_coef_
b  = clf._intercept_   # raw libsvm intercept (private attribute)
cs = clf.classes_

# Use the functions to predict
print(predict(params, sv, nv, a, b, cs, X))

# Compare with the builtin predict
print(clf.predict(X))

#4


2  

There's a really nice Q&A for the multi-class one-vs-one scenario at datascience.sx:


Question

I have a multiclass SVM classifier with labels 'A', 'B', 'C', 'D'.


This is the code I'm running:


>>> print(clf.predict([predict_this]))
['A']
>>> print(clf.decision_function([predict_this]))
[[ 185.23220833   43.62763596  180.83305074  -93.58628288   62.51448055  173.43335293]]

How can I use the output of the decision function to predict the class (A/B/C/D) with the highest probability and, if possible, its value? I have visited https://*.com/a/20114601/7760998, but it is for binary classifiers, and I could not find a good resource which explains the output of decision_function for multiclass classifiers with shape ovo (one-vs-one).


Edit:


The above example is for class 'A'. For another input the classifier predicted 'C' and gave the following result from decision_function:


[[ 96.42193513 -11.13296606 111.47424538 -88.5356536 44.29272494 141.0069203 ]]

Another, different input that the classifier predicted as 'C' gave the following result from decision_function:


[[ 290.54180354 -133.93467605  116.37068951 -392.32251314 -130.84421412   284.87653043]]

Had it been ovr (one-vs-rest), it would be easier: just select the class with the highest value. But in ovo (one-vs-one) there are (n * (n - 1)) / 2 values in the resulting list.


How to deduce which class would be selected based on the decision function?


Answer

Your link has sufficient resources, so let's go through them:


When you call decision_function(), you get the output from each of the pairwise classifiers (n*(n-1)/2 numbers total). See pages 127 and 128 of "Support Vector Machines for Pattern Classification".


Click on the "page 127 and 128" link (not shown here, but in the * answer). You should see:


[image of the book's pages 127 and 128]

  • Python's SVM implementation uses one-vs-one. That's exactly what the book is talking about.
  • For each pairwise comparison, we measure the decision function.
  • The decision function is just the regular binary SVM decision boundary.

What does that have to do with your question?


  • clf.decision_function() will give you the $D$ for each pairwise comparison.
  • The class with the most votes wins.

For instance,


[[ 96.42193513 -11.13296606 111.47424538 -88.5356536 44.29272494 141.0069203 ]]

is comparing:


[AB, AC, AD, BC, BD, CD]

We label each of them by the sign of the score. We get:


[A, C, A, C, B, C]

For instance, 96.42193513 is positive and thus A is the label for AB.


Now we have three votes for C, so C would be your prediction. If you repeat my procedure for the other two examples, you will get Python's prediction. Try it!

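To make the vote count concrete, here is a toy check of the tally above (the votes list is just the sign pattern [A, C, A, C, B, C] from this example):

from collections import Counter

votes = ['A', 'C', 'A', 'C', 'B', 'C']  # signs of [AB, AC, AD, BC, BD, CD]
print(Counter(votes).most_common(1)[0][0])  # 'C'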

#5


0  

The mathematical relationship between the two is a bit complicated for SVC. But if you use decision_function with the LinearSVC classifier, the relationship between them becomes much clearer, because there decision_function gives you one score per class label (not the same as SVC) and predict returns the class with the best score.

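A quick sketch of that behavior (assuming a multiclass dataset such as iris; for binary problems decision_function is 1-D and the argmax reduces to a sign check):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
clf = LinearSVC(max_iter=10000).fit(X, y)

scores = clf.decision_function(X)                 # shape (n_samples, n_classes)
manual = clf.classes_[np.argmax(scores, axis=1)]  # argmax over per-class scores
print((manual == clf.predict(X)).all())           # True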
