Applying an sklearn function to a pandas DataFrame gives ValueError("Unknown label type: %r" % y)

The following code gives an error message:

    >>> import pandas as pd
    >>> from sklearn import preprocessing, svm
    >>> df = pd.DataFrame({"a": [0,1,2], "b":[0,1,2], "c": [0,1,2]})
    >>> clf = svm.SVC()
    >>> df = df.apply(lambda x: preprocessing.scale(x))
    >>> clf.fit(df[["a", "b"]], df["c"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", lin
     151, in fit
        y = self._validate_targets(y)
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", lin
     515, in _validate_targets
        check_classification_targets(y)
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\utils\multiclass.
    y", line 173, in check_classification_targets
        raise ValueError("Unknown label type: %r" % y)
    ValueError: Unknown label type: 0   -1.224745
    1    0.000000
    2    1.224745
    Name: c, dtype: float64

The dtype of the pandas DataFrame is not object, so fitting the sklearn SVM classifier on it should be fine, but for some reason it does not recognize the classification labels. What is causing this issue?

1 Answer

#1

The issue is that your scaling step is applied to every column, including the label column c, so the labels become float-valued, which is not a valid label type for a classifier. If you convert them to int or str it should work:

    In [32]: clf.fit(df[["a", "b"]], df["c"].astype(int))
    Out[32]:
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
      decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
      max_iter=-1, probability=False, random_state=None, shrinking=True,
      tol=0.001, verbose=False)
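An alternative that avoids the cast altogether is to scale only the feature columns and leave the label column untouched. The following is a minimal sketch under that assumption, not part of the original answer. The names feature_cols, X, y, label_kind, scaled_kind and truncated are illustrative; type_of_target is the scikit-learn utility that reports how a target array is interpreted.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.utils.multiclass import type_of_target

df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]})

# Assumption for this sketch: "a" and "b" are features, "c" holds class labels.
feature_cols = ["a", "b"]

# Scale only the features; the label column keeps its integer dtype.
X = pd.DataFrame(
    preprocessing.scale(df[feature_cols].astype(float)),
    columns=feature_cols,
    index=df.index,
)
y = df["c"]

# How scikit-learn classifies the two versions of the target.
label_kind = type_of_target(y)
scaled_kind = type_of_target(preprocessing.scale(df["c"].astype(float)))
print(label_kind, scaled_kind)

clf = svm.SVC()
clf.fit(X, y)
predictions = clf.predict(X)

# Caveat with casting scaled labels: astype(int) truncates toward zero,
# so distinct classes can collapse once there are more than three of them.
truncated = preprocessing.scale(np.array([0.0, 1.0, 2.0, 3.0])).astype(int)
print(truncated.tolist())
```

Unscaled integer labels are reported as "multiclass", while the scaled column is reported as "continuous", which is why the classifier rejects it. The last two lines show why casting scaled labels back to int only happens to work for this three-row example: with four classes, two of them truncate to the same value.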
