类型错误:ufunc '减'不包含带有签名匹配类型dtype('

Strange error from numpy via matplotlib when trying to get a histogram of a tiny toy dataset. I'm just not sure how to interpret the error, which makes it hard to see what to do next.

当试图获取一个小玩具数据集的直方图时，numpy通过matplotlib产生了奇怪的错误。我只是不知道如何解释这个错误，这让我们很难知道下一步该怎么做。

Didn't find much related, though this nltk question and this gdsCAD question are superficially similar.

虽然这个nltk的问题和gdsCAD的问题表面上相似，但没有发现太多相关。

I intend the debugging info at bottom to be more helpful than the driver code, but if I've missed something, please ask. This is reproducible as part of an existing test suite.

我希望底层的调试信息比驱动程序代码更有用，但是如果我遗漏了什么，请询问。这是可复制的，作为现有测试套件的一部分。

        if n > 1:
            return diff(a[slice1]-a[slice2], n-1, axis=axis)
        else:
>           return a[slice1]-a[slice2]
E           TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

../py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py:1567: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) bt
[...]
py2.7.11-venv/lib/python2.7/site-packages/matplotlib/axes/_axes.py(5678)hist()
-> m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
  py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(606)histogram()
-> if (np.diff(bins) < 0).any():
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) p numpy.__version__
'1.11.0'
(Pdb) p matplotlib.__version__
'1.4.3'
(Pdb) a
a = [u'A' u'B' u'C' u'D' u'E']
n = 1
axis = -1
(Pdb) p slice1
(slice(1, None, None),)
(Pdb) p slice2
(slice(None, -1, None),)
(Pdb)

3 个解决方案

#1

Why is it applying diff to an array of strings.

为什么要将diff应用到字符串数组中。

I get an error at the same point, though with a different message

在同一点上，我得到一个错误，尽管有不同的信息。

In [23]: a=np.array([u'A' u'B' u'C' u'D' u'E'])

In [24]: np.diff(a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-9d5a62fc3ff0> in <module>()
----> 1 np.diff(a)

C:\Users\paul\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\lib\function_base.pyc in diff(a, n, axis)
   1112         return diff(a[slice1]-a[slice2], n-1, axis=axis)
   1113     else:
-> 1114         return a[slice1]-a[slice2]
   1115 
   1116 

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Is this a array the bins parameter? What does the docs say bins should be?

这是一个数组参数吗?医生说箱子应该是什么?

#2

I am fairly new to this myself, but I had a similar error and found that it is due to a type casting issue. I was trying to concatenate rather than take the difference but I think the principle is the same here. I provided a similar answer on another question so I hope that is OK.

我自己对此相当陌生，但我有一个类似的错误，发现这是由于类型转换问题。我试着把它们串联起来，而不是把它们区别开来，但我认为这里的原理是一样的。我在另一个问题上提供了类似的答案，所以我希望可以。

In essence you need to use a different data type cast, in my case I needed str not float, I suspect yours is the same so my suggested solution is. I am sorry I cannot test it before suggesting but I am unclear from your example what you were doing.

本质上，你需要使用不同的数据类型转换，在我的情况下，我需要str而不是float，我怀疑你的是相同的，所以我建议的解决方案是。很抱歉，我不能在建议之前进行测试，但我不清楚你在做什么。

return diff(str(a[slice1])-str(a[slice2]), n-1, axis=axis)

Please see my example code below for the fix to my code, the change occurs on the third to last line. The code is to produce a basic random forest model.

请参见下面的示例代码，以便修复我的代码，更改发生在第三条到最后一行。代码是生成一个基本的随机森林模型。

import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation

Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"

npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay =  npArray.shape

names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)

XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)

# Predictions results initialised 
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain)       # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)

with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
            fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
            for i in range(0,lenpredictions) :
                    fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
    else :
            print "ERROR - names, prediction and true value array size mismatch."

This leads to an error of;

这导致了一个错误;

Traceback (most recent call last):
  File "min_example.py", line 40, in <module>
    fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

The solution is to make each variable a str() type on the third to last line then write to file. No other changes to then code have been made from the above.

解决方案是将每个变量都设置为一个str()类型，在第三个到最后一行，然后写入文件。上述代码没有其他更改。

import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation

Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"

npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay =  npArray.shape

names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)

XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)

# Predictions results initialised 
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain)       # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)

with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
            fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
            for i in range(0,lenpredictions) :
                    fpred.write(str(RFpreds[i])+",,"+str(yTest[i])+",\n")
    else :
            print "ERROR - names, prediction and true value array size mismatch."

These examples are from a larger code so I hope the examples are clear enough.

这些例子来自一个更大的代码，所以我希望这些例子足够清晰。

#3

I had a similar issue where an integer in a row of a dataframe I was iterating over was of type 'numpy.int64'. I got the "TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('

我有一个类似的问题，在我正在迭代的dataframe的一行中，有一个整数类型为“numpy.int64”。我得到了“类型错误:ufunc '减去'没有包含有签名匹配类型dtype的循环(' '

Easiest fix for me was to convert the row using pd.to_numeric(row)

对我来说最简单的解决方法是使用pd.to_numeric(行)来转换该行

#1