PolynomialFeatures LinearRegression ValueError: shapes not aligned

I am trying to write a function which trains and tests a LinearRegression with PolynomialFeatures. Here is my code:

def get_lr2(pdeg):
  from sklearn.linear_model import LinearRegression
  from sklearn.preprocessing import PolynomialFeatures
  from sklearn.metrics.regression import r2_score
  from sklearn.model_selection import train_test_split
  import numpy as np
  import pandas as pd

  np.random.seed(0)
  n = 15
  x = np.linspace(0,10,n) + np.random.randn(n)/5
  y = np.sin(x)+x/6 + np.random.randn(n)/10
  X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
  test_data = np.linspace(0,10,100).reshape(100,1)
  X_trainT     = X_train.reshape(-1,1)
  y_trainT     = y_train.reshape(-1,1)
  poly = PolynomialFeatures(degree=pdeg)
  X_poly = poly.fit_transform(X_trainT)
  X_train1, X_test1, y_train1, y_test1 = train_test_split(X_poly, y_trainT, random_state = 0)
  linreg1 = LinearRegression().fit(X_train1, y_train1)
  return linreg1.predict(test_data)

When I call the function (get_lr2(1)) I get:

  -------------------------------------------------------------------------
  ValueError                                Traceback (most recent call last)
  ---> 84 get_lr2(1)

  <ipython-input-29-a9966181155e> in get_lr2(pdeg)
  23     X_train1, X_test1, y_train1, y_test1 = train_test_split(X_poly, y_trainT, random_state = 0)
  24     linreg1 = LinearRegression().fit(X_train1, y_train1)
  ---> 25     return linreg1.predict(test_data)

  ValueError: shapes (100,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)

Can you help?

1 solution

#1

Your code is rather strange. Let's rework it in several ways:

train_test_split.

You call train_test_split, then throw away the resulting test set and create another one by hand. This is rather strange. If you want the train and test sets to have sizes in the proportion 15/100, just set that through the test_size option of train_test_split. The test size should then be 100/(100+15) ≈ 0.87.
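As a minimal sketch of the split sizes discussed above, the snippet below uses stand-in data (the arrays are hypothetical, not the asker's) and passes test_size as an integer, which scikit-learn treats as an absolute number of test samples. This is one way to get exactly 15 training and 100 test rows without relying on how a fraction is rounded.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 115 samples with a single feature.
X = np.arange(115).reshape(-1, 1)
y = np.arange(115)

# An integer test_size is an absolute number of test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=0
)

print(X_train.shape, X_test.shape)  # (15, 1) (100, 1)
```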

Preprocessing.

If you want to apply a preprocessing transformer (polynomial features here), you can apply it to the whole dataset rather than to one split. This does not hold if the transformer depends on the data (in that case you must call fit_transform on the train set and only transform on the test set), but in your case it does not matter.
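The shape error in the question comes from this step: with degree=1, PolynomialFeatures produces two columns (a bias column and x), so the model is fitted on two features, while test_data still has one. A minimal sketch of the fit_transform/transform pattern follows. The noise-free training data is simplified for illustration and is an assumption, not the asker's exact data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Simplified, noise-free data for illustration only.
x_train = np.linspace(0, 10, 15).reshape(-1, 1)
y_train = np.sin(x_train).ravel() + x_train.ravel() / 6
test_data = np.linspace(0, 10, 100).reshape(-1, 1)

poly = PolynomialFeatures(degree=1)
X_train_poly = poly.fit_transform(x_train)  # shape (15, 2): bias column and x
test_poly = poly.transform(test_data)       # shape (100, 2): same columns

model = LinearRegression().fit(X_train_poly, y_train)
predictions = model.predict(test_poly)      # column counts now match
```

Passing test_data directly to predict instead of test_poly would reproduce the mismatch between one column and two.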

Reshape.

After these improvements you need reshape in only one place: when initializing x. scikit-learn models expect X to be a matrix, or a column vector if there is only one feature. So reshape(-1,1) here turns your row vector (a 1-D array) into a column vector.
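A small sketch of what reshape(-1,1) does to the shape, using a short hypothetical array. It also shows one thing to watch for once x is a column: adding a 1-D array of the same length broadcasts to a square matrix instead of adding element by element.

```python
import numpy as np

x = np.linspace(0, 10, 5)
print(x.shape)          # (5,)

x_col = x.reshape(-1, 1)
print(x_col.shape)      # (5, 1)

# Broadcasting a (5, 1) column against a (5,) array gives a (5, 5) result,
# so noise added to a column vector should itself have shape (5, 1).
broadcast = x_col + np.zeros(5)
print(broadcast.shape)  # (5, 5)
```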

So the code will look like this:

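import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

def get_lr2(pdeg):
    np.random.seed(0)
    n = 115
    # Reshape once, here, so x is an (n, 1) column.
    x = (np.linspace(0,10,n) + np.random.randn(n)/5).reshape(-1,1)
    # The noise must also be (n, 1); a 1-D array would broadcast y to (n, n).
    y = np.sin(x)+x/6 + np.random.randn(n,1)/10

    X_poly = PolynomialFeatures(degree=pdeg).fit_transform(x)

    X_train, X_test, y_train, y_test = train_test_split(X_poly, y, random_state=0, test_size=0.87)

    linreg1 = LinearRegression().fit(X_train, y_train)
    return linreg1.predict(X_test)

get_lr2(2)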
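If the goal is to keep the original 100-point test_data grid from the question rather than a random test split, one alternative (a swapped-in technique, not what the answer above does) is to chain the transformer and the regressor with scikit-learn's make_pipeline, so the same polynomial expansion is applied at fit and predict time. The function name get_lr2_pipeline is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def get_lr2_pipeline(pdeg):
    # Hypothetical variant of get_lr2; data setup follows the question.
    np.random.seed(0)
    n = 15
    x = np.linspace(0, 10, n) + np.random.randn(n) / 5
    y = np.sin(x) + x / 6 + np.random.randn(n) / 10
    X_train, X_test, y_train, y_test = train_test_split(
        x.reshape(-1, 1), y, random_state=0
    )
    test_data = np.linspace(0, 10, 100).reshape(-1, 1)

    # The pipeline expands features before fitting and before predicting.
    model = make_pipeline(PolynomialFeatures(degree=pdeg), LinearRegression())
    model.fit(X_train, y_train)
    return model.predict(test_data)


predictions = get_lr2_pipeline(1)
print(predictions.shape)  # (100,)
```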
