PolynomialFeatures LinearRegression ValueError: shapes not aligned

Time: 2022-06-24 21:22:44

I am trying to write a function which trains and tests a LinearRegression with PolynomialFeatures. Here is my code:


def get_lr2(pdeg):
  from sklearn.linear_model import LinearRegression
  from sklearn.preprocessing import PolynomialFeatures
  from sklearn.metrics.regression import r2_score
  from sklearn.model_selection import train_test_split
  import numpy as np
  import pandas as pd

  n = 15
  x = np.linspace(0,10,n) + np.random.randn(n)/5
  y = np.sin(x)+x/6 + np.random.randn(n)/10
  X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
  test_data = np.linspace(0,10,100).reshape(100,1)
  X_trainT     = X_train.reshape(-1,1)
  y_trainT     = y_train.reshape(-1,1)
  poly = PolynomialFeatures(degree=pdeg)
  X_poly = poly.fit_transform(X_trainT)
  X_train1, X_test1, y_train1, y_test1 = train_test_split(X_poly, y_trainT, random_state = 0)
  linreg1 = LinearRegression().fit(X_train1, y_train1)
  return linreg1.predict(test_data)

When I call the function (get_lr2(1)) I am getting


  ValueError                                Traceback (most recent call last)
  ---> 84 get_lr2(1)

  <ipython-input-29-a9966181155e> in get_lr2(pdeg)
  23     X_train1, X_test1, y_train1, y_test1 = train_test_split(X_poly, y_trainT, random_state = 0)
  24     linreg1 = LinearRegression().fit(X_train1, y_train1)
  ---> 25     return linreg1.predict(test_data)

  ValueError: shapes (100,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)

Can you help?


1 Answer



Your code is rather strange. Let's clean it up in a few ways:


  • train_test_split.

    You call train_test_split, then throw away the resulting test set and create another one, which is rather strange. If you want the train and test sets to keep the 15/100 proportion from your code, just set that in the test_size option of train_test_split. The test size should be 100/(100+15) ~= 0.87. (test_size also accepts an integer, so test_size=100 requests exactly 100 test samples.)

  • Preprocessing.

    If you want to apply preprocessing transformers (polynomial features here), you can apply them to the whole dataset rather than to one split. This does not hold when the transformer depends on the data (in that case you must call fit_transform on the train set and only transform on the test set), but in your case it does not matter.


  • Reshape.

    After these improvements you only need to reshape in one place: when preparing x. Scikit-learn models expect X to be a matrix, or a column vector if there is only one feature. So reshape(-1,1) here turns your 1-D array into a column vector.
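As a side note, the fit-on-train, transform-on-test pattern mentioned under Preprocessing can be sketched roughly as below. StandardScaler is only an assumed example of a data-dependent transformer and is not part of the original question; the integer test_size=100 is likewise an illustrative choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(115, 1))  # one feature, already a column

# An integer test_size is an absolute number of test rows
X_train, X_test = train_test_split(X, random_state=0, test_size=100)

scaler = StandardScaler()
# Learn mean and variance from the training rows only
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test rows; do not refit here
X_test_scaled = scaler.transform(X_test)
```

PolynomialFeatures learns nothing from the values themselves, which is why applying it before the split is harmless in this case.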

So the code will look like this:


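import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

def get_lr2(pdeg):
    n = 115
    x = np.linspace(0,10,n) + np.random.randn(n)/5
    # Build y from the 1-D x; adding 1-D noise to an (n,1) array would broadcast to (n,n)
    y = np.sin(x)+x/6 + np.random.randn(n)/10
    # The only reshape: scikit-learn expects X as a 2-D column
    X = x.reshape(-1,1)

    X_poly = PolynomialFeatures(degree=pdeg).fit_transform(X)

    X_train, X_test, y_train, y_test = train_test_split(X_poly, y, random_state=0, test_size=0.87)

    linreg1 = LinearRegression().fit(X_train, y_train)
    return linreg1.predict(X_test)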
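
For reference, the traceback in the question comes from a column mismatch: with degree=1, PolynomialFeatures outputs two columns (a bias column and x), so the model learns two coefficients, while test_data still has a single column. If you would rather keep the question's 100-point test_data grid, one possible fix is to pass it through the same fitted transformer before predicting. This is a minimal sketch under that assumption; the name get_lr2_grid is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures


def get_lr2_grid(pdeg):
    """Hypothetical variant: fit on noisy samples, predict on a fixed grid."""
    n = 15
    x = np.linspace(0, 10, n) + np.random.randn(n) / 5
    y = np.sin(x) + x / 6 + np.random.randn(n) / 10

    poly = PolynomialFeatures(degree=pdeg)
    # One input feature expands to pdeg + 1 columns (bias, x, x^2, ...)
    X_poly = poly.fit_transform(x.reshape(-1, 1))
    linreg = LinearRegression().fit(X_poly, y)

    test_data = np.linspace(0, 10, 100).reshape(-1, 1)
    # Expand the grid with the same transformer so the column counts match
    return linreg.predict(poly.transform(test_data))


preds = get_lr2_grid(3)
print(preds.shape)  # (100,)
```

The important part is calling poly.transform on test_data, so that predict sees the same number of columns the model was fitted with.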




