R caret's rfe [Error in { : task 1 failed - "rfe is expecting 184 importance values but only has 2"]

Time: 2022-12-07 11:20:19

I am using caret's rfe for a regression problem. My data (in a data.table) has 176 predictors, 49 of which are factors. When I run the function, I get this error:


Error in { :  task 1 failed - "rfe is expecting 176 importance values but only has 2"

Then I used model.matrix( ~ . - 1, data = as.data.frame(train_model_sell_single_bid)) to convert the factor predictors to dummy variables. However, I got a similar error:


Error in { :  task 1 failed - "rfe is expecting 184 importance values but only has 2"
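For context, the change from 176 to 184 expected importance values presumably comes from model.matrix() expanding the factor predictors into dummy columns. A small illustration on a made-up data frame (df and its columns are hypothetical, not the asker's data):

```r
# Hypothetical toy data: one numeric column and two factors
df <- data.frame(
  num = c(1.5, 2.0, 3.1, 4.0, 5.2),
  f   = factor(c("a", "b", "c", "a", "b")),  # 3 levels
  g   = factor(c("x", "y", "x", "y", "x"))   # 2 levels
)

# "~ . - 1" removes the intercept: the first factor gets one indicator
# column per level, any later factor gets (levels - 1) columns
mm <- model.matrix(~ . - 1, data = df)

ncol(df)       # 3 predictors before dummy coding
ncol(mm)       # 5 columns after dummy coding
colnames(mm)
```

So rfe counts predictors after expansion, which is why the expected number of importance values grows once factors are dummy-coded.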

I'm using R version 3.1.1 on Windows 7 (64-bit), caret version 6.0-41. I also have Revolution R Enterprise version 7.3 (64-bit) installed. The same error is also reproduced on an Amazon EC2 (c3.8xlarge) Linux instance with R version 3.0.1 and caret version 6.0-24.


Datasets used (to reproduce my error):


https://www.dropbox.com/s/utuk9bpxl2996dy/train_model_sell_single_bid.RData?dl=0
https://www.dropbox.com/s/s9xcgfit3iqjffp/train_model_bid_outcomes_sell_single.RData?dl=0


My code:


library(caret)
library(data.table)
library(bit64)
library(doMC)

load("train_model_sell_single_bid.RData")
load("train_model_bid_outcomes_sell_single.RData")

subsets <- seq(from = 4, to = 184, by= 4)

registerDoMC(cores = 32)

set.seed(1015498)
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   repeats = 1,
                   #saveDetails = TRUE,
                   verbose = FALSE)

x <- as.data.frame(train_model_sell_single_bid[,!"security_id", with=FALSE])
y <- train_model_bid_outcomes_sell_single[,bid100]

lmProfile_single_bid100 <- rfe(x, y,
                               sizes = subsets,
                               preProc = c("center", "scale"),
                               rfeControl = ctrl)

1 solution

#1



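It seems that you might have highly correlated predictors. When columns are (nearly) collinear, lm() may be unable to estimate some coefficients, which can leave rfe with fewer importance values than predictors.
Before feature selection, remove them with findCorrelation():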


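# x must be all-numeric here (e.g. the model.matrix() output); cor() rejects factors
correlations <- cor(x)
crrltn <- findCorrelation(correlations, cutoff = .90)  # column indices to drop
if (length(crrltn) != 0)
  x <- x[, -crrltn]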
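A base-R sketch of how collinearity can produce this kind of count mismatch, assuming the ranking step reads importance from the fitted lm's coefficient table (the toy data and variable names are made up for illustration):

```r
# Hypothetical toy data: x3 is an exact linear combination of x1 and x2
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                       # perfectly collinear predictor
dat <- data.frame(y = 2 * x1 - x2 + rnorm(n), x1, x2, x3)

fit <- lm(y ~ ., data = dat)

# lm() cannot estimate x3 separately, so its coefficient is NA (aliased)
aliased <- names(coef(fit))[is.na(coef(fit))]

# summary() omits aliased terms from the coefficient table, so there are
# fewer per-predictor t-statistics than predictors supplied
coef_tab <- summary(fit)$coefficients
n_pred   <- ncol(dat) - 1                              # predictors supplied
n_imp    <- sum(rownames(coef_tab) != "(Intercept)")   # statistics returned

cat("predictors:", n_pred, " importance-like values:", n_imp,
    " aliased:", aliased, "\n")
```

Dropping one column from each highly correlated group, as findCorrelation() does, removes this source of aliasing before rfe ever fits a model.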

If the problem persists, it may be caused by predictors that are highly correlated within the automatically generated folds (a pair of columns can be collinear within a subset of rows even when it is not in the full data). You can control the generated folds yourself with:


set.seed(12213)
index <- createFolds(y, k = 10, returnTrain = TRUE)  # list of 10 training-row index vectors

and then pass them to rfeControl() through its index argument:


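lmctrl <- rfeControl(functions = lmFuncs,
                     method = "repeatedcv",
                     index = index,       # use the folds created above
                     verbose = TRUE)

set.seed(111333)
lrprofile <- rfe(x, y,                    # predictors first, then the outcome
                 sizes = subsets,         # subset sizes from the question
                 rfeControl = lmctrl)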
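For reference, the index list built with createFolds(..., returnTrain = TRUE) holds, for each resample, the rows used for training. A base-R sketch of that layout, using a hypothetical helper make_train_folds() (unlike createFolds(), it does no stratification on y and does not name the folds):

```r
# Hypothetical helper (not part of caret): k training-index vectors, each
# containing every row except one randomly assigned held-out fold
make_train_folds <- function(n, k = 10, seed = 12213) {
  set.seed(seed)
  fold_id <- sample(rep(seq_len(k), length.out = n))   # fold label per row
  lapply(seq_len(k), function(j) which(fold_id != j))  # training rows for fold j
}

toy_index <- make_train_folds(n = 103, k = 10)
lengths(toy_index)   # 92 or 93 training rows per fold
```

Because every fold is just a row subset, any correlation check you run on the full data can be repeated on each element of the list.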

If you still get the same error, check whether any predictors are highly correlated within each fold:


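# Report, for each training fold, the columns findCorrelation() would drop
for (i in seq_along(index)) {
  crrltn <- cor(x[index[[i]], ])
  # print() is needed: results are not auto-printed inside a for loop
  print(findCorrelation(crrltn, cutoff = .90, names = TRUE, verbose = TRUE))
}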
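If you want to see the offending pairs themselves rather than the columns findCorrelation() would drop, here is a base-R sketch. The helper high_cor_pairs() and the toy data are hypothetical, and this is a simple pairwise listing, not caret's findCorrelation algorithm:

```r
# Hypothetical helper: list predictor pairs whose absolute correlation
# exceeds `cutoff`; NA correlations (zero-variance columns) are skipped
high_cor_pairs <- function(x, cutoff = 0.90) {
  r <- suppressWarnings(cor(x))          # NA for zero-variance columns
  r[lower.tri(r, diag = TRUE)] <- NA     # keep each pair only once
  hits <- which(abs(r) > cutoff, arr.ind = TRUE)
  data.frame(var1 = rownames(r)[hits[, "row"]],
             var2 = colnames(r)[hits[, "col"]],
             r    = r[hits],
             stringsAsFactors = FALSE)
}

# Toy data (hypothetical): b is nearly a copy of a, c is independent noise
set.seed(42)
a   <- rnorm(40)
toy <- data.frame(a = a, b = a + rnorm(40, sd = 0.01), c = rnorm(40))
toy_folds <- list(1:30, 11:40)           # stand-in for the `index` list

for (i in seq_along(toy_folds)) {
  print(high_cor_pairs(toy[toy_folds[[i]], ], cutoff = 0.90))
}
```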
