R mice package: 2l错误。当只有一个NA时

时间:2022-12-07 12:49:00

After specifying the "2l.norm" method when calling mice i stumbled upon an error message for variables containing only 1 NA. I realize this is a very minor problem considering the very minor amount of missing data for these variables. However, it would be elegant to take into account the data structure for these as well.

在指定“2 l。在调用鼠标时,我偶然发现了一个错误消息,该错误消息只包含一个NA。我意识到这是一个非常小的问题,考虑到这些变量丢失的数据非常少。然而,考虑到这些数据的数据结构也是很优雅的。

I re-created the situation using a database accessible for all, the ChickWeight dataset. I very much realize that the problem can also be a consequence of an error in my implementation of the procedure, so please let me know if such is the case.

我使用一个所有人都可以访问的数据库(ChickWeight dataset)重新创建了这种情况。我很清楚这个问题也可能是我在执行过程中出错的结果,所以请让我知道情况是否如此。

ChickWeight[1:20, ]
dim(ChickWeight)
sum(is.na(ChickWeight)) #contains no NAs
ChickWeight$weight[12] <- NA # add 1 NA
ChickWeight$constant <- 1 #add a constant
ChickWeight$Chick <- as.numeric(levels(ChickWeight$Chick)[ChickWeight$Chick]) #class variable has to be an integer

ini <- mice(ChickWeight, maxit = 0)
pred <- ini$predictorMatrix
pred["weight", ] <- c(0, 2, -2, 1, 2)
method <- ini$method
method["weight"] <- "2l.norm"
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)

The last command results in:

最后一个命令的结果是:

Error in [<-.data.frame(*tmp*, , i, value = c(37.3233463394145, 159.862324738397 : replacement has 2 rows, data has 1

<-.data.frame(*tmp*, i, value = c) (37.3233463394145, 159.862324738397: replace有2行,data有1行

Adding one extra NA solves the problem

增加一个NA可以解决这个问题。

ChickWeight$weight[13] <- NA # add another NA
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)

Does anyone know what could cause the error?

有人知道是什么导致了这个错误吗?

1 个解决方案

#1


2  

The answer was kindly provided to me by email by Stef van Buuren, the author of the mice package. There is a very minor omission in the mice.impute.2l.norm function in the current version of the mice package (2.22) and will be repaired in the next release.

本书作者Stef van Buuren通过电子邮件向我提供了答案。这封信里有一个很小的遗漏。在当前版本的鼠标包(2.22)中规范功能,并将在下一个版本中进行修复。

The correct code is the following, where only "drop = FALSE" was added in the 4th from last row to rowSums(as.matrix(x[nry, type == 2):

正确的代码如下所示,其中从最后一行到行和的第4行中只添加了“drop = FALSE”(as)。矩阵(x[nry类型= = 2):

mice.impute.2l.norm2 <- 
function (y, ry, x, type, intercept = TRUE, ...) 
{
  rwishart <- function(df, p = nrow(SqrtSigma), SqrtSigma = diag(p)) {
    Z <- matrix(0, p, p)
    diag(Z) <- sqrt(rchisq(p, df:(df - p + 1)))
    if (p > 1) {
      pseq <- 1:(p - 1)
      Z[rep(p * pseq, pseq) + unlist(lapply(pseq, seq))] <- rnorm(p * 
                                                                    (p - 1)/2)
    }
    crossprod(Z %*% SqrtSigma)
  }
  force.chol <- function(x, warn = TRUE) {
    z <- 0
    repeat {
      lambda <- 0.1 * z
      XT <- x + diag(x = lambda, nrow = nrow(x))
      XT <- (XT + t(XT))/2
      s <- try(expr = chol(XT), silent = TRUE)
      if (class(s) != "try-error") 
        break
      z <- z + 1
    }
    attr(s, "forced") <- (z > 0)
    if (warn && z > 0) 
      warning("Cholesky decomposition had to be forced", 
              call. = FALSE)
    return(s)
  }
  if (intercept) {
    x <- cbind(1, as.matrix(x))
    type <- c(2, type)
  }
  n.iter <- 100
  nry <- !ry
  n.class <- length(unique(x[, type == (-2)]))
  if (n.class == 0) 
    stop("No class variable")
  gf.full <- factor(x[, type == (-2)], labels = 1:n.class)
  gf <- gf.full[ry]
  XG <- split.data.frame(as.matrix(x[ry, type == 2]), gf)
  X.SS <- lapply(XG, crossprod)
  yg <- split(as.vector(y[ry]), gf)
  n.g <- tabulate(gf)
  n.rc <- ncol(XG[[1]])
  bees <- matrix(0, nrow = n.class, ncol = n.rc)
  ss <- vector(mode = "numeric", length = n.class)
  mu <- rep(0, n.rc)
  inv.psi <- diag(1, n.rc, n.rc)
  inv.sigma2 <- rep(1, n.class)
  sigma2.0 <- 1
  theta <- 1
  for (iter in 1:n.iter) {
    for (class in 1:n.class) {
      vv <- sym(inv.sigma2[class] * X.SS[[class]] + inv.psi)
      bees.var <- chol2inv(chol(vv))
      bees[class, ] <- drop(bees.var %*% (crossprod(inv.sigma2[class] * 
                                                      XG[[class]], yg[[class]]) + inv.psi %*% mu)) + 
        drop(rnorm(n = n.rc) %*% chol(sym(bees.var)))
      ss[class] <- crossprod(yg[[class]] - XG[[class]] %*% 
                               bees[class, ])
    }
    mu <- colMeans(bees) + drop(rnorm(n = n.rc) %*% chol(chol2inv(chol(sym(inv.psi)))/n.class))
    inv.psi <- rwishart(df = n.class - n.rc - 1, SqrtSigma = chol(chol2inv(chol(sym(crossprod(t(t(bees) - 
                                                                                                  mu)))))))
    inv.sigma2 <- rgamma(n.class, n.g/2 + 1/(2 * theta), 
                         scale = 2 * theta/(ss * theta + sigma2.0))
    H <- 1/mean(inv.sigma2)
    sigma2.0 <- rgamma(1, n.class/(2 * theta) + 1, scale = 2 * 
                         theta * H/n.class)
    G <- exp(mean(log(1/inv.sigma2)))
    theta <- 1/rgamma(1, n.class/2 - 1, scale = 2/(n.class * 
                                                     (sigma2.0/H - log(sigma2.0) + log(G) - 1)))
  }
  imps <- rnorm(n = sum(nry), sd = sqrt(1/inv.sigma2[gf.full[nry]])) + 
    rowSums(as.matrix(x[nry, type == 2, drop = FALSE]) * bees[gf.full[nry], 
                                                              ])
  return(imps)
}

When the new function is added to the mice namespace by

当新函数被添加到mice命名空间by时

environment(mice.impute.2l.norm2) <- asNamespace('mice')

it can be used as normal in mice by calling for 2l.norm2. To show in the previously described example:

它可以通过调用2l.norm2在小鼠中正常使用。在前面描述的示例中显示:

method["weight"] <- "2l.norm2"
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)

It now works as expected!

它现在可以像预期的那样工作了!

#1


2  

The answer was kindly provided to me by email by Stef van Buuren, the author of the mice package. There is a very minor omission in the mice.impute.2l.norm function in the current version of the mice package (2.22) and will be repaired in the next release.

本书作者Stef van Buuren通过电子邮件向我提供了答案。这封信里有一个很小的遗漏。在当前版本的鼠标包(2.22)中规范功能,并将在下一个版本中进行修复。

The correct code is the following, where only "drop = FALSE" was added in the 4th from last row to rowSums(as.matrix(x[nry, type == 2):

正确的代码如下所示,其中从最后一行到行和的第4行中只添加了“drop = FALSE”(as)。矩阵(x[nry类型= = 2):

mice.impute.2l.norm2 <- 
function (y, ry, x, type, intercept = TRUE, ...) 
{
  rwishart <- function(df, p = nrow(SqrtSigma), SqrtSigma = diag(p)) {
    Z <- matrix(0, p, p)
    diag(Z) <- sqrt(rchisq(p, df:(df - p + 1)))
    if (p > 1) {
      pseq <- 1:(p - 1)
      Z[rep(p * pseq, pseq) + unlist(lapply(pseq, seq))] <- rnorm(p * 
                                                                    (p - 1)/2)
    }
    crossprod(Z %*% SqrtSigma)
  }
  force.chol <- function(x, warn = TRUE) {
    z <- 0
    repeat {
      lambda <- 0.1 * z
      XT <- x + diag(x = lambda, nrow = nrow(x))
      XT <- (XT + t(XT))/2
      s <- try(expr = chol(XT), silent = TRUE)
      if (class(s) != "try-error") 
        break
      z <- z + 1
    }
    attr(s, "forced") <- (z > 0)
    if (warn && z > 0) 
      warning("Cholesky decomposition had to be forced", 
              call. = FALSE)
    return(s)
  }
  if (intercept) {
    x <- cbind(1, as.matrix(x))
    type <- c(2, type)
  }
  n.iter <- 100
  nry <- !ry
  n.class <- length(unique(x[, type == (-2)]))
  if (n.class == 0) 
    stop("No class variable")
  gf.full <- factor(x[, type == (-2)], labels = 1:n.class)
  gf <- gf.full[ry]
  XG <- split.data.frame(as.matrix(x[ry, type == 2]), gf)
  X.SS <- lapply(XG, crossprod)
  yg <- split(as.vector(y[ry]), gf)
  n.g <- tabulate(gf)
  n.rc <- ncol(XG[[1]])
  bees <- matrix(0, nrow = n.class, ncol = n.rc)
  ss <- vector(mode = "numeric", length = n.class)
  mu <- rep(0, n.rc)
  inv.psi <- diag(1, n.rc, n.rc)
  inv.sigma2 <- rep(1, n.class)
  sigma2.0 <- 1
  theta <- 1
  for (iter in 1:n.iter) {
    for (class in 1:n.class) {
      vv <- sym(inv.sigma2[class] * X.SS[[class]] + inv.psi)
      bees.var <- chol2inv(chol(vv))
      bees[class, ] <- drop(bees.var %*% (crossprod(inv.sigma2[class] * 
                                                      XG[[class]], yg[[class]]) + inv.psi %*% mu)) + 
        drop(rnorm(n = n.rc) %*% chol(sym(bees.var)))
      ss[class] <- crossprod(yg[[class]] - XG[[class]] %*% 
                               bees[class, ])
    }
    mu <- colMeans(bees) + drop(rnorm(n = n.rc) %*% chol(chol2inv(chol(sym(inv.psi)))/n.class))
    inv.psi <- rwishart(df = n.class - n.rc - 1, SqrtSigma = chol(chol2inv(chol(sym(crossprod(t(t(bees) - 
                                                                                                  mu)))))))
    inv.sigma2 <- rgamma(n.class, n.g/2 + 1/(2 * theta), 
                         scale = 2 * theta/(ss * theta + sigma2.0))
    H <- 1/mean(inv.sigma2)
    sigma2.0 <- rgamma(1, n.class/(2 * theta) + 1, scale = 2 * 
                         theta * H/n.class)
    G <- exp(mean(log(1/inv.sigma2)))
    theta <- 1/rgamma(1, n.class/2 - 1, scale = 2/(n.class * 
                                                     (sigma2.0/H - log(sigma2.0) + log(G) - 1)))
  }
  imps <- rnorm(n = sum(nry), sd = sqrt(1/inv.sigma2[gf.full[nry]])) + 
    rowSums(as.matrix(x[nry, type == 2, drop = FALSE]) * bees[gf.full[nry], 
                                                              ])
  return(imps)
}

When the new function is added to the mice namespace by

当新函数被添加到mice命名空间by时

environment(mice.impute.2l.norm2) <- asNamespace('mice')

it can be used as normal in mice by calling for 2l.norm2. To show in the previously described example:

它可以通过调用2l.norm2在小鼠中正常使用。在前面描述的示例中显示:

method["weight"] <- "2l.norm2"
imputation <- mice(ChickWeight, m = 5, maxit = 5, pred = pred, method = method)

It now works as expected!

它现在可以像预期的那样工作了!