Why does as.factor return character when used inside apply?

Time: 2022-12-21 18:34:35

I want to convert variables into factors using apply():

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

a2 <- apply(a, 2, as.factor)
apply(a2, 2, class)

results in:

         x1          x2          x3 
"character" "character" "character" 

I don't understand why this results in character vectors instead of factor vectors.

1 Answer

#1


apply converts your data.frame to a character matrix. Use lapply instead:

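lapply(a, class)
# Note: x2 is "factor" below because data.frame() used stringsAsFactors = TRUE
# by default before R 4.0.0; in R >= 4.0.0 it is "character".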
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

In your second command, apply again converts the result to a character matrix. Use lapply instead:

a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

But for a quick look you could use str:

str(a)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: num  -1.79 -1.091 1.307 1.142 -0.972 ...
#  $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

Additional explanation based on the comments:

Why does the lapply work while apply doesn't?

The first thing apply does is convert its argument to a matrix, so apply(a, 2, f) is equivalent to apply(as.matrix(a), 2, f). As you can see, str(as.matrix(a)) gives:

chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "x1" "x2" "x3"

There are no factors left, so class returns "character" for every column.
lapply, on the other hand, works on the columns themselves (a data.frame is a list of columns), so it gives you what you want: it does something like class(a$column_name) for each column.
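
A minimal sketch of the difference, using a small deterministic data frame instead of rnorm() (the name df and its values are hypothetical stand-ins for the question's a; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version):

```r
# Hypothetical, deterministic stand-in for the question's data frame
df <- data.frame(x1 = c(0.5, -1.2, 2),
                 x2 = c("a", "b", "a"),
                 x3 = factor(c("a", "a", "b")),
                 stringsAsFactors = FALSE)

# apply() starts with as.matrix(); mixed column types give a character matrix
m <- as.matrix(df)
typeof(m)                 # "character"

# so every column apply() hands to FUN is already a character vector
apply(df, 2, class)       # all three are "character"

# lapply() walks the list of columns and keeps each column's own class
lapply(df, class)         # x1 "numeric", x2 "character", x3 "factor"
```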

The help page for apply (?apply) explains why apply and as.factor don't work together:

您可以在帮助中看到为什么应用和as。因素不工作:

In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
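
The quoted rule applies even when as.matrix() has nothing to convert. In this sketch (the data frame num is hypothetical) every column is numeric, so the input matrix stays numeric, yet the factors returned by as.factor() still come back as a character matrix:

```r
# Hypothetical all-numeric data frame: as.matrix() keeps it numeric
num <- data.frame(p = c(1, 2, 1), q = c(3, 3, 4))
typeof(as.matrix(num))    # "double"

# Each as.factor() call returns a factor, but apply() assembles the results
# into an array, which stores the factor labels as character strings
res <- apply(num, 2, as.factor)
is.matrix(res)            # TRUE
typeof(res)               # "character"
res[, "p"]                # "1" "2" "1"
```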

The help page for sapply (?sapply) explains why sapply and as.factor don't work either:

Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.

In other words, you never get a matrix of factors or a data.frame.
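
The same thing can be seen with sapply() in a short sketch on a hypothetical all-numeric data frame. Passing simplify = FALSE (a documented sapply argument) skips the simplification step, so the factors survive inside a list:

```r
num <- data.frame(p = c(1, 2, 1), q = c(3, 3, 4))

# sapply() simplifies the list of equal-length factors into a matrix,
# and the factor class is lost along the way
s <- sapply(num, as.factor)
is.matrix(s)              # TRUE
typeof(s)                 # "character"

# Without simplification the result is a named list and each factor survives
s_list <- sapply(num, as.factor, simplify = FALSE)
class(s_list$p)           # "factor"
```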

How to convert output to data.frame?

Simple: use as.data.frame, as you wrote in your comment:

a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
#  $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
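
To check the result programmatically rather than by reading str() output, a small sketch (the data frame b is hypothetical; vapply() is used instead of sapply() so the check always returns a logical vector):

```r
b <- data.frame(x1 = c(0.5, -1.2, 2),
                x2 = c("a", "b", "a"),
                stringsAsFactors = FALSE)

# lapply() returns a plain list of factors; as.data.frame() turns it back into a data frame
b2 <- as.data.frame(lapply(b, as.factor))

is.data.frame(b2)                    # TRUE
vapply(b2, is.factor, logical(1))    # x1 TRUE, x2 TRUE
```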

But if you want to convert only selected character columns to factors, there is a trick:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: chr  "a" "b" "c" "d" ...
#  $ x2: chr  "A" "B" "C" "D" ...
#  $ x3: chr  "A" "B" "C" "D" ...

columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: chr  "A" "B" "C" "D" ...
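
A variant of the same trick, sketched on a hypothetical data frame a4 with a non-character x3 column. Selecting columns with a logical vector from vapply() avoids typing the names, and list-style indexing (a4[cols] rather than a4[, cols]) still returns a data frame when only one column is selected, whereas a4[, "x1"] drops to a plain vector:

```r
# Hypothetical variant of a3 with a non-character x3 column
a4 <- data.frame(x1 = letters, x2 = LETTERS, x3 = 1:26,
                 stringsAsFactors = FALSE)

# a4[, "x1"] drops to a plain vector, while a4["x1"] stays a data frame,
# so lapply() over a4["x1"] iterates over columns, not over 26 values
is.data.frame(a4[, "x1"])   # FALSE
is.data.frame(a4["x1"])     # TRUE

# Logical vector marking the character columns (x1 and x2 here)
is_chr <- vapply(a4, is.character, logical(1))

# Convert only those columns; x3 stays an integer column
a4[is_chr] <- lapply(a4[is_chr], as.factor)
str(a4)
```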

You can use the same trick to convert all columns:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
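
The trick works because assigning into the existing object's columns keeps its data.frame class, while the lapply() result on its own is a plain list. A sketch with a hypothetical a5, using a5[] <- as an equivalent, commonly used spelling of a3[, ] <-:

```r
a5 <- data.frame(x1 = letters, x2 = LETTERS, stringsAsFactors = FALSE)

# lapply() on its own returns a plain list
class(lapply(a5, as.factor))   # "list"

# Assigning into a5[] replaces the columns but keeps the data.frame wrapper
a5[] <- lapply(a5, as.factor)
class(a5)                      # "data.frame"
```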
