Applying the same factor levels to multiple variables in an R data frame

Time: 2022-05-23 22:55:49

I am working with a dataset that includes 16 questions where the response set is identical (Yes, No, Unknown or Missing). I am processing the data using R and I want to turn each of the variables into a factor. For a single variable, I could use the following construction:

df <- read.csv("thedata.csv")
df$q1 <- factor(x = df$q1, levels = c(-9, 0, 1),
                labels = c("Unknown or Missing", "No", "Yes"))

I'd like to avoid typing that 16 times. I could do it with a for() loop, but I was wondering if there is a clearer, more idiomatic R way to do it. Some sample data:

structure(list(q1 = c(0, 0, 0, -9, 0), q2 = c(0, 0, 1, 0, 0),
               q3 = c(0, 0, 1, 0, 0), q4 = c(1, 1, 0, 0, 0),
               q5 = c(0, 1, 1, 1, 1), q6 = c(1, 1, 1, 0, 0),
               q7 = c(0, 0, 0, 1, 0), q8 = c(0, 0, 1, 1, 1),
               q9 = c(1, 0, -9, 1, 0), q10 = c(1, 0, 0, 0, 0),
               q11 = c(0, 1, 1, 0, 0), q12 = c(1, 1, 0, 0, 0),
               q13 = c(1, -9, 1, 0, 0), q14 = c(0, 0, 0, 1, 1),
               q15 = c(1, 0, 1, 1, 0), q16 = c(1, 1, 1, 1, 1)),
               .Names = c("q1", "q2", "q3", "q4", "q5", "q6", "q7",
                          "q8", "q9", "q10", "q11", "q12", "q13",
                          "q14", "q15", "q16"),
               row.names = c(NA, -5L), class = "data.frame")
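
For comparison, the for() route mentioned above might look like the following sketch. It uses a three-column stand-in for the sample data (the name df is borrowed from the earlier snippet, and the loop variable col is just an illustrative name), so treat it as a baseline to compare against rather than the asker's actual code.

```r
# Small stand-in for the sample data (three of the sixteen columns).
df <- data.frame(q1 = c(0, 0, 0, -9, 0),
                 q2 = c(0, 0, 1, 0, 0),
                 q9 = c(1, 0, -9, 1, 0))

# Loop over the column names and recode each column in place.
for (col in names(df)) {
  df[[col]] <- factor(df[[col]],
                      levels = c(-9, 0, 1),
                      labels = c("Unknown or Missing", "No", "Yes"))
}
str(df)
```

Every column ends up with the same three levels, including levels that do not occur in a given column.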

2 solutions

#1


20  

df[] <- lapply(df, factor, 
              levels=c(-9, 0, 1), 
              labels = c("Unknown or Missing", "No", "Yes"))
str(df)

This is likely to be faster than apply or sapply, which need data.frame to rebuild and reclass their results. The trick is that using [] on the left-hand side of the assignment preserves the structure of the target (R "knows" what its class and dimensions are), so there is no need to call data.frame on the list returned by lapply. If you want to do this only for selected columns, you could do this:

# colnums is a vector of column positions or names that you supply
df[colnums] <- lapply(df[colnums], factor,
                      levels = c(-9, 0, 1),
                      labels = c("Unknown or Missing", "No", "Yes"))
str(df)
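
To make the role of [] concrete, here is a minimal sketch on a small stand-in data frame. The id column and the grep() pattern used to build colnums are assumptions added for illustration; the original data has only the q columns.

```r
# Stand-in data: a hypothetical id column plus two question columns.
df <- data.frame(id = 1:5,
                 q1 = c(0, 0, 0, -9, 0),
                 q2 = c(0, 0, 1, 0, 0))

# On its own, lapply() returns a plain list, not a data frame.
as_list <- lapply(df, factor)
class(as_list)

# Hypothetical selection: every column whose name starts with "q".
colnums <- grep("^q", names(df))

# Assigning into df[colnums] keeps df a data frame and leaves id alone.
df[colnums] <- lapply(df[colnums], factor,
                      levels = c(-9, 0, 1),
                      labels = c("Unknown or Missing", "No", "Yes"))
class(df)
sapply(df, class)
```

Selecting by name pattern rather than by hard-coded positions is one option if the question columns are not contiguous; a character vector of names would work equally well.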

#2


1  

A base R solution using apply

 data.frame(apply(df, 2, factor, 
                 levels=c(-9, 0, 1), 
                 labels = c("Unknown or Missing", "No", "Yes")))

Using sapply

data.frame(sapply(df, factor, levels=c(-9, 0, 1), 
         labels = c("Unknown or Missing", "No", "Yes")))
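
One thing worth checking with these two variants: apply() and sapply() both simplify their result to a matrix, and a matrix cannot hold factors, so the factor information may not survive the round trip through data.frame() (note that since R 4.0.0, data.frame() no longer converts strings to factors by default). A minimal sketch for inspecting the outcome on a small stand-in for the sample data; the names lv, lab, via_apply and via_lapply are illustrative only:

```r
# Stand-in data: two of the sixteen question columns.
df <- data.frame(q1 = c(0, 0, 0, -9, 0),
                 q2 = c(0, 0, 1, 0, 0))
lv  <- c(-9, 0, 1)
lab <- c("Unknown or Missing", "No", "Yes")

# Variant that goes through a matrix and back to a data frame.
via_apply <- data.frame(apply(df, 2, factor, levels = lv, labels = lab))

# Variant that assigns the list from lapply() back into the data frame.
via_lapply <- df
via_lapply[] <- lapply(df, factor, levels = lv, labels = lab)

# Compare the column classes and level sets of the two results.
sapply(via_apply, class)
sapply(via_lapply, class)
lapply(via_apply, levels)
lapply(via_lapply, levels)
```

If the columns from the matrix route do not come back as factors with all three levels in the requested order, the lapply() form is the safer choice.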
