Fill a data frame column with a vector of 0s and 1s [duplicate]

Date: 2023-02-09 18:04:24

This question already has an answer here:

I have a data frame with two columns (A and B). Column A is categorical and column B is numeric (ranging from 0.0 to 1.0). I want to create a column C whose value is 1 when the value in column B is greater than or equal to 0.5, and 0 when it is less than 0.5. Any suggestions on how to do this? The final df should look like this:

A = c('spA', 'spB', 'spC', 'spD') 
B = c(0.25, 0.15, 0.50, 0.75) 
C = c(0,0,1,1) 
df = data.frame(A, B, C)

1 answer

#1

Just use

A = c('spA', 'spB', 'spC', 'spD')  
B = c(0.25, 0.15, 0.50, 0.75)  
df = data.frame(A, B)

df$C <- as.numeric(df$B >= 0.5)

@David Arenburg: Here is a speed comparison of all 3 solutions pointed out above.
To be honest, I don't know why the as.numeric version is that much faster.

require(microbenchmark)
microbenchmark(
  df$C <- ifelse(df$B>=0.5, 1, 0),
  transform(df, C = as.numeric(B >= 0.5)),
  df$C <- as.numeric(df$B>=0.5)
  )

Result:

Unit: microseconds
                                    expr     min       lq   median       uq    max neval
       df$C <- ifelse(df$B >= 0.5, 1, 0)  33.585  35.7580  38.1285  41.6845 140.66   100
 transform(df, C = as.numeric(B >= 0.5)) 143.821 149.7470 155.0815 164.5640 284.48   100
         df$C <- as.numeric(df$B >= 0.5)  20.546  22.9165  24.2995  27.2630  53.34   100
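
Since the benchmark only compares timings, it may be worth confirming that the three expressions really produce the same column. The following is a minimal sketch on the four-row example data; the names c_ifelse, c_transform, c_numeric and same_result are introduced here for illustration and do not appear in the original answer.

```r
# Rebuild the example data frame from the answer above.
A <- c('spA', 'spB', 'spC', 'spD')
B <- c(0.25, 0.15, 0.50, 0.75)
df <- data.frame(A, B)

# Compute column C with each of the three benchmarked approaches.
# (Illustrative variable names, not part of the original answer.)
c_ifelse    <- ifelse(df$B >= 0.5, 1, 0)
c_transform <- transform(df, C = as.numeric(B >= 0.5))$C
c_numeric   <- as.numeric(df$B >= 0.5)

# TRUE only if all three approaches yield the same numeric vector.
same_result <- isTRUE(all.equal(c_ifelse, c_transform)) &&
  isTRUE(all.equal(c_ifelse, c_numeric))
print(same_result)
```

If same_result is TRUE, the choice between the three comes down to speed and readability rather than output.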

EDIT: Larger dataset

df <- data.frame(B=runif(100000))

require(microbenchmark)
microbenchmark(
  df$C <- ifelse(df$B>=0.5, 1, 0),
  transform(df, C = as.numeric(B >= 0.5)),
  df$C <- as.numeric(df$B>=0.5)
  )

Unit: microseconds
                                    expr       min        lq     median         uq       max neval
       df$C <- ifelse(df$B >= 0.5, 1, 0) 31620.826 33623.452 34529.8380 55652.9290 62707.064   100
 transform(df, C = as.numeric(B >= 0.5))   811.561   979.286  1032.6255  1248.5550  2333.137   100
         df$C <- as.numeric(df$B >= 0.5)   606.498   764.542   808.0045   979.0875 23805.112   100
