Removing values from different columns of a data frame

Time: 2023-02-08 13:19:29

I have a dataset in which some columns contain two values that I need to change to NA:

'#DIV/0' and '' (an empty string)

I solved this problem with a 'for' loop, but I would like to know whether there is another way, for example using 'apply', and which method is faster.

My code:

train <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv',stringsAsFactors = F)
test <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv', stringsAsFactors = F)

# Replace '' and '#DIV/0' with NA, one column at a time
train2 <- train
for(x in seq_along(train2)){
        train2[train2[,x] %in% c('','#DIV/0'),x] <- NA
}

test2 <- test
for(x in seq_along(test2)){
        test2[test2[,x] %in% c('','#DIV/0'),x] <- NA
}
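For comparison, an apply-style version of the same loop could look like the sketch below. It uses a small made-up data frame (the name df and its columns are hypothetical) instead of the downloaded files. lapply() walks over the columns and replace() swaps the matching entries for NA; assigning into df2[] keeps the result a data frame with the original column names. I have not benchmarked this against the loop, so wrapping both in system.time() on the real data would be the way to check which is faster.

```r
# Hypothetical toy data standing in for train/test
df <- data.frame(
  a = c("1", "", "#DIV/0"),
  b = c("x", "y", ""),
  n = 1:3,
  stringsAsFactors = FALSE
)

# lapply() visits every column; replace() sets the matching entries to NA.
# Assigning into df2[] (rather than df2) keeps it a data.frame.
df2 <- df
df2[] <- lapply(df2, function(col) replace(col, col %in% c("", "#DIV/0"), NA))

df2
```

Columns without any matching value (such as the integer column n) come back unchanged.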

1 solution

#1


We can use the na.strings argument of read.csv, so the values are turned into NA while the file is being read:

train <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv',
                  na.strings = c('#DIV/0', '', 'NA'), stringsAsFactors = FALSE)
test <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv',
                 na.strings = c('#DIV/0', '', 'NA'), stringsAsFactors = FALSE)

Checking that none of these strings remain in train and test:

sum(train=='#DIV/0', na.rm=TRUE)
#[1] 0
sum(test=='#DIV/0', na.rm=TRUE)
#[1] 0
sum(test=='', na.rm=TRUE)
#[1] 0
sum(train=='', na.rm=TRUE)
#[1] 0

Counting the NA values:

sum(is.na(train))
#[1] 1921600
sum(is.na(test))
#[1] 2000
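To see what na.strings does without downloading the files, here is a minimal offline sketch on a made-up inline CSV (the data and column names are hypothetical). na.strings matches whole field values, so every spelling that actually occurs in the file has to be listed. Excel writes the division error as '#DIV/0!', so that variant is included here as an assumption about how such files may spell it. Because the NA conversion happens before read.csv guesses column types, a column holding only numbers plus these markers comes back numeric, which the post-hoc loop cannot achieve since by then the column has already been read as character.

```r
# Made-up inline CSV standing in for the real files.
# Both "#DIV/0" and Excel's "#DIV/0!" spelling are listed (assumption).
csv <- "a,b,c
1,#DIV/0,x
,2,#DIV/0!
3,,y"

df <- read.csv(text = csv,
               na.strings = c("#DIV/0", "#DIV/0!", "", "NA"),
               stringsAsFactors = FALSE)

str(df)             # a and b are read as integer columns, c stays character
colSums(is.na(df))  # NA count per column
```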
