Getting an error in R: missing value where TRUE/FALSE needed

Date: 2023-01-12 22:29:06

I am trying to step through a vector to find outliers, using the IQR to calculate a range. When I run this script looking for values to the right of the IQR I get results, but when I run it to the left I get the error: missing value where TRUE/FALSE needed. How can I scrub out the TRUE and FALSE in my dataset? Here is my script:

data = c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
Q3 <- quantile(data, 0.75) ## third quartile of the vector
Q1 <- quantile(data, 0.25) ## first quartile of the vector
outliers_left <-(Q1-1.5*IQR(data)) 
outliers_right <-(Q3+1.5*IQR(data))
IQR <- IQR(data)
paste("the interquartile range is", IQR)
Q1 # quartile at 0.25
Q3 # quartile at 0.75
# show the range of numbers we have
paste("your range is", outliers_left, "through", outliers_right, "to determine outliers")
# count how many elements there are, then pass this value into a loop to look for
# anything above and below the Q1-Q3 values
vectorCount <- sum(!is.na(data))
i <- 1
while( i <= vectorCount ){
i <- i + 1
x <- data[i]
# if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
if(x > outliers_right) {print(x)}
}

The output and error I get are:

[1] 167
[1] 180
[1] 156
Error in if (x > outliers_right) { : 
missing value where TRUE/FALSE needed

As you can see if you run this script, it finds my 3 outliers on the right and then throws the error. But when I run it again for the left of my IQR, where I do have an outlier of 100 in the vector, I just get the error without any other results being displayed. How can I fix this script? Any help is greatly appreciated. I've been scouring the web and my books for days on how to fix this.

2 Answers

#1


3  

As noted in the comments, the error is due to the way you've constructed your while loop. At the last iteration i == 16, though there are only 15 elements to process, so data[16] is NA and the if condition has no TRUE/FALSE value to test. Changing from i <= vectorCount to i < vectorCount fixes the problem:

i <- 1
while( i < vectorCount ){
  i <- i + 1
  x <- data[i]
  # if(x < outliers_left) {print(x)} # uncomment this to run and test for the left
  if(x > outliers_right) {print(x)}
}
#-----
[1] 167
[1] 180
[1] 156
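
One caveat with the loop above: because i is incremented before data[i] is read, the first element (100) is never tested, which would explain why the left-hand check prints nothing. A minimal sketch of a loop that visits every element, assuming the same data and the 1.5 * IQR range from the question (found is a name introduced here purely for illustration):

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
outliers_left  <- quantile(data, 0.25) - 1.5 * IQR(data)
outliers_right <- quantile(data, 0.75) + 1.5 * IQR(data)

found <- c()  # hypothetical accumulator, not part of the original script
# seq_along(data) yields 1, 2, ..., length(data), so the index never runs past the end
for (i in seq_along(data)) {
  x <- data[i]
  if (x < outliers_left || x > outliers_right) {
    print(x)
    found <- c(found, x)
  }
}
```

Iterating with seq_along() avoids managing the counter by hand, which is where both the skipped first element and the out-of-range index came from.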

However, this is really not how R works, and you'll soon be frustrated at how long that code takes to run on data of any appreciable size. R is "vectorized", meaning that you can operate on all 15 elements of data at once. To print your outliers, I'd do this:

data[data > outliers_right]
#-----
[1] 167 180 156

Or to get all of them at once using the OR operator:

data[data < outliers_left | data > outliers_right]
#-----
[1] 100 167 180 156

For a little context, the above logical comparisons create a boolean value for each element of data, and R only returns the elements for which it is TRUE. You can check this for yourself by typing:

data > outliers_right
#----
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
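
The logical vector can also be counted or turned into positions. A short sketch, assuming the same data as the question; is_high is a name made up for this example:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
outliers_right <- quantile(data, 0.75) + 1.5 * IQR(data)

is_high <- data > outliers_right  # one TRUE/FALSE per element of data
sum(is_high)                      # TRUE counts as 1, so this is the number of high outliers
which(is_high)                    # positions of the TRUE values
data[which(is_high)]              # same values as data[is_high] when data has no NAs
```

which() drops NA entries, so indexing through it can be a convenient choice if the vector might contain missing values.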

The [ is actually an extraction operator, used to retrieve a subset of a data object. See its help page for some good background: ?"[".

#2


1  

The error arises because you let i <= vectorCount, so i can equal vectorCount; after i <- i + 1, indexing data[i] then reaches past the end of the vector and gives NA, and the if statement fails.
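
To see the failure in isolation, here is a small sketch. It assumes the question's data, and uses 128 as the upper limit because that is what Q3 + 1.5 * IQR works out to for this vector; result is a name introduced only for the example:

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)

x <- data[length(data) + 1]  # index 16 of a 15-element vector
is.na(x)                     # TRUE: out-of-range indexing returns NA rather than an error

# if (NA) cannot choose a branch, so R raises the error seen in the question
result <- tryCatch(
  if (x > 128) "outlier" else "ok",
  error = function(e) "error caught"
)
result

# isTRUE() treats NA as "not TRUE", so this guard does not raise that error
if (isTRUE(x > 128)) print(x)
```

Guarding with isTRUE() hides the symptom only; correcting the loop bounds is still the real fix.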

If you want to find the outliers based on the IQR, you can use findInterval:

outliers <- data[findInterval(data, c(Q1,Q3)) != 1]
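
Note that with c(Q1, Q3) as break points, the line above keeps every value outside the interval from Q1 to Q3, which is a wider net than the 1.5 * IQR range the question uses. A sketch of the same idea with the question's limits as break points (fences is a name assumed here, not from the original):

```r
data <- c(100, 120, 121, 123, 125, 124, 123, 123, 123, 124, 125, 167, 180, 123, 156)
Q1 <- quantile(data, 0.25)
Q3 <- quantile(data, 0.75)
fences <- c(Q1 - 1.5 * IQR(data), Q3 + 1.5 * IQR(data))  # the question's outlier range

# findInterval() gives 0 below the first break, 1 between the breaks,
# and 2 at or above the last break
outliers <- data[findInterval(data, fences) != 1]
outliers
```

One difference from the strict comparisons: a value exactly equal to the upper break would be counted as an outlier here, since the intervals are closed on the left by default.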

I would also stop using paste to create messages to be printed; use message instead.
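
A brief sketch of the difference, with IQR_value as a hypothetical stand-in for the value computed in the question:

```r
IQR_value <- 2  # hypothetical stand-in for IQR(data)

# paste() only builds a string; at top level R auto-prints it with [1] and quotes
paste("the interquartile range is", IQR_value)

# message() concatenates its arguments without a separator and writes to stderr
message("the interquartile range is ", IQR_value)
```

Because message() does not insert spaces between its arguments, the trailing space inside the string is needed.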
