Strange error in R when importing (64-bit) integers with many digits

I am importing a csv that has a single column which contains very long integers (for example: 2121020101132507598)

a <- read.csv('temp.csv', as.is = TRUE)

When I import these integers as strings they come through correctly, but when imported as integers the last few digits are changed. I have no idea what is going on...

1 "4031320121153001444" 4031320121153001472
2 "4113020071082679601" 4113020071082679808
3 "4073020091116779570" 4073020091116779520
4 "2081720101128577687" 2081720101128577792
5 "4041720081087539887" 4041720081087539712
6 "4011120071074301496" 4011120071074301440
7 "4021520051054304372" 4021520051054304256
8 "4082520061068996911" 4082520061068997120
9 "4082620101129165548" 4082620101129165312

4 Answers

#1

As others have noted, you can't represent integers that large with R's integer type. But R isn't reading those values as integers; it's reading them as double-precision numerics.
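
A quick way to check this yourself, as a sketch: the snippet below passes two of the question's values to `read.csv` through its `text =` argument (standing in for `temp.csv`) and inspects the column type. The `numerals = "no.loss"` argument that `read.csv` forwards to `read.table` (available since R 3.1.1, not mentioned in this answer) is one way to keep such values as strings instead of silently rounding them.

```r
# Inline stand-in for temp.csv, using two values from the question
csv_text <- "4031320121153001444\n4113020071082679601"

# Default behaviour (numerals = "allow.loss"): the column becomes double, not integer
a_default <- read.csv(text = csv_text, header = FALSE)
typeof(a_default$V1)   # "double"

# numerals = "no.loss": values that would lose accuracy as doubles are not converted,
# so they stay character (R >= 4.0) or become a factor (older R versions)
a_noloss <- read.csv(text = csv_text, header = FALSE, numerals = "no.loss")
as.character(a_noloss$V1)
```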

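Double precision stores numbers with a 53-bit significand, so it can only represent about 15 to 16 significant decimal digits accurately (every integer is exactly representable only up to 2^53 = 9,007,199,254,740,992). That is why you see your 19-digit numbers rounded after roughly 16 digits. See the gmp, Rmpfr, and int64 packages for potential solutions. Though I don't see a function to read from a file in any of them, maybe you could cook something up by looking at their sources.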
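
To illustrate the precision limit (a base-R sketch; the value is the first one from the question): above 2^53 not every integer can be stored in a double, and for a value around 4e18 (between 2^61 and 2^62) adjacent doubles are 2^9 = 512 apart, so the input snaps to the nearest multiple of 512.

```r
# A double has a 53-bit significand
.Machine$double.digits                  # 53

# Past 2^53, adding 1 can be lost entirely
2^53 + 1 == 2^53                        # TRUE

# The question's first value is stored as the nearest representable double,
# which is exactly the "wrong" number shown in the question
x <- as.numeric("4031320121153001444")
x == as.numeric("4031320121153001472")  # TRUE

# Print every digit of the double that is actually stored
sprintf("%.0f", x)
```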

UPDATE: Here's how you can get your file into an int64 object:

# This assumes your numbers are the only column in the file
library(int64)  # provides as.int64()
# Read them in however, just ensure they're read in as character
a <- scan("temp.csv", what = "")
ia <- as.int64(a)

#2

R's maximum integer value is 2147483647 (about 2e9). As @Joshua mentions in another answer, one of the potential solutions is the int64 package.

Import the values as character instead. Then convert to type int64.

library(int64)  # provides as.int64()
a <- read.csv('temp.csv', colClasses = 'character', header=FALSE)[[1]]
a <- as.int64(a)
print(a)
[1] 4031320121153001444 4113020071082679601 4073020091116779570
[4] 2081720101128577687 4041720081087539887 4011120071074301496
[7] 4021520051054304372 4082520061068996911 4082620101129165548
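
If the values are only used as identifiers (for matching or joining), keeping them as character may be all you need, with no extra package. A minimal base-R sketch of the `colClasses = 'character'` step on its own, using inline text in place of `temp.csv`:

```r
# Inline stand-in for temp.csv (three values from the question)
csv_text <- "4031320121153001444\n4113020071082679601\n4073020091116779570"

# colClasses = "character" stops read.csv from converting the column to double
ids <- read.csv(text = csv_text, colClasses = "character", header = FALSE)[[1]]

ids                                 # all 19 digits intact
"4113020071082679601" %in% ids      # TRUE: exact lookups work on the strings
```

Arithmetic on these IDs would still need a 64-bit or arbitrary-precision type, as in the answer above.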

#3

You simply cannot represent integers that big. See

.Machine

which on my box has

$integer.max
[1] 2147483647
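
A small base-R sketch of what that limit means in practice: R's integers do not wrap around, they turn into `NA` with a warning.

```r
# Largest value R's 32-bit integer type can hold
.Machine$integer.max                    # 2147483647

# One past the limit: NA, with an integer-overflow warning
.Machine$integer.max + 1L

# Coercing one of the question's values to integer also gives NA with a warning
as.integer("4031320121153001444")
```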

#4

The maximum value of a 32-bit signed integer is 2,147,483,647. Your numbers are much larger.

Try importing them as floating point values instead.

There are a few caveats to be aware of when dealing with floating point arithmetic in R or any other language:

http://blog.revolutionanalytics.com/2009/11/floatingpoint-errors-explained.html

http://blog.revolutionanalytics.com/2009/03/when-is-a-zero-not-a-zero.html

http://floating-point-gui.de/basic/
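
A minimal base-R illustration of the kind of caveat those articles describe:

```r
# 0.1, 0.2 and 0.3 have no exact binary representation
0.1 + 0.2 == 0.3                    # FALSE

# The leftover error is tiny but not zero
(0.1 + 0.2) - 0.3                   # about 5.55e-17

# Compare with a tolerance instead of ==
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```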
