Out of memory error (Java) when using R and the XLConnect package

Time: 2022-12-07 12:44:19

I tried to load a ~30 MB Excel spreadsheet into R using the XLConnect package.

This is what I wrote:

wb <- loadWorkbook("largespreadsheet.xlsx")

And after about 15 seconds, I got the following error:

Error: OutOfMemoryError (Java): GC overhead limit exceeded.

Is this a limitation of the XLConnect package or is there a way to tweak my memory settings to allow for larger files?

I appreciate any solutions/tips/advice.

6 Answers

#1


28  

Follow the advice from the XLConnect website and set the JVM heap size before loading the package:

options(java.parameters = "-Xmx1024m")
library(XLConnect)
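
To confirm that the setting actually took effect, one option is to ask the JVM for its maximum heap size. This is a minimal sketch, assuming rJava is installed and that no JVM has been started yet in the current R session; the variable names are only illustrative.

```r
# Assumption: rJava is installed and this runs in a fresh R session,
# because the heap size is fixed once the JVM has started.
options(java.parameters = "-Xmx1024m")

library(rJava)
.jinit()  # start the JVM; it picks up getOption("java.parameters")

# Query java.lang.Runtime for the maximum heap the JVM may use
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
max_heap_mb <- .jcall(runtime, "J", "maxMemory") / 1024^2

cat(sprintf("JVM max heap: %.0f MB\n", max_heap_mb))
```

The reported value can be somewhat lower than the requested one, depending on the JVM. If it still shows the old limit, the JVM was most likely started before the option was set.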

#2


27  

If you still have problems importing XLSX files, you can increase the heap size further. The answer with "-Xmx1024m" didn't work for me, so I changed it to "-Xmx4g".

options(java.parameters = "-Xmx4g")
library(XLConnect)

This link was useful.

#3


12  

Use read.xlsx() in the openxlsx package. It has no dependency on rJava, so it is limited only by R's own memory. I have not explored writing and formatting XLSX files in much depth, but the package has some promising-looking vignettes. For reading large spreadsheets, it works well.
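
A minimal sketch of that approach, assuming the openxlsx package is installed. The temporary workbook and its columns are made up here so the example runs on its own; in practice you would pass your own file, such as "largespreadsheet.xlsx".

```r
library(openxlsx)

# Build a small stand-in workbook so the sketch runs end to end
# (hypothetical data; replace tmp with the path to your own file)
tmp <- tempfile(fileext = ".xlsx")
write.xlsx(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), tmp)

# Read the first sheet into a data frame, with no Java involved
df <- read.xlsx(tmp, sheet = 1)
str(df)

# Read only selected rows and columns to keep the result small
# (row 1 of the sheet holds the column names)
head_df <- read.xlsx(tmp, sheet = 1, rows = 1:3, cols = 1)
```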

Hat tip to @Brad-Horn. I've just turned his comment into an answer because I also found this to be the best solution!

#4


3  

If you encounter this error when reading many files rather than one huge file, I managed to solve it by freeing Java Virtual Machine memory with xlcFreeMemory() after each file:

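# path is the directory that contains the workbooks
files <- list.files(path, pattern = "\\.xlsx$")  # pattern is a regular expression, not a glob
for (i in seq_along(files)) {
    wb <- loadWorkbook(...)
    ...
    rm(wb)           # drop the R reference to the workbook
    xlcFreeMemory()  # <= free Java Virtual Machine memory !
}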
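
For illustration, here is one way the elided steps could look. This is only a sketch under assumptions: the helper name read_first_sheets, the choice to read just the first worksheet of each file, and the heap size are all hypothetical and not part of the answer above.

```r
# Assumption: XLConnect (and a working Java installation) is available.
options(java.parameters = "-Xmx1024m")  # set before the package is loaded
library(XLConnect)

# Hypothetical helper: read the first worksheet of every .xlsx file in dir
read_first_sheets <- function(dir) {
  files <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)
  result <- vector("list", length(files))
  names(result) <- basename(files)
  for (i in seq_along(files)) {
    wb <- loadWorkbook(files[i])
    result[[i]] <- readWorksheet(wb, sheet = 1)
    rm(wb)           # drop the R reference to the workbook
    xlcFreeMemory()  # ask the JVM to release memory before the next file
  }
  result
}

# Usage (hypothetical folder): sheets <- read_first_sheets("path/to/folder")
```

Collecting the data frames in a list keeps only the R-side data alive between iterations, so the Java-side workbook objects can be released after each file.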

#5


2  

This seems to happen when you keep using the same R session over and over without restarting RStudio. Restarting RStudio gives the program a fresh memory heap. It worked for me right away.

#6


0  

Whenever you use a library that relies on rJava (such as RWeka in my case), you are bound to hit the default heap space (512 MB) some day. When using Java directly, the JVM argument to use is well known (-Xmx2048m if you want 2 gigabytes of RAM). Here it's just a matter of how to specify it in the R environment.

options(java.parameters = "-Xmx2048m")
library(rJava)
