RCurl memory leak in the getURL method

Date: 2021-12-08 23:43:36

It looks like we have hit a bug in RCurl. The method getURL seems to be leaking memory. A simple test case to reproduce the bug is given here:

library(RCurl)
handle<-getCurlHandle()
range<-1:100
for (r in range) {x<-getURL(url="news.google.com.au",curl=handle)}

If I run this code, the memory allocated to the R session is never recovered.
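
One way to put a number on this is to watch the resident set size of the R process around the loop. The following is a minimal sketch, assuming a Linux system (it reads /proc/self/status); the helper names rss_kb and rss_growth are made up for illustration and are not part of RCurl. Since gc() only reports memory managed by R itself, a leak in native code may not show up there, which is why the sketch looks at the process as a whole.

```r
# Resident set size of the current R process in kB.
# Assumption: Linux only, since it parses /proc/self/status.
rss_kb <- function() {
  status <- readLines("/proc/self/status")
  line <- grep("^VmRSS:", status, value = TRUE)
  as.numeric(gsub("[^0-9]", "", line))
}

# Call f() n times and return the change in RSS (kB).
# gc() is run before each reading so that garbage R could
# still collect is not counted as growth.
rss_growth <- function(f, n = 100) {
  invisible(gc())
  before <- rss_kb()
  for (i in seq_len(n)) f()
  invisible(gc())
  rss_kb() - before
}
```

With the handle from the test case, a call such as rss_growth(function() getURL(url = "news.google.com.au", curl = handle)) gives the growth for one batch. The allocator may hold on to freed pages, so a single reading proves little; growth that keeps accumulating over repeated batches is the more telling sign.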

We are using RCurl for some long-running experiments, and we are running out of memory on the test system.

The specs of our test system are as follows:

OS: Ubuntu 14.04 (64 bit)

Memory: 24 GB

RCurl version: 1.95-4.3

Any ideas about how to get around this issue?

Thanks

2 Answers

#1


3  

I'll take a look at this. However, see if getURLContent() also exhibits the problem, i.e. replace getURL() with getURLContent(). The function getURLContent() is a richer version of getURL() and one that gets more attention.
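
Applied to the test case from the question, the swap might look like the following. This is only a sketch: the wrapper name run_getURLContent_test is made up here, and the URL and loop count are simply copied from the question. Whether it avoids the leak is exactly what the test is meant to find out.

```r
# Same loop as in the question, with getURL() replaced by getURLContent().
# Assumption: RCurl is installed; the wrapper name is hypothetical.
run_getURLContent_test <- function(url = "news.google.com.au", n = 100) {
  handle <- RCurl::getCurlHandle()  # one handle reused for every request
  for (r in seq_len(n)) {
    x <- RCurl::getURLContent(url = url, curl = handle)
  }
  invisible(NULL)
}
```

Calling run_getURLContent_test() while watching the memory of the R process shows whether the growth still occurs.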

#2


1  

I just hit this too, and made the following code change to work around it:

LEAK (Old code)

h = basicHeaderGatherer()
tmp = tryCatch(getURL(url = url,
                      headerfunction = h$update,
                      useragent = R.version.string,
                      timeout = timeout_secs),
               error = function(x) { .__curlError <<- TRUE; .__curlErrorMessage <<- x$message })

NO LEAK (New code)

method <- "GET"
h <- basicHeaderGatherer()
t <- basicTextGatherer()
tmp <- tryCatch(curlPerform(url = url,
                            customrequest = method,
                            writefunction = t$update,
                            headerfunction = h$update,
                            useragent=R.version.string,
                            verbose = FALSE,
                            timeout = timeout_secs),
                error = function(x) { .__curlError <<- TRUE; .__curlErrorMessage <<- x$message })
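
The snippet above does not show how the response is read back afterwards. With curlPerform() the body ends up in the text gatherer and the headers in the header gatherer, so they have to be pulled out with their value() functions. A rough sketch of the same workaround wrapped in a function follows; the name fetch_url, the 30 second default, and the shape of the returned list are my own choices and not part of RCurl.

```r
# Hypothetical wrapper around the curlPerform() workaround.
# Returns a list instead of setting global error flags.
fetch_url <- function(url, timeout_secs = 30) {
  headers <- RCurl::basicHeaderGatherer()
  body <- RCurl::basicTextGatherer()  # named 'body' to avoid masking base::t
  tryCatch({
    RCurl::curlPerform(url = url,
                       customrequest = "GET",
                       writefunction = body$update,
                       headerfunction = headers$update,
                       useragent = R.version.string,
                       timeout = timeout_secs)
    # value() returns what the gatherers collected during the request
    list(ok = TRUE, body = body$value(), headers = headers$value(), error = NULL)
  }, error = function(e) {
    list(ok = FALSE, body = NULL, headers = NULL, error = conditionMessage(e))
  })
}
```

Something like res <- fetch_url("news.google.com.au") then gives res$body and res$headers on success, or res$error on failure.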
