In is(object, Cl): error while fetching rows in R

Date: 2022-09-24 08:16:36

I have a MySQL table I am attempting to access with R using RMySQL.

There are 1,690,004 rows that should be returned from:

dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'")

Unfortunately, I receive the following warning messages:

In is(object, Cl) : error while fetching row
In dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'",  : pending rows

I only receive ~400K rows.

If I break the query into several "fetches" using dbSendQuery, the warning messages start appearing after ~400K rows are received.
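
Roughly, that chunked fetching looks like this (only a sketch; the 50,000-row chunk size is just for illustration, and `con` is the same connection as above):

res <- dbSendQuery(con, "SELECT * FROM tablename WHERE export_date = '2015-01-29'")
chunks <- list()
while (!dbHasCompleted(res)) {
  chunks[[length(chunks) + 1]] <- dbFetch(res, n = 50000)  # pull the next block of rows
}
dbClearResult(res)
result <- do.call(rbind, chunks)  # combine the fetched chunks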

Any help would be appreciated.

1 Answer

#1

So, it looks like it was due to a 60-second timeout imposed by my hosting provider (damn Arvixe!). I got around this by "paging/chunking" the output. Because my data has an auto-incrementing primary key, every row is returned in order, which lets me take the next X rows after each iteration.

To get 1.6M rows I did the following:

library(RMySQL)
con <- MySQLConnect() # user-defined mysql connection function
day <- '2015-01-29' # date of interest
numofids <- 50000 # number of rows to include in each 'chunk'
count <- dbGetQuery(con, paste0("SELECT COUNT(*) AS count FROM tablename WHERE export_date = '", day, "'"))$count # total number of rows for that date
dbDisconnect(con)
ns <- seq(0, count - 1, numofids) # chunk starting offsets (MySQL LIMIT offsets are zero-based)
tosave <- data.frame() # data frame to bind results to
# iterate through the table, pulling the data in 50k-row chunks
for(nextseries in ns){ # for each chunk offset
  print(nextseries) # print the offset the loop is on
  con <- MySQLConnect() # reconnect for each chunk so no single query hits the 60-second timeout
  d1 <- dbGetQuery(con, paste0("SELECT * FROM tablename WHERE export_date = '", day, "' LIMIT ", nextseries, ",", numofids)) # extract the next chunk of up to 50k rows
  dbDisconnect(con)
  # bind the chunk to the tosave data frame (the if/else avoids an error from
  # rbind-ing d1 to an empty data frame on the first pass)
  if(nrow(tosave) > 0){
      tosave <- rbind(tosave, d1)
  } else {
      tosave <- d1
  }
}
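
Since the table has an auto-incrementing primary key, the same chunking can also be done with keyset pagination (WHERE key > last seen value ... ORDER BY key LIMIT n), which keeps each query cheap and makes the row order explicit. This is only a sketch under the assumption that the key column is named id (the real column name isn't shown above); MySQLConnect() is the same user-defined connection helper as in the code above.

library(RMySQL)
day <- '2015-01-29'          # date of interest
chunksize <- 50000           # rows per chunk
last_id <- 0                 # assumes an auto-incrementing primary key column named `id` (hypothetical name)
chunks <- list()             # collect chunks and rbind once at the end (faster than growing a data frame)
repeat {
  con <- MySQLConnect()      # reconnect per chunk, as above
  d1 <- dbGetQuery(con, paste0(
    "SELECT * FROM tablename WHERE export_date = '", day,
    "' AND id > ", last_id,
    " ORDER BY id LIMIT ", chunksize))
  dbDisconnect(con)
  if (nrow(d1) == 0) break   # no rows left for this date
  last_id <- max(d1$id)      # remember where this chunk ended
  chunks[[length(chunks) + 1]] <- d1
}
tosave <- do.call(rbind, chunks)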
