How do I efficiently find the number of tweets & retweets in a time span using R? (twitteR package)

Time: 2022-12-02 21:24:58

I want to find the number of tweets, favourites and retweets (cumulative totals are enough) of the UK General Election candidates of several parties (>2000 candidates) in the 2 months before the election. So far I have written a loop around twitteR's userTimeline that saves the number of tweets, retweets and favourites for each candidate inside the loop, because I don't know how to save them otherwise.

current is the list of Twitter usernames. I'm new to programming, so please don't hate:

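library(twitteR)

tweetsy.2017 <- function(x){
    # Download up to 3200 of the user's most recent statuses (the API maximum)
    one = userTimeline(x, n = 3200, includeRts = TRUE, excludeReplies = FALSE)
    onedf = twListToDF(one)
    # Keep the campaign period only; created is assumed to be in UTC
    oneperiod = subset(onedf, created >= as.POSIXct('2017-04-18 00:00:00', tz = 'UTC') & created <= as.POSIXct('2017-06-08 23:59:00', tz = 'UTC')) # 52 days
    oneperiod2 = oneperiod[oneperiod$isRetweet == FALSE,] # own tweets only; not used below
    ro = nrow(oneperiod)                  # number of statuses, retweets included
    f = sum(oneperiod$favoriteCount)      # cumulative favourites
    re = sum(oneperiod$retweetCount)      # cumulative retweets
    output = list(ro, f, re)
    return(output)
#Sys.sleep(100)
}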

Tweets.2017 = lapply(current, tweetsy.2017)
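The counting step can be separated from the download so it can be checked offline. The following is a minimal sketch, not part of the original question: count_period() is a hypothetical helper that assumes a data frame with the created, isRetweet, favoriteCount and retweetCount columns that twListToDF() returns, with created in UTC. Unlike tweetsy.2017(), which computes oneperiod2 but never uses it, it reports both the total number of statuses and the number of the candidate's own tweets. It sums favourites and retweets over own tweets only because, as far as I know, a retweet's counts belong to the original post. That is a design choice; adjust it if you want retweets included.

```r
# Hypothetical helper: count statuses, favourites and retweets inside a time window.
# Assumes df has the twListToDF()-style columns created (POSIXct, UTC),
# isRetweet (logical), favoriteCount and retweetCount (numeric).
count_period <- function(df,
                         from = as.POSIXct("2017-04-18 00:00:00", tz = "UTC"),
                         to   = as.POSIXct("2017-06-08 23:59:59", tz = "UTC")) {
  # Rows whose timestamp falls inside [from, to]
  in_window <- df[df$created >= from & df$created <= to, , drop = FALSE]
  # Statuses written by the user, i.e. not retweets of other accounts
  originals <- in_window[!in_window$isRetweet, , drop = FALSE]
  list(
    tweets     = nrow(in_window),              # all statuses, retweets included
    own_tweets = nrow(originals),              # the user's own tweets
    favourites = sum(originals$favoriteCount), # cumulative favourites on own tweets
    retweets   = sum(originals$retweetCount)   # cumulative retweets of own tweets
  )
}
```

With a live connection it would be called as count_period(twListToDF(userTimeline(x, n = 3200, includeRts = TRUE))). The default window, 18 April to 8 June inclusive, covers 52 calendar days.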

My problem is that this takes a very long time and gives no intermediate results. It also seems inefficient to download all the tweets just to count them. Oh, and I only put the sleep there in case I hit the API limit, but my code seems too slow to reach it anyway.
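Whether a pause is needed at all can be estimated with a little arithmetic. The sketch below assumes the historical Twitter API v1.1 figures for the user timeline endpoint: 200 tweets per request and 900 requests per 15-minute window. These limits have changed over time, so check the current documentation. min_pause_per_user() is a hypothetical helper.

```r
# Hypothetical helper: seconds to spend per user so that a sequential loop
# stays under the rate limit. Default figures are ASSUMED (historical v1.1).
min_pause_per_user <- function(n_tweets = 3200, per_page = 200,
                               limit = 900, window_s = 15 * 60) {
  requests_per_user <- ceiling(n_tweets / per_page) # pages needed per user
  users_per_window  <- limit / requests_per_user    # users that fit in one window
  window_s / users_per_window                       # seconds available per user
}

min_pause_per_user()          # 16
2000 * min_pause_per_user()   # 32000 seconds for 2000 candidates
```

Under these assumptions a user with 3200 tweets costs 16 requests, so about 16 seconds per user keeps a loop within the limit, and 2000 candidates would take roughly 9 hours in the worst case. The time spent waiting for the API itself counts toward that budget, which may be why the loop never hit the limit. This also suggests that mclapply or parLapply would mostly help you reach the limit sooner rather than finish faster.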

Does anybody have a better idea? I have tried mclapply and parLapply but haven't managed to get them running.

1 solution

#1


Wrapped it in a for loop so I get intermediate results. Works fine now!

ab <- vector("list", nrow(twitter_data))  # twitter_data: usernames in column 1
for(i in seq_len(nrow(twitter_data))){
    print(paste("Row number", i, "of", nrow(twitter_data)))
    id <- as.character(twitter_data[i, 1])
    print(id)
    ab[[i]] <- tweetsy.2017(id)
    print("Process sleeps for a few seconds due to Twitter API rate limits and then continues")
    Sys.sleep(9)
}
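The loop above stops at the first user whose download fails (possibly a protected or deleted account) and keeps results only in memory. A slightly more defensive variant is sketched below. collect_counts() and results_to_df() are hypothetical helpers. fetch can be tweetsy.2017 or any function that returns list(tweets, favourites, retweets) for one username, and checkpoint is an optional file path written with saveRDS().

```r
# Hypothetical helper: loop over users, survive errors, checkpoint progress.
collect_counts <- function(users, fetch, pause = 0, checkpoint = NULL, every = 50) {
  results <- vector("list", length(users))
  names(results) <- users
  for (i in seq_along(users)) {
    message("User ", i, " of ", length(users), ": ", users[i])
    # results[i] <- list(...) keeps the slot even when the value is NULL
    # (results[[i]] <- NULL would delete the element instead)
    results[i] <- list(tryCatch(
      fetch(users[i]),
      error = function(e) {
        message("  failed: ", conditionMessage(e))
        NULL
      }
    ))
    # Save partial results every `every` users
    if (!is.null(checkpoint) && i %% every == 0) saveRDS(results, checkpoint)
    if (pause > 0) Sys.sleep(pause)
  }
  if (!is.null(checkpoint)) saveRDS(results, checkpoint) # final save
  results
}

# Hypothetical helper: turn the list of list(tweets, favourites, retweets)
# into one data frame, skipping users whose download failed.
results_to_df <- function(results) {
  ok <- !vapply(results, is.null, logical(1))
  pick <- function(k) vapply(results[ok], function(r) as.numeric(r[[k]]), numeric(1))
  data.frame(
    user       = names(results)[ok],
    tweets     = pick(1),
    favourites = pick(2),
    retweets   = pick(3),
    row.names  = NULL
  )
}
```

With a live connection this might be used as ab <- collect_counts(as.character(twitter_data[, 1]), tweetsy.2017, pause = 9, checkpoint = "counts.rds"), followed by results_to_df(ab). If the session dies, readRDS("counts.rds") recovers everything saved so far. The file name here is only an example.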