Retrieving the next N records in MongoDB

Time: 2023-01-03 02:57:47

I need a scheduler job that runs every 5 minutes and processes the next 100 records from a MongoDB collection. It should start from the document that was inserted first. So, in the first run, I can sort the data in ascending order and get the first 100 documents. But for subsequent runs, how can I retrieve the next 100 records given the last processed document's ObjectId? (I'm not sure how to use the ObjectId here, as it is a generated string built from different parameters, and I don't have any other id defined.)

If this is not a good way to retrieve records from mongodb for a large data set, please suggest a better way.

Each document looks like this:

{
  "_id": { "$oid": "51ff17c8e4b02969f18e72bb" },
  "source_of_info": "somesource",
  "entityinfo": [
    {
      "user": "Alfredo Vela Zancada",
      "social_network_entity_id": 364221775325822977,
      "text": "blah blah blah",
      "created_at": { "$date": "2013-08-05T03:10:12.000Z" }
    }
  ],
  "relatedURLs": [
    { "url": "http://t.co/swqP3FYQt5", "expanded_url": "http://ow.ly/nCkIS" }
  ]
}

Thanks.

1 Solution

#1


3  

If you keep track of which iteration you're on, you could use something like:

db.users.find().sort({ _id: 1 }).skip(1200).limit(100)
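The offset in the call above follows directly from the run counter. Below is a minimal sketch of that arithmetic, assuming a zero-based run number and a fixed batch of 100; `pageWindow` is a hypothetical helper name, not part of MongoDB. Keep in mind that the server still has to walk past every skipped document, so this approach tends to slow down as the offset grows.

```javascript
// Hypothetical helper: maps a zero-based run number to skip/limit values.
function pageWindow(run, pageSize = 100) {
  if (!Number.isInteger(run) || run < 0) {
    throw new RangeError("run must be a non-negative integer");
  }
  return { skip: run * pageSize, limit: pageSize };
}

// Run 12 (the 13th execution of the job) gives the skip(1200) used above.
const w = pageWindow(12);

// In the mongo shell the values would be applied like this
// (sorting on _id is an assumption, used to approximate insertion order):
//   db.users.find().sort({ _id: 1 }).skip(w.skip).limit(w.limit)
```

The job only needs to persist the run number between executions, for example in a small bookkeeping collection.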

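Another solution might be to add a 'processed' flag to each entry, defaulting to false. Then, when you fetch the next 100 documents where processed is false, use findAndModify to set the flag to true. Note that findAndModify updates a single document per call, so it has to run once for each document in the batch.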
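The flag-based approach could look roughly like the sketch below, written for the mongo shell. It assumes every document already carries `processed: false` (documents without the field will not match the query) and that sorting on `_id` is an acceptable stand-in for insertion order. `claimBatch` is a hypothetical helper name, and `coll` stands for a collection handle such as `db.users`.

```javascript
// Sketch only: claims up to batchSize unprocessed documents, oldest first.
function claimBatch(coll, batchSize) {
  var claimed = [];
  while (claimed.length < batchSize) {
    // findAndModify atomically flags ONE matching document and returns it,
    // or returns null when no unprocessed document is left.
    var doc = coll.findAndModify({
      query: { processed: false },
      sort: { _id: 1 },
      update: { $set: { processed: true } },
      new: true // return the document as it looks after the update
    });
    if (doc === null) {
      break;
    }
    claimed.push(doc);
  }
  return claimed;
}

// Usage in the shell (one scheduler run):
//   var batch = claimBatch(db.users, 100);
```

Because each document is flagged before it is handed to the job, a crash in the middle of a run leaves some documents marked as processed without having been handled; a separate status value or timestamp may be worth considering if that matters.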