Handling large SQL SELECT queries / reading SQL data in chunks

Time: 2022-06-01 20:46:04

I'm using .NET 4.0 and SQL Server 2008 R2.

I'm running a big SQL SELECT query that returns millions of results and takes a long time to run to completion.

Does anyone know how I can read only some of the results returned by the query without having to wait for the whole query to complete?

In other words, I want to read the results in chunks of 10,000 records while the query is still running and fetching the next results.

3 solutions

#1


12  

It depends in part on whether the query itself is streaming, or whether it does lots of work in temporary tables then (finally) starts returning data. You can't do much in the second scenario except re-write the query; however, in the first case an iterator block would usually help, i.e.

public IEnumerable<Foo> GetData() {
     // not shown; building command etc
     using(var reader = cmd.ExecuteReader()) {
         while(reader.Read()) {
             Foo foo = // not shown; materialize Foo from reader
             yield return foo;
         }
     }
}

This is now a streaming iterator - you can foreach over it and it will retrieve records live from the incoming TDS data without buffering all the data first.
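
Since the question asks for chunks of 10,000, one way to consume such an iterator is a small batching helper. This is only a sketch: Batching and Batch are hypothetical names, not part of the answer or of the framework, and the helper works on any IEnumerable<T>, so GetData() is just one possible source.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Hypothetical helper: groups a lazily produced sequence into lists of
    // at most `size` items, pulling from the source only as items are needed.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;          // hand over a full chunk
                batch = new List<T>(size);   // start collecting the next one
            }
        }
        if (batch.Count > 0) yield return batch; // final partial chunk
    }
}
```

With this in place, something like foreach (var chunk in Batching.Batch(GetData(), 10000)) would process each chunk while the reader is still open and later rows are still arriving.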

If you (perhaps wisely) don't want to write your own materialization code, there are tools that will do this for you - for example, LINQ-to-SQL's ExecuteQuery<T>(tsql, args) will do the above pain-free.
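
A rough sketch of what the ExecuteQuery<T> route might look like. The Foo type, its Id and Name members, the dbo.Foo table and the connection string are all assumptions made for illustration; only DataContext.ExecuteQuery<T> itself comes from the answer. It needs a reference to System.Data.Linq and a reachable database, so treat it as untested.

```csharp
using System.Collections.Generic;
using System.Data.Linq; // LINQ-to-SQL

// Hypothetical row type; members are matched to result columns by name.
public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class FooQueries
{
    public static IEnumerable<Foo> GetData(string connectionString, int minId)
    {
        using (var ctx = new DataContext(connectionString))
        {
            // {0} is sent as a SQL parameter, not concatenated into the text.
            foreach (var foo in ctx.ExecuteQuery<Foo>(
                "SELECT Id, Name FROM dbo.Foo WHERE Id > {0}", minId))
            {
                yield return foo; // keeps the context open while the caller iterates
            }
        }
    }
}
```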

#2


2  

You'd need to use data paging.

SQL Server has the TOP clause (SELECT TOP 10 a, b, c FROM d) and BETWEEN:

SELECT TOP 10000 a, b, c FROM d WHERE a BETWEEN X AND Y

With this, you should be able to retrieve N rows, do some partial processing, then load the next N rows, and so on.
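
One common way to write that loop is keyset paging: remember the last key seen and ask for the next N rows after it. The sketch below assumes table d has a unique, indexed, positive bigint column named Id, which is not stated in the answer. Paging, PageSql and ReadPages are hypothetical names. The page fetch is passed in as a delegate so the loop can be tried without a database; in real use it would run PageSql through a SqlCommand with @pageSize and @lastId as parameters.

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // Assumed query shape; TOP with a parameter needs the parentheses.
    public const string PageSql =
        "SELECT TOP (@pageSize) Id, a, b, c FROM d WHERE Id > @lastId ORDER BY Id";

    // fetchPage(lastId, pageSize) must return the Ids of the next page in
    // ascending order, or an empty list when nothing is left.
    public static IEnumerable<List<long>> ReadPages(
        Func<long, int, List<long>> fetchPage, int pageSize)
    {
        long lastId = 0; // assumes every Id is greater than 0
        while (true)
        {
            var page = fetchPage(lastId, pageSize);
            if (page.Count == 0) yield break;   // no more rows
            lastId = page[page.Count - 1];      // resume after the last key seen
            yield return page;
        }
    }
}
```

Unlike a single streaming query, each page here is a separate round trip, so rows inserted or deleted between pages can affect what later pages return.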

This can be achieved with a multithreaded solution: one thread retrieves results while the other waits asynchronously for data and processes it as it arrives.
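
On .NET 4.0, one possible shape for this is a bounded BlockingCollection<T> between a producer task and a consumer loop. This is a minimal sketch, not the answerer's code: Pipeline and Run are hypothetical names, and the produce sequence stands in for whatever reads rows or pages from the database.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Pipeline
{
    // Enumerates `produce` on a background task and passes each item to
    // `consume` on the calling thread. The bounded queue makes the producer
    // wait when the consumer falls behind, which caps memory use.
    public static void Run<T>(IEnumerable<T> produce, Action<T> consume, int capacity)
    {
        using (var queue = new BlockingCollection<T>(capacity))
        {
            var producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var item in produce) queue.Add(item);
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumer loop finish
                }
            });

            foreach (var item in queue.GetConsumingEnumerable())
                consume(item);

            producer.Wait(); // surfaces producer errors as AggregateException
        }
    }
}
```

The sketch does not handle a consume callback that throws while the producer is still adding; real code would need cancellation for that case.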

#3


0  

If you really have to process millions of records, why don't you load 10,000 each round, process them, and then load the next 10,000? If not, consider using the DBMS to filter the data before loading it, as performance in the database is much better than in your logic layer.

Or follow a lazy-load concept: load only the Ids up front, and load the actual data for an Id only when you need it.
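
A minimal sketch of that lazy-load idea using Lazy<T>, which is available in .NET 4.0. LazyRecord and the load delegate are hypothetical; in practice the delegate would run a single-row query for the given Id.

```csharp
using System;

// Hypothetical wrapper: holds an Id up front and loads the full record
// only on first access to Data.
public sealed class LazyRecord<T>
{
    private readonly Lazy<T> _data;

    public int Id { get; private set; }

    public LazyRecord(int id, Func<int, T> load)
    {
        Id = id;
        _data = new Lazy<T>(() => load(id)); // load runs at most once
    }

    public T Data
    {
        get { return _data.Value; }
    }
}
```

Loading rows one Id at a time costs one round trip per record, so this tends to pay off only when a small fraction of the records is ever touched.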
