使用数据库游标有什么好处?

时间:2022-12-25 22:56:00

It is based on the interview question that I faced.

这是基于我所面对的面试问题。

Very short definition can be

很短的定义可以是

It can be used to manipulate the rows returned by a query.

它可以用来操作查询返回的行。

Besides the use of the cursor (Points are listed here on MSDN), I have a question in my mind that if we can perform all the operations using query or stored procedure (if I'm not wrong, Like we can use Transact-SQL for ms-sql), is there any concrete point that we should use cursor?

除了使用光标(点在MSDN列出),我有一个问题在我的脑海里,如果我们可以使用查询或存储过程执行的所有操作(如果我没猜错,就像我们可以利用ms sql transact - sql),有没有具体点,我们应该使用光标?

5 个解决方案

#1


40  

Using cursors compared to big resultsets is like using video streaming instead of downloading an video in one swoop, and watching it when it has downloaded. If you download, you have to have a few gigs of space and the patience to wait until the download finished. Now, no matter how fast your machine or network may be, everyone watches a movie at the same speed.

与大结果集相比,使用游标就像使用视频流,而不是一下子下载一段视频,然后在下载后再观看。如果你下载,你必须有一些空间和耐心等待下载完成。现在,不管你的机器或网络有多快,每个人看电影的速度都是一样的。

Normally any query gets sent to the server, executed, and the resultset sent over the network to you, in one burst of activity. The cursor will give you access to the data row by row and stream every row only when you request it (can actually view it).

通常,任何查询都会被发送到服务器,并在一次活动中执行,并通过网络发送给您的resultset。游标将允许您逐行访问数据行,并且只有在您请求时才对每一行进行流操作(实际上可以查看它)。

  • A cursor can save you time - because you don't need to wait for the processing and download of your complete recordset
  • 游标可以节省您的时间——因为您不需要等待您的完整记录集的处理和下载
  • It will save you memory, both on the server and on the client because they don't have to dedicate a big chunk of memory to resultsets
  • 它将节省您的内存,无论是在服务器上还是在客户机上,因为它们不需要将大量内存用于resultsets
  • Load-balance both your network and your server - Working in "burst" mode is usually more efficient, but it can completely block your server and your network. Such delays are seldom desirable for multiuser environments. Streaming leaves room for other operations.
  • 负载平衡您的网络和服务器-在“突发”模式下工作通常更有效,但它可以完全阻塞您的服务器和网络。这样的延迟对于多用户环境来说是不常见的。流传输为其他操作留下了空间。
  • Allows operations on queried tables (under certain conditions) that do not affect your cursor directly. So while you are holding a cursor on a row, other processes are able to read, update and even delete other rows. This helps especially with very busy tables, many concurrent reads and writes.
  • 允许对查询表(在某些条件下)进行不直接影响游标的操作。因此,当您将光标放在一行上时,其他进程可以读取、更新甚至删除其他行。这在非常繁忙的表中尤其有用,许多并发读取和写入。

Which brings us to some caveats, however:

然而,这也给我们带来了一些警告:

  • Consistency: Using a cursor, you do (usually) not operate on a consistent snapshot of the data, but on a row. So your concurrency/consistency/isolation guarantees drop from the whole database (ACID) to only one row. You can usually inform your DBMS what level of concurrency you want, but if you are too nitpicky (locking the complete table you are in), you will throw away many of the resource savings on the server side.

    一致性:使用游标时(通常)不会对数据的一致快照进行操作,而是对一行进行操作。因此,您的并发/一致性/隔离保证从整个数据库(ACID)下降到只有一行。您通常可以告诉您的DBMS您想要什么级别的并发性,但是如果您太挑剔(锁定您所在的完整表),您将在服务器端浪费许多资源节省。

  • Transmitting every row by itself can be very inefficient, since every packet has negotiation overhead that you might avoid by sending big, maybe compressed, chunks of data per packet. ( No DB server or client library is stupid enough to transmit every row individually, there's caching and chunking on both ends, still, it is relevant.)

    单独传输每一行可能非常低效,因为每个包都有协商开销,您可以通过向每个包发送大的(可能是压缩的)数据块来避免这种开销。(没有哪个DB服务器或客户端库愚蠢到可以单独传输每一行,两端都有缓存和分块,但这仍然是相关的。)

  • Cursors are harder to do right. Consider a query with a big resultset, motivating you to use a cursor, that uses a GROUP BY clause with aggregate functions. (Such queries are common in data warehouses). The GROUP BY can completely trash your server, because it has to generate and store the whole resultset at once, maybe even holding locks on other tables.

    游标更难做对。考虑一个带有大resultset的查询,这将促使您使用游标,它使用带有聚合函数的GROUP BY子句。(此类查询在数据仓库中很常见)。GROUP BY可以完全销毁服务器,因为它必须同时生成和存储整个resultset,甚至可能在其他表上保存锁。

Rule of thumb:

经验法则:

  • If you work on small, quickly created resultsets, don't use cursors.
  • 如果您使用的是小型、快速创建的resultset,那么不要使用游标。
  • Cursors excell on ad hoc, complex (referentially), queries of sequential nature with big resultsets and low consistency requirements.
  • 游标擅长于特定的、复杂的(引用的)、具有大结果集和低一致性需求的顺序性查询。

"Sequential nature" means there are no aggregate functions in heavy GROUP BY clauses in your query. The server can lazily decide to compute 10 rows for your cursor to consume from a cache and do other stuff meanwhile.

“顺序性”意味着在查询中没有按子句分组的聚合函数。服务器可以延迟地决定为您的游标计算10行,以便从缓存中消费,同时做其他的事情。

HTH

HTH

#2


27  

A cursor is a tool that allows you to iterate the records in a set. It has concepts of order and current record.

游标是一种工具,允许您在一个集合中迭代记录。它具有顺序和当前记录的概念。

Generally, SQL operates with multisets: these are sets of possibly repeating records in no given order, taken as a whole.

通常,SQL使用多集操作:这些可能重复的记录集合没有给定的顺序,作为一个整体。

Say, this query:

说,这个查询:

SELECT  *
FROM    a
JOIN    b
ON      b.a = a.id

, operates on multisets a and b.

,作用于多集a和b。

Nothing in this query makes any assumptions about the order of the records, how they are stored, in which order they should be accessed, etc.

这个查询中没有任何关于记录顺序的假设,它们是如何存储的,以及它们应该被访问的顺序等等。

This allows to abstract away implementation details and let the system try to choose the best possible algorithm to run this query.

这允许抽象实现细节,并让系统尝试选择可能的最佳算法来运行此查询。

However, after you have transformed all your data, ultimately you will need to access the records in an ordered way and one by one.

然而,在您转换了所有数据之后,最终您将需要以有序的方式逐一访问记录。

You don't care about how exactly the entries of a phonebook are stored on a hard drive, but a printer does require them to be feed in alphabetical order; and the formatting tags should be applied to each record individually.

你不关心电话簿的条目是如何存储在硬盘上的,但是打印机确实要求它们按字母顺序输入;格式标签应该单独应用于每个记录。

That's exactly where the cursors come into play. Each time you are processing a resultset on the client side, you are using a cursor. You don't get megabytes of unsorted data from the server: you just get a tiny variable: a resultset descriptor, and just write something like this:

这正是游标发挥作用的地方。每次在客户端处理resultset时,都使用游标。您不会从服务器获得兆字节的未排序数据:您只需要得到一个很小的变量:resultset描述符,然后编写如下内容:

while (!rs.EOF) {
   process(rs);
   rs.moveNext();
}

That's cursor that implements all this for you.

这是游标,它为您实现了所有这些。

This of course concerns database-client interaction.

这当然与数据库-客户机交互有关。

As for the database itself: inside the database, you rarely need the cursors, since, as I have told above, almost all data transformations can be implemented using set operations more efficiently.

至于数据库本身:在数据库中,您很少需要游标,因为正如我上面所讲的,几乎所有的数据转换都可以通过使用set操作更有效地实现。

However, there are exceptions:

不过,也有例外:

  • Analytic operations in SQL Server are implemented very poorly. A cumulative sum, for instance, could be calculated much more efficiently with a cursor than using the set-based operations
  • SQL Server中的分析操作实现得非常糟糕。例如,与使用基于集合的操作相比,使用游标可以更有效地计算累积和
  • Processing data in chunks. There are cases when a set based operation should be sequentially applied to a portion of a set and the results of each chunk should be committed independently. While it's still possible to do it using set-based operations, a cursor is often a more preferred way to do this.
  • 处理的数据块。在某些情况下,基于集合的操作应该顺序地应用到集合的一部分,并且每个块的结果应该独立地提交。虽然仍然可以使用基于集合的操作来实现这一点,但是游标通常是一种更可取的方式。
  • Recursion in the systems that do not support it natively.
  • 不支持递归的系统中的递归。

You also may find this article worth reading:

你也会发现这篇文章值得一读:

#3


3  

Using a cursor it is possible to read sequentially through a set of data, programmatically, so it behaves in a similar manner to conventional file access, rather than the set-based behaviour characteristic of SQL.

使用游标可以通过编程方式对一组数据进行顺序读取,因此它的行为方式与常规文件访问类似,而不是基于集合的SQL行为特征。

There are a couple of situations where this may be of use:

有几个情况可能会有用:

  1. Where it is necessary to simulate file-based record access behaviour - for example, where a relational database is being used as the data storage mechanism for a piece of code that was previously written to use indexed files for data storage.

    在需要模拟基于文件的记录访问行为的地方——例如,一个关系数据库正在被用作一段代码的数据存储机制,而这段代码之前被编写为使用索引文件进行数据存储。

  2. Where it is necessary to process data sequentially - a simple example might be to calculate a running total balance for a specific customer. (A number of relational databases, such as Oracle and SQLServer, now have analytical extensions to SQL that should greatly reduce the need for this.)

    在需要处理数据顺序的地方——一个简单的例子可能是计算一个特定客户的运行总余额。(许多关系数据库,如Oracle和SQLServer,现在都有SQL的分析扩展,这将大大减少对SQL的需求。)

Inevitably, wikipedia has more: http://en.wikipedia.org/wiki/Database_cursor

不可避免的是,*有更多:http://en.wikipedia.org/wiki/Database_cursor

#4


1  

With cursor you access one row at a time. So it is good to use it when you want manipulate with a lot of rows but with only one at a given time.

使用游标一次访问一行。所以当你想要对很多行进行操作时最好使用它,但是在给定的时间只有一个行。

I was told at my classes, the reason to use cursor is you want to access more rows than you can fit your memory - so you can't just get all rows into a collection and then loop through it.

在我的课上有人告诉我,使用游标的原因是您希望访问超出内存容量的行——因此您不能将所有行都放入一个集合中,然后对其进行循环。

#5


1  

Sometimes a set based logic can get quite complex and opaque. In these cases and if the performance is not an issue a server side cursor can be used to replace the relational logic with a more manageable and familiar (to a non relational thinker) procedural logic resulting in easier maintenance.

有时基于集合的逻辑会变得非常复杂和不透明。在这些情况下,如果性能不是一个问题,服务器端游标就可以用一个更易于管理和熟悉的(非关系的)过程逻辑来替换关系逻辑,从而简化维护。

#1


40  

Using cursors compared to big resultsets is like using video streaming instead of downloading an video in one swoop, and watching it when it has downloaded. If you download, you have to have a few gigs of space and the patience to wait until the download finished. Now, no matter how fast your machine or network may be, everyone watches a movie at the same speed.

与大结果集相比,使用游标就像使用视频流,而不是一下子下载一段视频,然后在下载后再观看。如果你下载,你必须有一些空间和耐心等待下载完成。现在,不管你的机器或网络有多快,每个人看电影的速度都是一样的。

Normally any query gets sent to the server, executed, and the resultset sent over the network to you, in one burst of activity. The cursor will give you access to the data row by row and stream every row only when you request it (can actually view it).

通常,任何查询都会被发送到服务器,并在一次活动中执行,并通过网络发送给您的resultset。游标将允许您逐行访问数据行,并且只有在您请求时才对每一行进行流操作(实际上可以查看它)。

  • A cursor can save you time - because you don't need to wait for the processing and download of your complete recordset
  • 游标可以节省您的时间——因为您不需要等待您的完整记录集的处理和下载
  • It will save you memory, both on the server and on the client because they don't have to dedicate a big chunk of memory to resultsets
  • 它将节省您的内存,无论是在服务器上还是在客户机上,因为它们不需要将大量内存用于resultsets
  • Load-balance both your network and your server - Working in "burst" mode is usually more efficient, but it can completely block your server and your network. Such delays are seldom desirable for multiuser environments. Streaming leaves room for other operations.
  • 负载平衡您的网络和服务器-在“突发”模式下工作通常更有效,但它可以完全阻塞您的服务器和网络。这样的延迟对于多用户环境来说是不常见的。流传输为其他操作留下了空间。
  • Allows operations on queried tables (under certain conditions) that do not affect your cursor directly. So while you are holding a cursor on a row, other processes are able to read, update and even delete other rows. This helps especially with very busy tables, many concurrent reads and writes.
  • 允许对查询表(在某些条件下)进行不直接影响游标的操作。因此,当您将光标放在一行上时,其他进程可以读取、更新甚至删除其他行。这在非常繁忙的表中尤其有用,许多并发读取和写入。

Which brings us to some caveats, however:

然而,这也给我们带来了一些警告:

  • Consistency: Using a cursor, you do (usually) not operate on a consistent snapshot of the data, but on a row. So your concurrency/consistency/isolation guarantees drop from the whole database (ACID) to only one row. You can usually inform your DBMS what level of concurrency you want, but if you are too nitpicky (locking the complete table you are in), you will throw away many of the resource savings on the server side.

    一致性:使用游标时(通常)不会对数据的一致快照进行操作,而是对一行进行操作。因此,您的并发/一致性/隔离保证从整个数据库(ACID)下降到只有一行。您通常可以告诉您的DBMS您想要什么级别的并发性,但是如果您太挑剔(锁定您所在的完整表),您将在服务器端浪费许多资源节省。

  • Transmitting every row by itself can be very inefficient, since every packet has negotiation overhead that you might avoid by sending big, maybe compressed, chunks of data per packet. ( No DB server or client library is stupid enough to transmit every row individually, there's caching and chunking on both ends, still, it is relevant.)

    单独传输每一行可能非常低效,因为每个包都有协商开销,您可以通过向每个包发送大的(可能是压缩的)数据块来避免这种开销。(没有哪个DB服务器或客户端库愚蠢到可以单独传输每一行,两端都有缓存和分块,但这仍然是相关的。)

  • Cursors are harder to do right. Consider a query with a big resultset, motivating you to use a cursor, that uses a GROUP BY clause with aggregate functions. (Such queries are common in data warehouses). The GROUP BY can completely trash your server, because it has to generate and store the whole resultset at once, maybe even holding locks on other tables.

    游标更难做对。考虑一个带有大resultset的查询,这将促使您使用游标,它使用带有聚合函数的GROUP BY子句。(此类查询在数据仓库中很常见)。GROUP BY可以完全销毁服务器,因为它必须同时生成和存储整个resultset,甚至可能在其他表上保存锁。

Rule of thumb:

经验法则:

  • If you work on small, quickly created resultsets, don't use cursors.
  • 如果您使用的是小型、快速创建的resultset,那么不要使用游标。
  • Cursors excell on ad hoc, complex (referentially), queries of sequential nature with big resultsets and low consistency requirements.
  • 游标擅长于特定的、复杂的(引用的)、具有大结果集和低一致性需求的顺序性查询。

"Sequential nature" means there are no aggregate functions in heavy GROUP BY clauses in your query. The server can lazily decide to compute 10 rows for your cursor to consume from a cache and do other stuff meanwhile.

“顺序性”意味着在查询中没有按子句分组的聚合函数。服务器可以延迟地决定为您的游标计算10行,以便从缓存中消费,同时做其他的事情。

HTH

HTH

#2


27  

A cursor is a tool that allows you to iterate the records in a set. It has concepts of order and current record.

游标是一种工具,允许您在一个集合中迭代记录。它具有顺序和当前记录的概念。

Generally, SQL operates with multisets: these are sets of possibly repeating records in no given order, taken as a whole.

通常,SQL使用多集操作:这些可能重复的记录集合没有给定的顺序,作为一个整体。

Say, this query:

说,这个查询:

SELECT  *
FROM    a
JOIN    b
ON      b.a = a.id

, operates on multisets a and b.

,作用于多集a和b。

Nothing in this query makes any assumptions about the order of the records, how they are stored, in which order they should be accessed, etc.

这个查询中没有任何关于记录顺序的假设,它们是如何存储的,以及它们应该被访问的顺序等等。

This allows to abstract away implementation details and let the system try to choose the best possible algorithm to run this query.

这允许抽象实现细节,并让系统尝试选择可能的最佳算法来运行此查询。

However, after you have transformed all your data, ultimately you will need to access the records in an ordered way and one by one.

然而,在您转换了所有数据之后,最终您将需要以有序的方式逐一访问记录。

You don't care about how exactly the entries of a phonebook are stored on a hard drive, but a printer does require them to be feed in alphabetical order; and the formatting tags should be applied to each record individually.

你不关心电话簿的条目是如何存储在硬盘上的,但是打印机确实要求它们按字母顺序输入;格式标签应该单独应用于每个记录。

That's exactly where the cursors come into play. Each time you are processing a resultset on the client side, you are using a cursor. You don't get megabytes of unsorted data from the server: you just get a tiny variable: a resultset descriptor, and just write something like this:

这正是游标发挥作用的地方。每次在客户端处理resultset时,都使用游标。您不会从服务器获得兆字节的未排序数据:您只需要得到一个很小的变量:resultset描述符,然后编写如下内容:

while (!rs.EOF) {
   process(rs);
   rs.moveNext();
}

That's cursor that implements all this for you.

这是游标,它为您实现了所有这些。

This of course concerns database-client interaction.

这当然与数据库-客户机交互有关。

As for the database itself: inside the database, you rarely need the cursors, since, as I have told above, almost all data transformations can be implemented using set operations more efficiently.

至于数据库本身:在数据库中,您很少需要游标,因为正如我上面所讲的,几乎所有的数据转换都可以通过使用set操作更有效地实现。

However, there are exceptions:

不过,也有例外:

  • Analytic operations in SQL Server are implemented very poorly. A cumulative sum, for instance, could be calculated much more efficiently with a cursor than using the set-based operations
  • SQL Server中的分析操作实现得非常糟糕。例如,与使用基于集合的操作相比,使用游标可以更有效地计算累积和
  • Processing data in chunks. There are cases when a set based operation should be sequentially applied to a portion of a set and the results of each chunk should be committed independently. While it's still possible to do it using set-based operations, a cursor is often a more preferred way to do this.
  • 处理的数据块。在某些情况下,基于集合的操作应该顺序地应用到集合的一部分,并且每个块的结果应该独立地提交。虽然仍然可以使用基于集合的操作来实现这一点,但是游标通常是一种更可取的方式。
  • Recursion in the systems that do not support it natively.
  • 不支持递归的系统中的递归。

You also may find this article worth reading:

你也会发现这篇文章值得一读:

#3


3  

Using a cursor it is possible to read sequentially through a set of data, programmatically, so it behaves in a similar manner to conventional file access, rather than the set-based behaviour characteristic of SQL.

使用游标可以通过编程方式对一组数据进行顺序读取,因此它的行为方式与常规文件访问类似,而不是基于集合的SQL行为特征。

There are a couple of situations where this may be of use:

有几个情况可能会有用:

  1. Where it is necessary to simulate file-based record access behaviour - for example, where a relational database is being used as the data storage mechanism for a piece of code that was previously written to use indexed files for data storage.

    在需要模拟基于文件的记录访问行为的地方——例如,一个关系数据库正在被用作一段代码的数据存储机制,而这段代码之前被编写为使用索引文件进行数据存储。

  2. Where it is necessary to process data sequentially - a simple example might be to calculate a running total balance for a specific customer. (A number of relational databases, such as Oracle and SQLServer, now have analytical extensions to SQL that should greatly reduce the need for this.)

    在需要处理数据顺序的地方——一个简单的例子可能是计算一个特定客户的运行总余额。(许多关系数据库,如Oracle和SQLServer,现在都有SQL的分析扩展,这将大大减少对SQL的需求。)

Inevitably, wikipedia has more: http://en.wikipedia.org/wiki/Database_cursor

不可避免的是,*有更多:http://en.wikipedia.org/wiki/Database_cursor

#4


1  

With cursor you access one row at a time. So it is good to use it when you want manipulate with a lot of rows but with only one at a given time.

使用游标一次访问一行。所以当你想要对很多行进行操作时最好使用它,但是在给定的时间只有一个行。

I was told at my classes, the reason to use cursor is you want to access more rows than you can fit your memory - so you can't just get all rows into a collection and then loop through it.

在我的课上有人告诉我,使用游标的原因是您希望访问超出内存容量的行——因此您不能将所有行都放入一个集合中,然后对其进行循环。

#5


1  

Sometimes a set based logic can get quite complex and opaque. In these cases and if the performance is not an issue a server side cursor can be used to replace the relational logic with a more manageable and familiar (to a non relational thinker) procedural logic resulting in easier maintenance.

有时基于集合的逻辑会变得非常复杂和不透明。在这些情况下,如果性能不是一个问题,服务器端游标就可以用一个更易于管理和熟悉的(非关系的)过程逻辑来替换关系逻辑,从而简化维护。