Multiple large MySQL SELECT queries: is it better to run them in parallel or in a queue?

Time: 2023-02-08 23:47:19

I have looked up answers to this question a bunch and couldn't find a specific answer - sorry in advance if I missed something! Also, I'm a SQL optimization noob.

I have an analytics dashboard which pulls data based on users' requests from a large database.

Each page the user loads runs a number of different queries to populate different parts of the page (different charts, tables, etc). Some of these pages can take quite some time to load as the user might request several years of data.

Currently, each part of the page pings off one SELECT query to the SQL server but as there are several parts of the page, those queries end up running in parallel.

Would it be faster to run these queries in a queue - to allow the server to process one query at a time? Or to keep everything in parallel, as is?

The added benefit of running them one at a time is that we could run the queries to fill in the "above-the-fold" part of the page first...

Hope that all makes sense and take it easy on me please :)

2 Answers

#1


I also say "it depends", but I lean toward parallelism.

  • Probably should not have more parallelism than the number of CPU cores.
  • I rarely see a system that chews up all the CPU cores -- unless it does not have good enough indexes. That is, fix the indexes before asking the question.
  • If the data is bigger than can be cached, it may be faster to queue, since you may have a choke point -- I/O.
  • If the table(s) are continually being changed, turn off the Query Cache. (It was removed in MySQL 8.0, so this only applies to older versions.)
  • If your goal is to get some results on the page early (a likely Human Interface goal), add a small delay in all but one AJAX callee (not caller).
  • If multiple pages could be computing at the same time, things get more complex. For example, you can't really control the parallelism.
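
Here is a rough sketch of two of the points above: capping the parallelism and getting the above-the-fold results first. Note that it swaps the "small delay" idea for a different technique: it submits the above-the-fold queries first to a bounded thread pool, which starts work in submission order. Everything named here (`run_query`, `load_page`, the shape of `panels`, `max_parallel`) is made up for illustration, and `run_query` just sleeps instead of talking to MySQL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(name, seconds):
    """Hypothetical stand-in for a real SELECT; sleeps instead of hitting MySQL."""
    time.sleep(seconds)
    return name

def load_page(panels, max_parallel):
    """Run panel queries with bounded parallelism, above-the-fold panels first.

    panels: list of (name, seconds, above_the_fold) tuples (assumed shape).
    max_parallel: cap on concurrent queries, e.g. the MySQL server's core count.
    Returns the panel names in the order their queries were started.
    """
    # Stable sort: above-the-fold panels move to the front, ties keep their order.
    ordered = sorted(panels, key=lambda p: not p[2])
    started = []

    def task(name, seconds):
        started.append(name)  # record when this query actually begins
        return run_query(name, seconds)

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(task, name, secs) for name, secs, _ in ordered]
        for future in futures:
            future.result()  # wait for completion and re-raise any error
    return started
```

With `max_parallel=1` this degenerates into a plain queue, so the same code lets you try both extremes and anything in between.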

Let's see the queries. Perhaps we can speed them up enough to obviate the question.

#2


This is too long for a comment.

There is no right answer to this question. Up to a point, running SELECT queries in parallel is (generally) going to be faster than running them one at a time. Whether that point is 2 queries or 200 depends on the nature of the queries, the hardware configuration, the data, and the speeds of various components.
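
Since that point can't be predicted, one option is to measure it. Below is a minimal timing harness, assuming you can wrap each dashboard query in a Python callable. The names (`simulated_query`, `time_batch`, `sweep`) are hypothetical, and the simulated workload is only a sleep, so it scales far more cleanly than a real server would. Swap in your actual queries before drawing any conclusions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_query(_):
    """Hypothetical workload: stands in for one dashboard SELECT."""
    time.sleep(0.02)

def time_batch(query_fn, items, parallelism):
    """Return wall-clock seconds to run query_fn over items at a given parallelism."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(query_fn, items))  # list() waits for every query to finish
    return time.perf_counter() - start

def sweep(query_fn, items, levels=(1, 2, 4, 8)):
    """Time the same batch at several parallelism levels."""
    return {n: time_batch(query_fn, items, n) for n in levels}
```

Calling `sweep(simulated_query, range(8))` returns a dict mapping each parallelism level to the elapsed seconds. With real queries, the level where the timings stop improving (or get worse) is roughly the point described above.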

The situation becomes even more complex when you consider how many different users may be involved and whether or not the data is being updated. You can get into really bad situations with parallel queries and updates if the locks start cascading. Of course, this can happen with multiple simultaneous users as well.

My guess is that you want a throttling mechanism that will run, say, n queries at a time and put the rest into a queue.
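
A minimal sketch of such a throttle, assuming the queries are issued from a multi-threaded Python backend. `QueryThrottle`, `fake_select` and `demo` are hypothetical names, and `fake_select` sleeps rather than querying MySQL. A semaphore holds back callers once n queries are in flight. Waiting callers are not guaranteed strict first-in, first-out order, so it is a loose queue rather than an exact one.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class QueryThrottle:
    """Lets at most `limit` queries run at once; extra callers wait their turn."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def run(self, query_fn, *args):
        with self._slots:  # blocks while `limit` queries are already in flight
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self.running -= 1

def fake_select(label):
    """Hypothetical stand-in for a slow SELECT."""
    time.sleep(0.02)
    return f"rows for {label}"

def demo(limit=3, requests=10):
    """Fire `requests` simultaneous calls through one shared throttle."""
    throttle = QueryThrottle(limit)
    with ThreadPoolExecutor(max_workers=requests) as pool:
        results = list(pool.map(lambda i: throttle.run(fake_select, i), range(requests)))
    return results, throttle.peak
```

Because a single `QueryThrottle` instance can be shared by every request handler in the process, the limit applies across all users and pages, not just within one page load.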
