Does Java's ForkJoinPool work well with non-recursive tasks?

Time: 2021-03-09 21:01:20

I want to submit Runnable tasks into ForkJoinPool via a method:

forkJoinPool.submit(Runnable task)

Note, I use JDK 7.

Under the hood, they are transformed into ForkJoinTask objects. I know that ForkJoinPool is efficient when a task is split into smaller ones recursively.
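
As a minimal sketch of the submission path described above: `ForkJoinPool.submit(Runnable)` hands back a `ForkJoinTask<?>` that can be joined. The class name `SubmitRunnableDemo` and the helper `runOnce` are made up for illustration, and an anonymous `Runnable` is used so the snippet should also compile on JDK 7.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.atomic.AtomicInteger;

public class SubmitRunnableDemo {
    // Hypothetical helper: submits one plain Runnable, waits for it,
    // and returns the counter value afterwards.
    static int runOnce() {
        ForkJoinPool pool = new ForkJoinPool();
        final AtomicInteger counter = new AtomicInteger();
        try {
            // submit(Runnable) returns a ForkJoinTask<?> wrapping the Runnable.
            ForkJoinTask<?> task = pool.submit(new Runnable() {
                @Override
                public void run() {
                    counter.incrementAndGet();
                }
            });
            task.join(); // blocks until the Runnable has finished
            return counter.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("counter = " + runOnce());
    }
}
```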

Question:

Does work-stealing still work in the ForkJoinPool if there is no recursion?

Is it worth it in this case?

Update 1: Tasks are small and can be unbalanced. Even for strictly equal tasks, things like context switching, thread scheduling, parking, page misses, etc. get in the way and lead to imbalance.

Update 2: Doug Lea gave a hint on this in the Concurrency JSR-166 Interest group:

This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for.

I presume that, for reasonably small CPU-bound tasks, ForkJoinPool is the way to go thanks to this optimization. The main point is that these tasks are already small and don't need recursive decomposition. Work-stealing works regardless of whether a task is big or small: tasks can be grabbed by another free worker from the tail of a busy worker's deque.
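
A rough sketch of that scenario: many small, deliberately unequal CPU-bound `Runnable`s submitted directly to a `ForkJoinPool`, with no recursion. The class name, the helper `runSmallTasks`, and the spin counts are assumptions chosen for illustration (Java 8+ syntax). `getStealCount()` is only an estimate and varies from run to run, so it is printed rather than relied upon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.atomic.AtomicLong;

public class SmallTasksDemo {
    // Hypothetical helper: submits `tasks` small, unbalanced Runnables and
    // returns how many of them completed.
    static long runSmallTasks(int parallelism, int tasks) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        AtomicLong completed = new AtomicLong();
        List<ForkJoinTask<?>> handles = new ArrayList<>();
        try {
            for (int i = 0; i < tasks; i++) {
                final int spins = 100 + (i % 7) * 1_000; // unequal amounts of work
                Runnable work = () -> {
                    long acc = 0;
                    for (int k = 0; k < spins; k++) {
                        acc += k;
                    }
                    if (acc >= 0) {
                        completed.incrementAndGet();
                    }
                };
                handles.add(pool.submit(work));
            }
            for (ForkJoinTask<?> handle : handles) {
                handle.join(); // wait for every submitted task
            }
            // An estimate of tasks taken from another thread's queue; not deterministic.
            System.out.println("steals (estimate): " + pool.getStealCount());
            return completed.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runSmallTasks(4, 10_000));
    }
}
```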

Update 3: Scalability of ForkJoinPool: the Akka team's ping-pong benchmark shows great results.

Despite this, applying ForkJoinPool efficiently still requires performance tuning.

1 answer

#1


13  

The ForkJoinPool source code has a nice section called "Implementation Overview"; read it for the ultimate truth. The explanation below is my understanding as of JDK 8u40.

Since day one, ForkJoinPool has had a work queue per worker thread (let's call them "worker queues"). Forked tasks are pushed onto the local worker queue, ready to be popped by the same worker and executed -- in other words, it looks like a stack from the worker thread's perspective. When a worker depletes its worker queue, it goes around and tries to steal tasks from other worker queues. That is "work stealing".
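
For contrast with plain submitted tasks, here is a small sketch of the classic recursive use, where `fork()` puts a subtask on the current worker's own queue and an idle worker may steal it. The class `RangeSum`, its `THRESHOLD`, and the helper `sum` are illustrative names and values, not anything from the ForkJoinPool sources.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cutoff for illustration
    private final long from; // inclusive
    private final long to;   // exclusive

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid, to);
        left.fork();                     // queued locally; may be stolen by an idle worker
        long rightSum = right.compute(); // run the other half in the current thread
        return left.join() + rightSum;
    }

    // Hypothetical convenience wrapper around pool creation and shutdown.
    static long sum(long from, long to) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSum(from, to));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(0, 1_000_000));
    }
}
```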

Now, before (IIRC) JDK 7u12, ForkJoinPool had a single global submission queue. When worker threads ran out of local tasks, as well as tasks to steal, they went there to check whether external work was available. In this design, there is no advantage over a regular, say, ThreadPoolExecutor backed by ArrayBlockingQueue.
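
For reference, the single-shared-queue baseline mentioned above might look like the sketch below: every worker polls the same `ArrayBlockingQueue`. The class name, the queue capacity of 1024, and the `CallerRunsPolicy` choice (so a full queue does not reject tasks) are assumptions made for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleQueueBaseline {
    // Hypothetical helper: runs `tasks` trivial jobs on a fixed-size
    // ThreadPoolExecutor with one shared bounded queue; returns the completed count.
    static int runTasks(int threads, int tasks) {
        AtomicInteger completed = new AtomicInteger();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1024),      // the one queue all workers share
                new ThreadPoolExecutor.CallerRunsPolicy());  // run in the caller when the queue is full
        try {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> completed.incrementAndGet());
            }
            executor.shutdown(); // previously submitted tasks still run
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
            return completed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runTasks(4, 10_000));
    }
}
```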

It changed significantly after that. After this submission queue was identified as a serious performance bottleneck, Doug Lea et al. striped the submission queues as well. In hindsight, that is an obvious idea: you can reuse most of the mechanics available for worker queues. You can even loosely distribute these submission queues across worker threads. Now, an external submission goes into one of the submission queues. Then, workers that have no work to munch on can first look into the submission queue associated with a particular worker, and then wander around looking into the submission queues of others. One can call that "work stealing" too.
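
The external-submission case can be exercised with a sketch like the following, where several non-worker threads push plain `Runnable`s into one pool concurrently. Which submission queue each submitter lands in is an internal detail that this code neither controls nor observes; the class and method names are invented for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

public class ExternalSubmittersDemo {
    // Hypothetical helper: `submitters` ordinary threads each submit
    // `tasksPerSubmitter` Runnables; returns the number of tasks that ran.
    static int run(int submitters, int tasksPerSubmitter) {
        ForkJoinPool pool = new ForkJoinPool();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(submitters * tasksPerSubmitter);
        Thread[] threads = new Thread[submitters];
        for (int s = 0; s < submitters; s++) {
            threads[s] = new Thread(() -> {
                for (int i = 0; i < tasksPerSubmitter; i++) {
                    Runnable work = () -> {
                        completed.incrementAndGet();
                        done.countDown();
                    };
                    pool.execute(work); // external submission from a non-worker thread
                }
            });
            threads[s].start();
        }
        try {
            for (Thread t : threads) {
                t.join();   // all submissions have been made
            }
            done.await();   // all submitted tasks have run
            return completed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + run(4, 2_500));
    }
}
```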

I have seen many workloads benefiting from this. This particular design advantage of ForkJoinPool, even for plain non-recursive tasks, was recognized long ago. Many users at concurrency-interest@ asked for a simple work-stealing executor without all the ForkJoinPool arcanery. This is one of the reasons why we have Executors.newWorkStealingPool() in JDK 8 onward -- currently delegating to ForkJoinPool, but open to providing a simpler implementation.
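
A minimal usage sketch of `Executors.newWorkStealingPool()` (JDK 8+), treating it as an ordinary `ExecutorService` for independent, non-recursive tasks. The class name and the sum-of-squares workload are placeholders chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkStealingPoolDemo {
    // Hypothetical helper: computes 1^2 + 2^2 + ... + n^2 with one tiny task per term.
    static long sumOfSquares(int n) {
        ExecutorService pool = Executors.newWorkStealingPool();
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long v = i;
                jobs.add(() -> v * v); // each job is independent; no forking involved
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(100));
    }
}
```

Because the return type is just `ExecutorService`, calling code written this way does not depend on ForkJoinPool being the implementation behind it.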
