Why does using taskset to run a multi-threaded Linux program on a set of isolated cores cause all threads to run on one core?

Time: 2022-04-03 02:14:46

Desired behaviour: run a multi-threaded Linux program on a set of cores which have been isolated using isolcpus.

Here's a small program we can use as an example multi-threaded program:

#include <stdio.h>
#include <pthread.h>
#include <err.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>    /* strerror() */

#define NTHR    16
#define TIME    60 * 5

void *
do_stuff(void *arg)
{
    int i = 0;

    (void) arg;
    while (1) {
        i += i;
        usleep(10000); /* don't dominate the CPU */
    }
}

int
main(void)
{
    pthread_t   threads[NTHR];
    int     rv, i;

    for (i = 0; i < NTHR; i++) {
        rv = pthread_create(&threads[i], NULL, do_stuff, NULL);
        if (rv) {
            /* pthread_create() returns an error number and does not set
             * errno, so perror() would print the wrong message here */
            fprintf(stderr, "pthread_create: %s\n", strerror(rv));
            return (EXIT_FAILURE);
        }
    }
    sleep(TIME);
    exit(EXIT_SUCCESS);
}

If I compile and run this on a kernel with no isolated CPUs, then the threads are spread out over my 4 CPUs. Good!

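(For reference, one way to check where each thread actually lands is something like ps -eLo pid,tid,psr,comm, or having each thread report sched_getcpu(). Below is a minimal sketch of the latter; it is separate from the example program above and only used to verify placement.)

/* Verification sketch (not part of the example program): each thread
 * periodically prints the CPU it is currently running on. */
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static void *
report_cpu(void *arg)
{
    long id = (long) arg;

    while (1) {
        printf("thread %ld running on cpu %d\n", id, sched_getcpu());
        sleep(1);
    }
    return NULL;
}

int
main(void)
{
    pthread_t   t[4];
    long        i;

    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, report_cpu, (void *) i);
    sleep(10);
    return 0;
}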

Now if I add isolcpus=2,3 to the kernel command line and reboot:

  • Running the program without taskset distributes threads over cores 0 and 1. This is expected as the default affinity mask now excludes cores 2 and 3.
  • Running with taskset -c 0,1 has the same effect. Good.
  • Running with taskset -c 2,3 causes all threads to go onto the same core (either core 2 or 3). This is undesired. Threads should distribute over cores 2 and 3. Right?

This post describes a similar issue (although the example given is farther away from the pthreads API). The OP was happy to workaround this by using a different scheduler. I'm not certain this is ideal for my use-case however.

Is there a way to have the threads distributed over the isolated cores using the default scheduler?

Is this a kernel bug which I should report?

EDIT:

The right thing does indeed happen if you use a real-time scheduler like the fifo scheduler. See man sched and man chrt for details.

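For reference, the switch to SCHED_FIFO can be made either by wrapping the run in chrt (something like chrt -f 1 taskset -c 2,3 ./prog, which needs root or CAP_SYS_NICE) or from inside the program. Below is a minimal sketch of the in-program variant; the helper name create_fifo_thread and the real-time priority of 1 are my own choices, not something prescribed by the man pages.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Create a thread that explicitly requests SCHED_FIFO. Without
 * PTHREAD_EXPLICIT_SCHED the new thread would simply inherit the
 * creating thread's (CFS) policy and ignore the attribute settings. */
static int
create_fifo_thread(pthread_t *thr, void *(*fn)(void *), void *arg)
{
    pthread_attr_t      attr;
    struct sched_param  sp = { .sched_priority = 1 }; /* arbitrary RT priority */
    int                 rv;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);

    rv = pthread_create(thr, &attr, fn, arg);
    if (rv)
        fprintf(stderr, "pthread_create: %s\n", strerror(rv));

    pthread_attr_destroy(&attr);
    return rv;
}

Replacing the plain pthread_create() calls in the example with this helper and running under taskset -c 2,3 should exercise the real-time selection path mentioned in the answer below.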

1 solution

#1

From the Linux Kernel Parameter Doc:

This option can be used to specify one or more CPUs to isolate from the general SMP balancing and scheduling algorithms.

So this option effectively prevents the scheduler from migrating threads from one core to another, less contended core (SMP balancing). Typically, isolcpus is used together with pthread affinity control: threads are pinned explicitly, with knowledge of the CPU layout, to get predictable performance.

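For example, that explicit pinning is typically done with pthread_setaffinity_np (a GNU extension). A minimal sketch follows, using cores 2 and 3 to match the isolcpus example in the question; the helper name pin_thread is only for illustration.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Restrict one thread to a single CPU. */
static int
pin_thread(pthread_t thr, int cpu)
{
    cpu_set_t   set;
    int         rv;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    rv = pthread_setaffinity_np(thr, sizeof(set), &set);
    if (rv)
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(rv));
    return rv;
}

/* In the question's example this could be called right after each
 * pthread_create(), e.g. pin_thread(threads[i], 2 + (i % 2)) to
 * alternate the threads between the isolated cores 2 and 3. */

Once each thread is pinned like this, the lack of SMP balancing on the isolated cores no longer matters, because placement is decided by the program itself rather than the scheduler.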

https://www.kernel.org/doc/Documentation/kernel-parameters.txt

--Edit--

OK, I see why you are confused. Personally I would also expect consistent behavior from this option. The problem lies in two functions, select_task_rq_fair and select_task_rq_rt, which are responsible for selecting the new run queue (essentially, which CPU the task runs on next). I did a quick trace (SystemTap) of both functions: for CFS it always returned the same first core in the mask; for RT it returned other cores as well. I haven't had a chance to look into the logic of each selection algorithm, but you could email the maintainers on the Linux kernel development mailing list to ask for a fix.
