Why can't kernel code/threads executing in interrupt context sleep?

I am reading the following article by Robert Love:

http://www.linuxjournal.com/article/6916

It says:

"...Let's discuss the fact that work queues run in process context. This is in contrast to the other bottom-half mechanisms, which all run in interrupt context. Code running in interrupt context is unable to sleep, or block, because interrupt context does not have a backing process with which to reschedule. Therefore, because interrupt handlers are not associated with a process, there is nothing for the scheduler to put to sleep and, more importantly, nothing for the scheduler to wake up..."


I don't get it. AFAIK, the scheduler in the kernel is O(1) and is implemented with a bitmap. So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?

11 answers

#1

I think it's a design decision.

Sure, you could design a system in which interrupt handlers can sleep, but that would only make the system complicated and hard to comprehend (there are many, many situations you would have to take into account) without helping anything. So from a design point of view, declaring that interrupt handlers cannot sleep is clear and easy to implement.

Quoting Robert Love (a kernel hacker), from http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791:

You cannot sleep in an interrupt handler because interrupts do not have a backing process context, and thus there is nothing to reschedule back into. In other words, interrupt handlers are not associated with a task, so there is nothing to "put to sleep" and (more importantly) "nothing to wake up". They must run atomically.

This is not unlike other operating systems. In most operating systems, interrupts are not threaded. Bottom halves often are, however.

The reason the page fault handler can sleep is that it is invoked only by code that is running in process context. Because the kernel's own memory is not pagable, only user-space memory accesses can result in a page fault. Thus, only a few certain places (such as calls to copy_{to,from}_user()) can cause a page fault within the kernel. Those places must all be made by code that can sleep (i.e., process context, no locks, et cetera).

#2

"So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?"

The problem is that the interrupt context is not a process, and therefore cannot be put to sleep.

When an interrupt occurs, the processor saves the registers onto the stack and jumps to the start of the interrupt service routine. This means that when the interrupt handler is running, it is running in the context of the process that was executing when the interrupt occurred. The interrupt handler is executing on that process's stack, and when the handler completes, that process will resume executing.

If you tried to sleep or block inside an interrupt handler, you would wind up stopping not only the interrupt handler, but also the process it interrupted. This could be dangerous, as the interrupt handler has no way of knowing what the interrupted process was doing, or even whether it is safe for that process to be suspended.

A simple scenario where things could go wrong would be a deadlock between the interrupt handler and the process it interrupts.

1. Process1 enters kernel mode.
2. Process1 acquires LockA.
3. An interrupt occurs.
4. The ISR starts executing using Process1's stack.
5. The ISR tries to acquire LockA.
6. The ISR calls sleep to wait for LockA to be released.

At this point, you have a deadlock. Process1 can't resume execution until the ISR is done with its stack. But the ISR is blocked waiting for Process1 to release LockA.
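The deadlock scenario can be modeled in user space. What follows is only a minimal sketch under stated assumptions, not kernel code: `lock_a`, `fake_isr`, `run_deferred` and `process1` are hypothetical names, and the "interrupt" is simply a function call made while the "process" holds the lock. It illustrates the usual way out: the handler tries the lock once without blocking and, if that fails, leaves the work for code that is allowed to wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is a user-space model, not kernel code. */
static atomic_flag lock_a = ATOMIC_FLAG_INIT;  /* stands in for LockA */
static bool deferred_pending;                  /* work the ISR could not finish */
static int  work_done;                         /* completed work items */

/* Non-blocking acquire: returns true only if the lock was free. */
static bool try_lock_a(void) { return !atomic_flag_test_and_set(&lock_a); }
static void unlock_a(void)   { atomic_flag_clear(&lock_a); }

/* Simulated ISR. It borrows the interrupted code's stack, so waiting here
 * would also stop the lock holder; it tries once and defers on failure. */
static bool fake_isr(void)
{
    if (!try_lock_a()) {
        deferred_pending = true;   /* hand the work to process context */
        return false;
    }
    work_done++;
    unlock_a();
    return true;
}

/* Runs later in "process context", where waiting for the lock is safe. */
static void run_deferred(void)
{
    assert(deferred_pending);
    while (!try_lock_a())
        ;                          /* a real kernel path could sleep here */
    work_done++;
    deferred_pending = false;
    unlock_a();
}

/* Simulated Process1: takes LockA, is "interrupted", then releases it. */
static void process1(void)
{
    while (!try_lock_a())
        ;
    fake_isr();                    /* interrupt arrives while LockA is held */
    unlock_a();
    if (deferred_pending)
        run_deferred();
}
```

Calling `process1()` completes exactly one work item through the deferred path. If `fake_isr()` looped on `try_lock_a()` instead of returning, the call would never return, because the only code able to release `lock_a` is the caller it is running on top of.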

#3

Because the thread-switching infrastructure is unusable at that point. When servicing an interrupt, only code of higher priority can execute; see the Intel Software Developer's Manual on interrupt, task and processor priority. If you did allow another thread to execute (your question implies that this would be easy to do), you wouldn't be able to let it do anything: if it caused a page fault, you'd have to use services in the kernel that are unusable while the interrupt is being serviced (see below for why).

Typically, your only goal in an interrupt routine is to get the device to stop interrupting and to queue something at a lower interrupt level (in Unix this is typically a non-interrupt level, but on Windows it's dispatch, APC or passive level) to do the heavy lifting where you have access to more features of the kernel/OS. See "Implementing a handler".
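The split described above (a short handler that only queues work, and a later stage that does the heavy lifting) can be sketched with a small ring buffer. This is a user-space illustration under assumptions, not a driver: `top_half`, `bottom_half`, `QUEUE_LEN` and `processed_sum` are hypothetical names, and a real Linux driver would normally hand the second stage to an existing kernel mechanism such as a work queue instead of a hand-rolled queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 8   /* hypothetical capacity; one slot always stays free */
static_assert(QUEUE_LEN > 1, "ring buffer needs at least two slots");

static int    queue[QUEUE_LEN];
static size_t head, tail;       /* head: next slot to read, tail: next to write */
static long   processed_sum;    /* stands in for the result of the heavy work */

/* Top half: record the data and return immediately; it never waits. */
static bool top_half(int data)
{
    size_t next = (tail + 1) % QUEUE_LEN;
    if (next == head)
        return false;           /* queue full: drop the item rather than block */
    queue[tail] = data;
    tail = next;
    return true;
}

/* Bottom half: runs later at a lower level, where blocking would be allowed.
 * Returns the number of items it handled. */
static int bottom_half(void)
{
    int handled = 0;
    while (head != tail) {
        processed_sum += queue[head];
        head = (head + 1) % QUEUE_LEN;
        handled++;
    }
    return handled;
}
```

The sketch ignores concurrency between the two halves; on real hardware the queue would need the protection discussed elsewhere in this thread, such as disabling local interrupts around the shared indices.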

It's a property of how operating systems have to work, not something inherent in Linux. An interrupt routine can execute at any point, so the state of whatever you interrupted may be inconsistent. If you interrupted the thread scheduling code, its state is inconsistent, so you can't be sure you can "sleep" and switch threads. Even if you protect the thread-switching code from being interrupted, thread switching is a very high-level feature of the OS, and if you protected everything it relies on, an interrupt would become more of a suggestion than the imperative implied by its name.

#4

"So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?"

Scheduling happens on timer interrupts. The basic rule is that only one interrupt can be open at a time, so if you go to sleep in the "got data from device X" interrupt, the timer interrupt cannot run to schedule it out.

Interrupts also happen many times and overlap. If you put the "got data" interrupt to sleep and then get more data, what happens? It's confusing (and fragile) enough that the catch-all rule is: no sleeping in interrupts. You will do it wrong.

#5

Even if you could put an ISR to sleep, you wouldn't want to do it. You want your ISRs to be as fast as possible to reduce the risk of missing subsequent interrupts.


#6

High-level interrupt handlers mask the operations of all lower-priority interrupts, including those of the system timer interrupt. Consequently, the interrupt handler must avoid involving itself in an activity that might cause it to sleep. If the handler sleeps, then the system may hang because the timer is masked and incapable of scheduling the sleeping thread. Does this make sense?


#7

If a higher-level interrupt routine gets to the point where the next thing it must do has to happen after a period of time, then it needs to put a request into the timer queue, asking that another interrupt routine be run (at a lower priority level) some time later.

When that interrupt routine runs, it raises the priority level back to the level of the original interrupt routine and continues execution. This has the same effect as a sleep.
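The "request a later callback instead of sleeping" pattern can be sketched as a two-phase routine driven by a timer. This is a user-space model under assumptions: `struct one_shot`, `first_half`, `second_half` and `timer_tick` are hypothetical names, time is a simulated tick counter, and priority levels are not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-shot timer driven by a simulated tick counter. */
struct one_shot {
    bool armed;
    unsigned long expires;        /* tick at which the callback fires */
    void (*callback)(void);
};

static unsigned long now_ticks;
static struct one_shot timer;
static int phase;                 /* 0: idle, 1: waiting, 2: finished */

/* Second routine: continues the work where the first one stopped. */
static void second_half(void) { phase = 2; }

/* First routine: do the urgent part, request a later callback, and return
 * instead of sleeping for `delay` ticks. */
static void first_half(unsigned long delay)
{
    phase = 1;
    timer.expires = now_ticks + delay;
    timer.callback = second_half;
    timer.armed = true;
}

/* Simulated timer interrupt: advances time and fires an expired timer once. */
static void timer_tick(void)
{
    now_ticks++;
    if (timer.armed && now_ticks >= timer.expires) {
        assert(timer.callback);
        timer.armed = false;
        timer.callback();
    }
}
```

After `first_half(3)`, the routine has returned and nothing is blocked; `second_half` runs on the third `timer_tick()`, which is the "same effect as a sleep" without any context ever being suspended.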

#8

The Linux kernel has two ways to allocate the interrupt stack. One is on the kernel stack of the interrupted process; the other is a dedicated interrupt stack per CPU. If the interrupt context is saved on the dedicated per-CPU interrupt stack, then the interrupt context is indeed not associated with any process at all. The "current" macro will produce an invalid pointer to the currently running process, since on some architectures "current" is computed from the stack pointer. The stack pointer in interrupt context may point to the dedicated interrupt stack, not to the kernel stack of some process.

#9

Not allowing an interrupt handler to block is a design choice. When some data is available on the device, the interrupt handler interrupts the current process, prepares the transfer of the data and re-enables the interrupt; until the handler re-enables that interrupt, the device has to wait. If we want to keep our I/O busy and our system responsive, we had better not block in the interrupt handler.

I don't think the "unstable states" are an essential reason. Processes, whether they are in user mode or kernel mode, should be aware that they may be interrupted. If some kernel-mode data structure is accessed by both an interrupt handler and the current process, and a race condition exists, then the current process should disable local interrupts; on multi-processor architectures, spinlocks should also be used around the critical sections.

I also don't think that an interrupt handler, once blocked, could not be woken up. When we say "block", we basically mean that the blocked process is waiting for some event or resource, so it links itself into a wait queue for that event or resource. Whenever the resource is released, the releasing process is responsible for waking up the waiting process(es).

However, the really annoying thing is that the interrupted process can do nothing while the handler is blocked; it did nothing to deserve this punishment, which is unfair. And nobody can reliably predict the blocking time, so the innocent process has to wait for an unclear reason and for an unbounded time.

#10

It is just a design/implementation choice in Linux. The advantage of this design is its simplicity, but it may not suit real-time OS requirements.

Other OSes have other designs/implementations.

For example, in Solaris, interrupts can have different priorities, which allows most device interrupts to be handled in interrupt threads. Interrupt threads are allowed to sleep because each interrupt thread has its own stack in the context of the thread. The interrupt-thread design is good for real-time threads, which should have higher priorities than interrupts.

#11

In essence, the question is whether an interrupt handler can obtain a valid "current" (the address of the current process's task_struct). If it can, it is possible to modify the contents there to put the task into a "sleep" state, from which the scheduler can bring it back later if the state changes somehow. The answer may be hardware-dependent.

But on ARM it's impossible, since 'current' is unrelated to the process in interrupt mode. See the code below:

/* linux/arch/arm/include/asm/thread_info.h */
static inline struct thread_info *current_thread_info(void)
{
	register unsigned long sp asm ("sp");
	return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

The sp in USER mode and the sp in SVC mode are the "same". "Same" here does not mean they are equal: user mode's sp points to the user-space stack, while SVC mode's sp (r13_svc) points to the kernel stack, where the user process's task_struct was updated at the previous task switch. When a system call occurs, the process enters kernel space again with sp (sp_svc) still unchanged, so the two stack pointers are associated with each other; in this sense they are the "same". So in SVC mode, kernel code can get a valid 'current'. But in other privileged modes, such as interrupt mode, sp is "different": it points to a dedicated address defined in cpu_init(). The 'current' calculated in these modes is unrelated to the interrupted process, and accessing it results in unexpected behavior. That's why it's always said that a system call can sleep but an interrupt handler can't: a system call works in process context, but an interrupt does not.
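The masking arithmetic in current_thread_info() can be tried out in user space. This is a minimal sketch under assumptions: `stack_base` and `same_stack` are hypothetical helpers, the value of `THREAD_SIZE` is assumed to be 8 KiB (the real value depends on the kernel configuration), and the addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed stack size; must be a power of two */
static_assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0,
              "masking only works for power-of-two stack sizes");

/* Same arithmetic as current_thread_info(): round sp down to the start of
 * the THREAD_SIZE-aligned block that contains it. */
static uintptr_t stack_base(uintptr_t sp)
{
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}

/* Two stack pointers resolve to the same thread_info only if they lie in
 * the same aligned block. */
static int same_stack(uintptr_t sp_a, uintptr_t sp_b)
{
    return stack_base(sp_a) == stack_base(sp_b);
}
```

For example, `stack_base(0xC0A01F40)` is `0xC0A00000`, and any other sp inside that block maps to the same base. An sp that points into a different region, such as a dedicated per-mode stack, masks down to a different address, so the "thread_info" found there has nothing to do with the interrupted process.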
