I'm trying to understand the relationship between the C system call API, the syscall assembly instruction, and the exception mechanism (interrupts) used to switch contexts between processes. There's a lot to study on my own, so please bear with me.

Is my understanding correct that C system calls are implemented by the compiler as syscall instructions with the respective code in assembly, which in turn are implemented by the OS through the exception mechanism (interrupts)?

So the call to the write function in the following C code:

#include <unistd.h>

int main(void)
{
    write(2, "There was an error writing to standard out\n", 44);
    return 0;
}

is compiled to assembly as a syscall instruction:

mov eax,4       ; system call number (sys_write)
syscall

And the instruction, in turn, is implemented by the OS through the exception mechanism (interrupts)?

4 Answers

#1

TL;DR

The syscall instruction itself acts like a glorified jump: it is a hardware-supported way to jump efficiently and safely from unprivileged user space into the kernel.
The syscall instruction jumps to a kernel entry point that dispatches the call.

Before x86-64, two other mechanisms were used: the int instruction and the sysenter instruction.
They have different entry points (still present today in 32-bit kernels, and in 64-bit kernels that can run 32-bit user-space programs).
The former uses the x86 interrupt machinery and can be confused with exception dispatching (which also uses the interrupt machinery).
However, exceptions are unplanned events, while int is used to deliberately generate a software interrupt: again, a glorified jump.

The C language doesn't concern itself with system calls; it relies on the C runtime to perform all interactions with the environment of the eventual program.

The C runtime implements the above-mentioned interactions through an environment-specific mechanism.
There may be various layers of software abstraction, but in the end the OS APIs get called.

The term API denotes a contract. Strictly speaking, using an API does not require invoking kernel code (the trend is to implement non-critical functions in user space to limit the amount of exploitable code); here we are only interested in the subset of the API that requires a privilege switch.

Under Linux, the kernel exposes a set of services accessible from user space; these entry points are called system calls.
Under Windows, the kernel services (which are accessed with the same mechanism as their Linux analogues) are considered private, in the sense that they are not required to be stable across versions.
A set of DLL/EXE exported functions is used as entry points instead (e.g. ntoskrnl.exe, hal.dll, kernel32.dll, user32.dll); these in turn use the kernel services through a (private) system call.
Note that under Linux most system calls have a POSIX wrapper around them, so it is possible to invoke a system call through these wrappers, which are ordinary C functions.
The underlying ABI is different, and so is the error reporting; the wrapper translates between the two worlds.
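As a rough illustration of the translation a wrapper performs, here is a minimal sketch that assumes x86-64 Linux and GCC/Clang inline assembly. The helper names raw_syscall3 and wrapped_write are hypothetical, and real C libraries are more elaborate. The point is only that the kernel reports failure as a negated errno value in rax, whereas a POSIX-style wrapper returns -1 and sets errno.

```c
#include <errno.h>
#include <unistd.h>

#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes x86-64 Linux"
#endif

/* Hypothetical helper: issue a three-argument system call directly,
 * bypassing libc. The kernel leaves its result in rax; on failure the
 * result is a negated errno value in the range -4095..-1. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                          /* result in rax */
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3) /* rax, rdi, rsi, rdx */
                      : "rcx", "r11", "memory");           /* syscall overwrites rcx and r11 */
    return ret;
}

/* Hypothetical wrapper in the style of a libc function: convert the
 * kernel convention into the POSIX one (-1 plus errno). */
long wrapped_write(int fd, const void *buf, unsigned long len)
{
    long ret = raw_syscall3(1 /* write on x86-64 Linux */,
                            fd, (long)buf, (long)len);
    if (ret < 0 && ret >= -4095) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

With an invalid descriptor such as -1, raw_syscall3(1, -1, ...) yields -EBADF, while wrapped_write(-1, ...) yields -1 with errno set to EBADF.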

The C runtime calls the OS APIs. On Linux the system calls are used directly because they are public (in the sense that they are stable across versions), while on Windows the usual DLLs, such as kernel32.dll, are marked as dependencies and used.

This reduces the problem to the point where a user-mode program, whether it is part of the C runtime (Linux) or part of an API DLL (Windows), needs to invoke code in the kernel.

The x86 architecture historically offered different ways to do so, for example a call gate.
Another way is through the int instruction, which has a few advantages:

It is what the BIOS and DOS did in their day. In real mode, using an int instruction is convenient because a vector number (e.g. 21h) is easier to remember than a far address (e.g. 0f000h:0fff0h).
It saves the flags.
It is easy to set up (setting up an ISR is relatively easy).

As the architecture was modernized, this mechanism turned out to have a big disadvantage: it is slow. Before the introduction of the sysenter (note: sysenter, not syscall) instruction there was no faster alternative (a call gate would be equally slow).

With the advent of the Pentium Pro/II[1], a new pair of instructions, sysenter and sysexit, was introduced to make system calls faster.
Linux has used them since version 2.5, and I believe they are still used today on 32-bit systems.
I won't explain the whole mechanism of the sysenter instruction and the companion VDSO needed to use it; suffice it to say that it was faster than the int mechanism (I can't find an article by Andy Glew where he says that sysenter turned out to be slow on the Pentium III, and I don't know how it performs nowadays).

With the advent of x86-64, AMD's response to sysenter, i.e. the syscall/sysret pair, became the de facto way to switch from user mode to kernel mode.
This is because syscall is actually fast and very simple (it copies rip and rflags into rcx and r11 respectively, masks rflags, and jumps to an address set in IA32_LSTAR).
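The register shuffle performed by the syscall instruction can be modelled in plain C. This is a deliberately simplified, hypothetical model (the struct cpu_model and the function model_syscall are invented for illustration). It ignores the segment-register changes, and it assumes that the flag mask comes from a second register, called fmask here after the IA32_FMASK MSR. It is meant to show how little work the instruction does, not to describe the hardware exactly.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the state touched by syscall. */
struct cpu_model {
    uint64_t rip;     /* instruction pointer */
    uint64_t rflags;  /* flags register */
    uint64_t rcx;     /* receives the return address */
    uint64_t r11;     /* receives the saved flags */
    uint64_t lstar;   /* stands in for IA32_LSTAR: kernel entry point */
    uint64_t fmask;   /* stands in for IA32_FMASK: flag bits to clear */
    int      cpl;     /* current privilege level: 3 = user, 0 = kernel */
};

/* Model the effect of executing syscall; next_rip is the address of the
 * instruction that follows it in user space. */
void model_syscall(struct cpu_model *cpu, uint64_t next_rip)
{
    cpu->rcx = next_rip;          /* save where to resume in user space */
    cpu->r11 = cpu->rflags;       /* save the user flags */
    cpu->rflags &= ~cpu->fmask;   /* mask the flags selected by the kernel */
    cpu->rip = cpu->lstar;        /* jump to the kernel entry point */
    cpu->cpl = 0;                 /* now running privileged */
}
```

Nothing is pushed on a stack and no table is consulted in this model, which is why the kernel's entry code has to save rcx and r11 itself if it wants to return with sysret.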

64-bit versions of both Linux and Windows use syscall.


To recap, control can be given to the kernel through three mechanisms:

Software interrupts. This was int 80h for 32-bit Linux (pre 2.5) and int 2eh for 32-bit Windows.
Via sysenter. Used by 32-bit versions of Linux since 2.5.
Via syscall. Used by 64-bit versions of Linux and Windows.

Here is a nice page to put it in a better shape.


The C runtime is usually a static library, and thus pre-compiled, that uses one of the three methods above.

The syscall instruction transfers control directly to a kernel entry point (see entry_64.S).
It is an instruction that does just that; it is not implemented by the OS, it is used by the OS.

The term exception is overloaded in CS: C++ has exceptions, and so do Java and C#.
The OS can have a language-agnostic exception trapping mechanism (under Windows it was once called SEH and has since been rewritten).
The CPU also has exceptions.
I believe we are talking about the last meaning.

Exceptions are dispatched through interrupts; they are a kind of interrupt.
It goes without saying that, while exceptions are synchronous (they happen at specific, replayable points), they are "unwanted": they are exceptional in the sense that programmers tend to avoid them, and when they happen it is due to a bug, an unhandled corner case, or a bad situation.
Thus they are not used to transfer control to the kernel (though they could be).

Software interrupts (which are synchronous too) were used instead; the mechanism is almost exactly the same (exceptions can have a status code pushed on the kernel stack), but the semantics are different.
We never dereferenced a null pointer, accessed an unmapped page, or did anything similar to invoke a system call; we used the int instruction instead.

#2

Is my understanding correct that C language system calls are implemented by compiler as syscall's with respective code in assembly […]?


No.


The C compiler handles system calls the same way that it handles calls to any other function:


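# write(2, "There was an error writing to standard out\n", 44);
mov    $44, %edx         # third argument: byte count
lea    .LC0(%rip), %rsi  # second argument: address of the string
mov    $2, %edi          # first argument: file descriptor 2 (stderr)
call   write             # ordinary call into libc's write wrapper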

The implementation of these functions in libc (your system's C library) will probably contain a syscall instruction, or whatever the equivalent is on your system's architecture.

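To make the point concrete from the C side, here is a small sketch assuming Linux with a libc that provides the generic syscall() function and the SYS_write constant from <sys/syscall.h>. The function names write_via_wrapper and write_via_syscall are hypothetical. Both are compiled as ordinary function calls; the architecture-specific instruction lives inside libc, not in the code the compiler emits for these functions.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Uses the POSIX wrapper: the compiler emits a plain call to write. */
long write_via_wrapper(int fd, const char *msg)
{
    return (long)write(fd, msg, strlen(msg));
}

/* Uses libc's generic syscall() function with an explicit system call
 * number: still a plain call from the compiler's point of view. */
long write_via_syscall(int fd, const char *msg)
{
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Either function returns the number of bytes written on success, or -1 with errno set on failure.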

#3

EDIT


Yes. The C application calls a C library function; buried in that library's implementation is a system-specific call or set of calls, which use an architecture-specific way to reach the operating system, which in turn has an exception/interrupt handler set up to deal with these system calls. Actually it doesn't have to be architecture-specific: it can simply jump/call to a well-known address. But given the modern desire for security and protection modes, a simple call won't have those added features, although it is still functionally correct.

How the library is implemented is implementation-defined. How the compiler connects your code to that library, at run time or at link time, can happen in a number of ways; there is no one way it can or needs to happen, so that is implementation-defined as well. So long as it is functionally correct and doesn't conflict with the C standard, it can work.

With operating systems like Windows and Linux, and others on our phones and tablets, there is a strong desire to isolate applications from the system so they cannot do damage in various ways. Protection is therefore desired, and you need an architecture-specific way to make a function call into the operating system that is not a normal call, because it switches modes. If the architecture has more than one way to do this, then the operating system can choose one or more of them as part of its design.

A "software interrupt" is one common way. For hardware interrupts, most solutions include a table of handler addresses; by extending that table, some of the vectors can be tied to a software-created "interrupt" (triggered by a special instruction rather than by a signal changing state on an input) that goes through the same sequence: stop, save some state, call the vector, and so on.
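The table-of-handlers idea can be sketched as a user-space simulation in C. Everything here is hypothetical (vector_table, install_handler, soft_interrupt, and the example handler), and no real mode switch or state saving takes place. The sketch only shows how a vector number selects a handler address from a table, which is the part that hardware and software interrupts share.

```c
#include <stddef.h>

#define VECTOR_COUNT 256

/* A handler takes one argument and returns a result (a simplification). */
typedef long (*handler_fn)(long arg);

/* Simulated vector table: one handler address per vector number. */
static handler_fn vector_table[VECTOR_COUNT];

/* What the OS would do at startup: register a handler for a vector. */
void install_handler(unsigned vector, handler_fn fn)
{
    if (vector < VECTOR_COUNT)
        vector_table[vector] = fn;
}

/* What a software interrupt instruction would trigger: look up the
 * vector and call its handler. Returns -1 if no handler is installed. */
long soft_interrupt(unsigned vector, long arg)
{
    if (vector >= VECTOR_COUNT || vector_table[vector] == NULL)
        return -1;
    return vector_table[vector](arg);
}

/* Example handler standing in for a system call entry point. */
long example_syscall_handler(long arg)
{
    return arg * 2;
}
```

After install_handler(0x80, example_syscall_handler), a call to soft_interrupt(0x80, 21) is routed through the table to the handler, while an unused vector is rejected.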

#4

Not a direct answer to the question but this might interest you (I don't have enough karma to comment) - it explains all the user space execution (including glibc and how it does syscalls) in detail:


http://www.maizure.org/projects/printf/index.html

You'll probably be interested in particular in 'Step 8 - Final string written to standard output':


And what does __libc_write look like...?

000000000040f9c0 <__libc_write>:
  40f9c0:  83 3d c5 bb 2a 00 00   cmpl   $0x0,0x2abbc5(%rip)  # 6bb58c <__libc_multiple_threads>
  40f9c7:  75 14                  jne    40f9dd <__write_nocancel+0x14>

000000000040f9c9 <__write_nocancel>:
  40f9c9: b8 01 00 00 00          mov    $0x1,%eax
  40f9ce: 0f 05                   syscall 
  ...cut...

Write simply checks the threading state and, assuming all is well, moves the write syscall number (1) into EAX and enters the kernel.

Some notes:


The x86-64 Linux write syscall number is 1; on old x86 it was 4.
rdi refers to stdout (the file descriptor).
rsi points to the string.
rdx is the string size count.

Note that this was for the author's x86-64 Linux system.


For x86, this provides some help:


http://www.tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html

Under Linux the execution of a system call is invoked by a maskable interrupt or exception class transfer, caused by the instruction int 0x80. We use vector 0x80 to transfer control to the kernel. This interrupt vector is initialized during system startup, along with other important vectors like the system clock vector.


But as a general answer for a Linux kernel:


Is my understanding correct that C language system calls are implemented by compiler as syscall's with respective code in assembly, which in turn, are implemented by OS as exceptions mechanism (interrupts)?


Yes


