Do programming language compilers first translate to assembly, or directly to machine code?

Date: 2022-05-24 12:48:00

I'm primarily interested in popular and widely used compilers, such as gcc. But if things are done differently with different compilers, I'd like to know that, too.

Taking gcc as an example, does it compile a short program written in C directly to machine code, or does it first translate it to human-readable assembly, and only then use a (built-in?) assembler to translate the assembly program into binary machine code, that is, a series of instructions for the CPU?

Is using assembly code to create a binary executable a significantly expensive operation? Or is it a relatively simple and quick thing to do?

(Let's assume we're dealing with only the x86 family of processors, and all programs are written for Linux.)

I'd be very grateful for any help and thoughts on the matter. Thank you!

12 answers

#1


gcc actually produces assembly code and assembles it using the as assembler. Not all compilers do this: the MS compilers produce object code directly, though you can make them generate assembly output. Translating assembly to object code is a pretty simple process, at least compared with compilation.
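
To give a feel for why assembling is comparatively simple, here is a minimal sketch of a toy assembler in Python. It is not how as works internally; it only handles a tiny subset of 32-bit x86 chosen for illustration, and it assumes Intel-style operand order (GNU as defaults to AT&T syntax). The function names are hypothetical; the byte encodings are the standard ones for these instructions.

```python
import struct

# "mov r32, imm32" is encoded as opcode 0xB8 plus the register number,
# followed by the 32-bit immediate in little-endian order.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}

def assemble_line(line):
    """Encode one instruction from a tiny illustrative subset of 32-bit x86."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    if mnemonic == "nop" and not operands:
        return b"\x90"
    if mnemonic == "ret" and not operands:
        return b"\xc3"
    if mnemonic == "int" and len(operands) == 1:
        # "int imm8" is 0xCD followed by the interrupt number.
        return bytes([0xCD, int(operands[0], 0)])
    if mnemonic == "mov" and len(operands) == 2 and operands[0] in REG32:
        imm = int(operands[1], 0) & 0xFFFFFFFF
        return bytes([0xB8 + REG32[operands[0]]]) + struct.pack("<I", imm)
    raise ValueError(f"unsupported instruction: {line!r}")

def assemble(lines):
    """Assemble a list of instruction strings into raw machine code bytes."""
    return b"".join(assemble_line(line) for line in lines)

# The classic 32-bit Linux "exit(0)" sequence.
program = ["mov eax, 1", "mov ebx, 0", "int 0x80"]
print(assemble(program).hex())
```

Each line maps to bytes through what is essentially a table lookup. A real assembler also deals with labels, sections, and relocations, but there is none of the analysis and optimization a compiler has to do.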

Some compilers produce other high-level language code as their output. For example, cfront, the first C++ compiler, produced C as its output, which was then compiled by a C compiler.

Note that neither direct compilation nor assembly actually produces an executable. That is done by the linker, which takes the various object code files produced by compilation/assembly, resolves all the names they contain, and produces the final executable binary.
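
The "resolves all the names" step can be sketched roughly as follows. This is a toy model in Python, not the ELF format or the behaviour of ld: the object layout (a dict with "code", "defines" and "relocs"), the base address 0x1000, and the use of absolute 4-byte addresses are all assumptions made for illustration.

```python
import struct

def link(objects, base=0x1000):
    """Concatenate object code and patch every symbol reference.

    Each object is a dict with:
      "code":    bytes, with 4 zero bytes wherever an address is still unknown
      "defines": {symbol name: offset of the symbol inside this object's code}
      "relocs":  [(offset of the 4-byte slot to patch, symbol name), ...]
    """
    # Pass 1: lay the objects out back to back and record each symbol's address.
    symbols = {}
    starts = []
    position = 0
    for obj in objects:
        starts.append(position)
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + position + offset
        position += len(obj["code"])

    # Pass 2: copy the code and fill each reference with an absolute address.
    image = bytearray()
    for obj, start in zip(objects, starts):
        image += obj["code"]
        for offset, name in obj["relocs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            struct.pack_into("<I", image, start + offset, symbols[name])
    return bytes(image), symbols

# "mov eax, <address of counter>; ret" with the address left as zeros.
main_o = {"code": b"\xb8\x00\x00\x00\x00\xc3",
          "defines": {"main": 0},
          "relocs": [(1, "counter")]}
# A 4-byte data word that defines the symbol "counter".
data_o = {"code": b"\x2a\x00\x00\x00", "defines": {"counter": 0}, "relocs": []}

image, symbols = link([main_o, data_o])
print(symbols, image.hex())
```

Here main_o refers to counter without knowing where it will end up; only the link step, which sees both objects, can fill in the address.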

#2


Almost all compilers, including gcc, produce assembly code because doing so makes the compiler easier both to write and to debug. The major exceptions are usually just-in-time compilers or interactive compilers, whose authors don't want the performance overhead or the hassle of forking a whole process to run the assembler. Some interesting examples include:

  • Standard ML of New Jersey, which runs interactively and compiles every expression on the fly.

  • The tinycc compiler, which is designed to be fast enough to compile, load, and run a C script in well under 100 milliseconds, and therefore doesn't want the overhead of calling the assembler and linker.

What these cases have in common is a desire for "instantaneous" response. Assemblers and linkers are plenty fast, but not quite good enough for interactive response. Yet.

There is also a large family of languages, such as Smalltalk, Java, and Lua, which compile to bytecode, not assembly code, but whose implementations may later translate that bytecode directly to machine code without the benefit of an assembler.

(Footnote: in the early 1990s, Mary Fernandez and I wrote the New Jersey Machine Code Toolkit, for which the code is online, which generates C libraries that compiler writers can use to bypass the standard assembler and linker. Mary used it to roughly double the speed of her optimizing linker when generating a.out. If you don't write to disk, speedups are even greater...)

#3


Compilers, in general, parse the source code into an Abstract Syntax Tree (an AST), then translate it into some intermediate language. Only then, usually after some optimizations, do they emit the target language.

As for gcc, it can compile to a wide variety of targets. I don't know whether it compiles to assembly first for x86, but I did give you some insight into compilers, and you asked for that too.

#4


According to chapter 2 of Introduction to Reverse Engineering Software (by Mike Perry and Nasko Oskov), both gcc and cl.exe (the back-end compiler for MSVC++) can output the assembly that each compiler produces. For gcc the switch is -S; for cl.exe the documented listing option is /FA.

You can also run gcc in verbose mode (gcc -v) to get a list of the commands it executes and see what it's doing behind the scenes.
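
If you want to reproduce the stages by hand rather than read the -v output, something like the following Python sketch might help. It assumes gcc and as are on your PATH and that a file called hello.c exists (a hypothetical name). The flags used (-S, -o) are standard, but the exact commands gcc runs internally differ from these.

```python
import shutil
import subprocess
from pathlib import Path

def pipeline_commands(source):
    """Return three commands that mimic the compile, assemble, and link steps."""
    stem = Path(source).with_suffix("")
    asm, obj, exe = f"{stem}.s", f"{stem}.o", str(stem)
    return [
        ["gcc", "-S", source, "-o", asm],  # C source -> assembly text
        ["as", asm, "-o", obj],            # assembly text -> object file
        ["gcc", obj, "-o", exe],           # object file -> executable (invokes the linker)
    ]

def build(source):
    """Run each stage in turn, stopping at the first failure."""
    for command in pipeline_commands(source):
        print(" ".join(command))
        subprocess.run(command, check=True)

if __name__ == "__main__":
    # Only attempt a real build when the tools and the example file are present.
    if shutil.which("gcc") and shutil.which("as") and Path("hello.c").exists():
        build("hello.c")
```

After the first command you can open hello.s in a text editor and read the assembly that would otherwise be handed straight to the assembler.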

#5


GCC compiles to assembly. Some other compilers don't. For example, LLVM-GCC compiles to LLVM assembly or LLVM bytecode, which is then compiled to machine code. Almost all compilers have some sort of internal representation: LLVM-GCC uses LLVM and, IIRC, GCC uses something called GIMPLE.

#6


None of the answers clarifies the fact that an ASSEMBLER is the first layer of abstraction between BINARY CODE and MACHINE DEPENDENT SYMBOLIC CODE. A compiler is the second layer of abstraction between MACHINE DEPENDENT SYMBOLIC CODE and MACHINE INDEPENDENT SYMBOLIC CODE.

If a compiler directly converts code to binary code, then by definition it would be called an assembler and not a compiler.

It is more appropriate to say that a compiler uses INTERMEDIATE CODE, which may or may not be assembly language. For example, Java uses bytecode as its intermediate code, and bytecode is the assembly language of the Java virtual machine (JVM).

EDIT: You may wonder why an assembler always produces machine-dependent code and why a compiler is capable of handling machine-independent code. The answer is very simple. Assembly language is a direct mapping of machine code, and therefore the code an assembler produces is always machine dependent. By contrast, we can write more than one version of a compiler, one for each machine. So to run our code independently of the machine, we compile the same source code with the compiler version written for that machine.

#7


Visual C++ has a switch to output assembly code, so I think it generates assembly code before outputting machine code.

#8


In most multi-pass compilers, assembly language is generated during the code generation steps. This allows you to write the lexer, syntax, and semantic phases once and then generate executable code using a single assembler back end. This is used a lot in cross compilers, such as C compilers that generate code for a range of different CPUs.

Just about every compiler has some form of this, whether it's an implicit or an explicit step.

#9


There are many phases of compilation. In the abstract, there is a front end that reads the source code, breaks it up into tokens, and finally builds a parse tree.

The back end is first responsible for generating sequential code, such as three-address code. For example, it turns the code:

x = y + z + w

into:

reg1 = y + z
x = reg1 + w

It then optimizes that code, translates it into assembly, and finally into machine language. All the steps are layered carefully so that any one of them can be replaced when needed.
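
The three-address step above can be sketched in a few lines of Python. This is only an illustration under some assumptions: it borrows Python's own ast module as the "front end" (so it accepts Python expression syntax rather than C), it handles just four arithmetic operators, and the reg1, reg2 naming simply follows the example in this answer.

```python
import ast

# Map Python AST operator nodes to the symbols used in the output.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_three_address(statement):
    """Lower one assignment such as 'x = y + z + w' into three-address code."""
    assign = ast.parse(statement).body[0]
    if not isinstance(assign, ast.Assign) or len(assign.targets) != 1:
        raise ValueError("expected a single assignment")
    target = assign.targets[0]
    if not isinstance(target, ast.Name):
        raise ValueError("expected a plain variable on the left-hand side")

    code = []
    counter = 0

    def operand(node):
        """Return a name holding node's value, emitting temporaries as needed."""
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            left = operand(node.left)
            right = operand(node.right)
            counter += 1
            temp = f"reg{counter}"
            code.append(f"{temp} = {left} {_OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    value = assign.value
    if isinstance(value, ast.BinOp) and type(value.op) in _OPS:
        # The outermost operation writes straight into the target variable.
        left = operand(value.left)
        right = operand(value.right)
        code.append(f"{target.id} = {left} {_OPS[type(value.op)]} {right}")
    else:
        code.append(f"{target.id} = {operand(value)}")
    return code

for line in to_three_address("x = y + z + w"):
    print(line)
```

Every emitted line has at most one operator and at most three names, which is what makes the later translation to assembly instructions fairly mechanical.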

#10


You'd probably be interested in listening to this podcast: Internals of GCC

#11


Java compilers compile to Java bytecode (a binary format), which is then run by a virtual machine (the JVM).

While this may seem slow, it can be faster because the JVM can take advantage of newer CPU instructions and new optimizations. A C++ compiler won't do this: you have to target the instruction set at compile time.
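
The split between "compile to bytecode" and "run on a virtual machine" might be pictured with a toy stack machine like the one below. The opcodes are made up for illustration and are not real JVM bytecode, and this sketch only interprets the bytecode; a real JVM may also JIT-compile it to native code for the CPU it finds itself on.

```python
import ast

# Hypothetical opcodes for a toy stack machine (not real JVM bytecode).
PUSH, LOAD, ADD, MUL = range(4)
_BINOPS = {ast.Add: ADD, ast.Mult: MUL}

def compile_expr(source):
    """'Compile time': turn an arithmetic expression into a flat bytecode list."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            code.append((PUSH, node.value))
        elif isinstance(node, ast.Name):
            code.append((LOAD, node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            # Operands first, operator last: postfix order suits a stack machine.
            emit(node.left)
            emit(node.right)
            code.append((_BINOPS[type(node.op)], None))
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return code

def run(code, variables):
    """'Run time': the virtual machine interprets the bytecode on whatever host it runs on."""
    stack = []
    for op, arg in code:
        if op == PUSH:
            stack.append(arg)
        elif op == LOAD:
            stack.append(variables[arg])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == ADD else left * right)
    return stack.pop()

bytecode = compile_expr("y + z * 2")
print(run(bytecode, {"y": 3, "z": 4}))
```

The bytecode list never mentions a CPU, so the same compiled output can be handed to any machine that has a run function, which is the portability point made above.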

#12


Although not all compilers convert the source code into intermediate-level code, several compilers do use some kind of bridge to take the source code to machine-level code.
