Why does a program compiled with -fpic and -pie have a relocation table?

Date: 2023-01-02 22:53:43

If a trivial program is compiled with the following command:

arm-none-eabi-gcc -shared -fpic -pie --specs=nosys.specs simple.c -o simple.exe

and the relocation entries are printed with the command:

arm-none-eabi-readelf simple.exe -r

There are a bunch of relocation entries (see below).

Since -fpic / -pie flags cause the compiler to generate a position independent executable, my naive (and clearly incorrect) assumption is that there is no need for a relocation table because the loader can place the executable image anywhere without issue. So why is there a relocation table there at all, and does this indicate that the code isn't actually position independent?

Relocation section '.rel.dyn' at offset 0x82d4 contains 37 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
000084a8  00000017 R_ARM_RELATIVE   
000084d0  00000017 R_ARM_RELATIVE   
00008508  00000017 R_ARM_RELATIVE   
00008510  00000017 R_ARM_RELATIVE   
0000855c  00000017 R_ARM_RELATIVE   
00008560  00000017 R_ARM_RELATIVE   
00008564  00000017 R_ARM_RELATIVE   
00008678  00000017 R_ARM_RELATIVE   
0000867c  00000017 R_ARM_RELATIVE   
0000870c  00000017 R_ARM_RELATIVE   
00008710  00000017 R_ARM_RELATIVE   
00008714  00000017 R_ARM_RELATIVE   
00008718  00000017 R_ARM_RELATIVE   
00008978  00000017 R_ARM_RELATIVE   
000089dc  00000017 R_ARM_RELATIVE   
000089e0  00000017 R_ARM_RELATIVE   
00008abc  00000017 R_ARM_RELATIVE   
00008ae4  00000017 R_ARM_RELATIVE   
00018af4  00000017 R_ARM_RELATIVE   
00018af8  00000017 R_ARM_RELATIVE   
00018afc  00000017 R_ARM_RELATIVE   
00018c04  00000017 R_ARM_RELATIVE   
00018c08  00000017 R_ARM_RELATIVE   
00018c0c  00000017 R_ARM_RELATIVE   
00018c34  00000017 R_ARM_RELATIVE   
00019028  00000017 R_ARM_RELATIVE   
000084cc  00000c02 R_ARM_ABS32       00000000   __libc_fini
0000850c  00000602 R_ARM_ABS32       00000000   __deregister_frame_inf
00008558  00001302 R_ARM_ABS32       00000000   __register_frame_info
00008568  00001202 R_ARM_ABS32       00000000   _Jv_RegisterClasses
00008664  00000d02 R_ARM_ABS32       00000000   __stack
00008668  00000a02 R_ARM_ABS32       00000000   hardware_init_hook
0000866c  00000802 R_ARM_ABS32       00000000   software_init_hook
00008670  00000502 R_ARM_ABS32       0001902c   __bss_start__
00008674  00000702 R_ARM_ABS32       00019048   __bss_end__
0000897c  00001402 R_ARM_ABS32       00000000   free
00008ac0  00000402 R_ARM_ABS32       00000000   malloc

Relocation section '.rel.plt' at offset 0x83fc contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00018be8  00000416 R_ARM_JUMP_SLOT   00000000   malloc
00018bec  00000616 R_ARM_JUMP_SLOT   00000000   __deregister_frame_inf
00018bf0  00001316 R_ARM_JUMP_SLOT   00000000   __register_frame_info
00018bf4  00001416 R_ARM_JUMP_SLOT   00000000   free

1 answer

#1

An executable consists of several sections. While actual implementation details differ, these can be roughly categorized in four groups:

  1. Read-Only Executable Code, also known as "Text"
  2. Read-Only Constant Data (global constants)
  3. (Initialized) Read-Write Data (global variables with initializers)
  4. Uninitialized Read-Write Data (other global variables, initialized to 0)

Non-position-independent code contains a lot of references to the addresses of functions, global variables and global constants.

Read-Only Data and Initialized Read-Write Data sometimes contain references to the addresses of functions, global variables and global constants:

int x;
int *y = &x; // y needs a relocation.

The loader can patch code according to its relocation entries, but there are two problems:

  1. Relocations take time on program startup / library loading.
  2. If we relocate, we now have an in-RAM modified copy of the text segment, which is different for every process that loads our library, so the copies cannot be shared and we will be wasting RAM.
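
To make the loader's job concrete, here is a minimal sketch of what applying the R_ARM_RELATIVE entries from the question roughly amounts to. It assumes a REL-style format where the word to patch already holds the link-time address, and it uses a hypothetical, simplified record type (`rel_relative_t`) rather than the real `Elf32_Rel` layout. The names `apply_relative`, `demo_relocated_pointer` and `load_bias` are illustrative and not taken from any real loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record: only the offset of the
 * 32-bit word to patch, relative to the start of the image. A real
 * Elf32_Rel entry also carries an r_info field (type and symbol). */
typedef struct {
    uint32_t offset;
} rel_relative_t;

/* Apply "relative" fixups: each listed word holds an address computed
 * at link time, so the loader adds the difference between the actual
 * and the linked load address (the load bias). */
void apply_relative(uint8_t *image, size_t image_size,
                    const rel_relative_t *rels, size_t count,
                    uint32_t load_bias)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t word;
        assert(image_size >= sizeof word &&
               rels[i].offset <= image_size - sizeof word);
        /* memcpy avoids unaligned or aliasing-unsafe accesses. */
        memcpy(&word, image + rels[i].offset, sizeof word);
        word += load_bias;
        memcpy(image + rels[i].offset, &word, sizeof word);
    }
}

/* Demo: a 16-byte "image" whose word at offset 8 points at offset 4.
 * Returns the pointer value after relocation. */
uint32_t demo_relocated_pointer(uint32_t load_bias)
{
    uint8_t image[16] = {0};
    uint32_t link_time_value = 4;   /* target address as linked at base 0 */
    memcpy(image + 8, &link_time_value, sizeof link_time_value);

    const rel_relative_t rels[] = { { 8 } };
    apply_relative(image, sizeof image, rels, 1, load_bias);

    uint32_t patched;
    memcpy(&patched, image + 8, sizeof patched);
    return patched;
}
```

Each entry costs one read, one add and one write, which is problem 1. If the patched word lies in the text segment, the write is what forces a private copy of that page, which is problem 2.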

Now for the real answer: PIC was intended to solve the above problems by getting rid of text relocations, not to get rid of all relocations.

There are comparatively few relocations in read-only data and initialized data, so neither (1.) nor (2.) is usually an issue there. We don't even care about (2.) for read-write data, as each process needs its own copy of that anyway. And in fact, there is no way for the compiler to make data position-independent: if you ask for a global int *y = &x; then the compiler has no choice but to store the pointer there.

Now, how is code made position-independent? That depends on the platform, but it often involves a few relatively inefficient operations, or the processor imposes arbitrary limits on the maximum offsets usable in the more efficient instructions for accessing data and code in a position-independent way. Also, with dynamic linking the address of some functions isn't known even as a relative offset. So compilers tend to use tables that contain the actual addresses, and the code looks the addresses up in those tables. The tables, variously known as GOT, TOC, PLT and probably a few other names on different platforms, will likely be constant data with lots of relocations.
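
The table idea can be imitated in plain C. This is only an analogy, assuming a hand-written table named `got` and a hypothetical `fill_got()` that plays the loader's role. In a real PIC binary the compiler and linker generate the table, and the code finds it via PC-relative addressing rather than through a C symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for two globals that the "code" wants to reach. */
static int counter = 41;
static int limit   = 100;

/* Hypothetical GOT: one slot per referenced global. */
enum { GOT_COUNTER, GOT_LIMIT, GOT_SLOTS };
static int *got[GOT_SLOTS];

/* Plays the part of the loader: this is the only place where actual
 * addresses are written, mirroring relocations confined to the table. */
void fill_got(void)
{
    got[GOT_COUNTER] = &counter;
    got[GOT_LIMIT]   = &limit;
}

/* "Position-independent" code: it uses fixed slot indices instead of
 * embedding &counter or &limit, so it needs no patching itself. */
int bump_counter(void)
{
    int *c = got[GOT_COUNTER];
    int *l = got[GOT_LIMIT];
    assert(c != NULL && l != NULL);   /* fill_got() must have run first */
    if (*c < *l)
        (*c)++;
    return *c;
}
```

After one call to `fill_got()`, every call to `bump_counter()` pays an extra load through the table. That extra load is the kind of inefficiency the answer mentions, accepted in exchange for keeping the code bytes identical in every process.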

If relocations can't be avoided, the idea is to put them all into one place to minimize problems (1.) and (2.).
