How do I "link" an object file into an existing executable/compiled binary?

Date: 2023-02-12 04:54:45

Problem

I wish to inject an object file into an existing binary. As a concrete example, consider a source Hello.c:

#include <stdlib.h>

int main(void)
{
    return EXIT_SUCCESS;
}

It can be compiled to an executable named Hello through gcc -std=gnu99 -Wall Hello.c -o Hello. Furthermore, now consider Embed.c:

void func1(void)
{
}

An object file Embed.o can be created from this through gcc -c Embed.c. My question is how to generically insert Embed.o into Hello in such a way that the necessary relocations are performed, and the appropriate ELF internal tables (e.g. symbol table, PLT, etc.) are patched properly?

Assumptions

It can be assumed that the object file to be embedded has its dependencies statically linked already. Any dynamic dependencies, such as the C runtime, can be assumed to be present in the target executable as well.

Current Attempts/Ideas

  • Use libbfd to copy sections from the object file into the binary. So far I can create a new object containing the sections from both the original binary and the object file. The problem is that, since the object file is relocatable, its sections cannot be copied correctly to the output without first performing the relocations.
  • Convert the binary back to an object file and relink with ld. So far I have tried using objcopy to perform the conversion: objcopy --input elf64-x86-64 --output elf64-x86-64 Hello Hello.o. Evidently this does not work as I intended, since ld -o Hello2 Embed.o Hello.o then fails with ld: error: Hello.o: unsupported ELF file type 2. I guess this should be expected, though, since Hello is not an object file.
  • Find an existing tool which performs this sort of insertion?
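
The "unsupported ELF file type 2" message in the second attempt is consistent with the e_type field of the ELF header: 2 is ET_EXEC, whereas a relocatable object is ET_REL (1), and objcopy does not change that field here. A minimal sketch for checking it, assuming a Linux system that provides <elf.h> and a file in the host's byte order; the helper names elf_type_name and read_elf_type are hypothetical:

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns a human-readable name for an ELF header's e_type value. */
const char *elf_type_name(unsigned type)
{
    switch (type) {
    case ET_REL:  return "relocatable object (ET_REL)";
    case ET_EXEC: return "executable (ET_EXEC)";
    case ET_DYN:  return "shared object or PIE (ET_DYN)";
    case ET_CORE: return "core file (ET_CORE)";
    default:      return "unknown";
    }
}

/* Reads e_type from the ELF file at `path`.
 * Returns the type on success, or -1 if the file cannot be read
 * or does not start with the ELF magic bytes.
 * e_type sits at the same offset in 32-bit and 64-bit headers. */
int read_elf_type(const char *path)
{
    Elf64_Ehdr ehdr;
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(&ehdr, 1, sizeof ehdr, f);
    fclose(f);
    if (n != sizeof ehdr || memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    return ehdr.e_type;
}
```

Calling read_elf_type("Hello.o") on the objcopy output and passing the result to elf_type_name should show whether the file is still an executable rather than a relocatable object.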

Rationale (Optional Read)

I am making a static executable editor, where the vision is to allow the instrumentation of arbitrary user-defined routines into an existing binary. This will work in two steps:

  1. The injection of an object file (containing the user-defined routines) into the binary. This is a mandatory step and cannot be worked around by alternatives such as injecting a shared object instead.
  2. Performing static analysis on the new binary and using this to statically detour routines from the original code to the newly added code.

I have, for the most part, already completed the work necessary for step 2, but I am having trouble with the injection of the object file. The problem is definitely solvable given that other tools use the same method of object injection (e.g. EEL).

7 Answers

#1


4  

If it were me, I'd look to create Embed.c into a shared object, libembed.so, like so:

gcc -Wall -shared -fPIC -o libembed.so Embed.c

That should create a relocatable shared object from Embed.c. With that, you can force your target binary to load this shared object by setting the environment variable LD_PRELOAD when running it:

LD_PRELOAD=/path/to/libembed.so Hello

The "trick" here will be to figure out how to do your instrumentation, especially considering it's a static executable. There, I can't help you, but this is one way to have code present in a process' memory space. You'll probably want to do some sort of initialization in a constructor, which you can do with an attribute (if you're using gcc, at least):

void __attribute__ ((constructor)) my_init(void)
{
    // put code here!
}

#2


0  

You cannot do this in any practical way. The intended solution is to make that object into a shared lib and then call dlopen on it.
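
A minimal sketch of the dlopen approach, assuming a shared object such as the libembed.so from answer #1 that exports a function taking no arguments. The helper name call_from_shared_object is hypothetical, and on older glibc versions the program may need to be linked with -ldl:

```c
#include <dlfcn.h>
#include <stdio.h>

typedef void (*void_fn)(void);

/* Loads the shared object at `lib_path`, looks up `symbol`,
 * and calls it as a void(void) function.
 * Returns 0 on success, -1 if loading or lookup fails. */
int call_from_shared_object(const char *lib_path, const char *symbol)
{
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before dlsym */
    void_fn fn = (void_fn)dlsym(handle, symbol);
    if (!fn) {
        fprintf(stderr, "dlsym failed for %s\n", symbol);
        dlclose(handle);
        return -1;
    }

    fn();
    dlclose(handle);
    return 0;
}
```

For example, call_from_shared_object("./libembed.so", "func1") would run func1 from the library. Note that this requires the host program to be dynamically linked and to contain the dlopen call itself, so it does not cover an unmodified static executable.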

#3


0  

The problem is that .o's are not fully linked yet, and most references are still symbolic. Binaries (shared libraries and executables) are one step closer to finally linked code.

Linking the object into a shared library doesn't mean you must load it via the dynamic loader. The suggestion is rather that writing your own loader for an executable or shared library might be simpler than writing one for a .o file.

Another possibility would be to customize the linking process yourself: call the linker and have it link the code to be loaded at some fixed address. You might also look at how e.g. bootloaders are prepared, which also involves a basic linking step to do exactly this (fixing a piece of code to a known load address).

If you don't link to a fixed address and want to relocate at runtime, you will have to write a basic linker that takes the object file and relocates it to the destination address by applying the appropriate fixups.
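
To make "applying the appropriate fixups" concrete, here is a minimal sketch of the core of such a linker for x86-64 RELA relocations. It assumes the symbol address has already been resolved by the caller and handles only three common relocation types; apply_rela_x86_64 is a hypothetical helper name, and a real tool would need to cover many more types and architectures:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Applies one RELA relocation to the section bytes in `sec`.
 *   sec_addr: address at which the section will be loaded
 *   sym_addr: resolved address of the referenced symbol (S)
 * Returns 0 on success, -1 for an out-of-range offset, an
 * overflowing 32-bit value, or an unsupported relocation type. */
int apply_rela_x86_64(uint8_t *sec, size_t sec_size, uint64_t sec_addr,
                      const Elf64_Rela *rel, uint64_t sym_addr)
{
    uint64_t S = sym_addr;
    int64_t  A = rel->r_addend;
    uint64_t P = sec_addr + rel->r_offset; /* address being patched */

    switch (ELF64_R_TYPE(rel->r_info)) {
    case R_X86_64_64: { /* absolute 64-bit: S + A */
        if (rel->r_offset > sec_size || sec_size - rel->r_offset < 8)
            return -1;
        uint64_t v = S + (uint64_t)A;
        memcpy(sec + rel->r_offset, &v, sizeof v);
        return 0;
    }
    case R_X86_64_PC32:
    case R_X86_64_PLT32: { /* PC-relative 32-bit: S + A - P
                              (PLT32 treated like PC32 because the
                              symbol is resolved directly here) */
        if (rel->r_offset > sec_size || sec_size - rel->r_offset < 4)
            return -1;
        int64_t v = (int64_t)(S + (uint64_t)A - P);
        if (v < INT32_MIN || v > INT32_MAX)
            return -1;
        int32_t v32 = (int32_t)v;
        memcpy(sec + rel->r_offset, &v32, sizeof v32);
        return 0;
    }
    default:
        return -1;
    }
}
```

The caller would iterate over each .rela section of the object file, look up every referenced symbol (in the object itself or in the target binary), and invoke this function once per entry.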

I assume you already have it, seeing as this is your master's thesis, but this book: http://www.iecc.com/linker/ is the standard introduction to the subject.

#4


0  

Have you looked at the DyninstAPI? It appears support was recently added for linking a .o into a static executable.

From the release site:

Binary rewriter support for statically linked binaries on x86 and x86_64 platforms

#5


0  

You must make room for the relocatable code in the executable by extending the executable's text segment, just as a virus infection does. Then, after writing the relocatable code into that space, update the symbol table by adding symbols for everything in the relocatable object, and apply the necessary relocation computations. I've written code that does this pretty well with 32-bit ELF files.
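
The first step described above (locating and growing the text segment) can be sketched as follows for 32-bit ELF program headers that are already loaded in memory. This is only an illustration under stated assumptions, not this answer's actual code: find_text_segment and extend_segment are hypothetical names, and a real tool must also shift the file contents that follow the segment and patch the affected offsets (section headers, e_shoff, later segments).

```c
#include <elf.h>
#include <stddef.h>

/* Returns the index of the first loadable, executable segment,
 * or -1 if none is found. */
int find_text_segment(const Elf32_Phdr *phdrs, size_t phnum)
{
    for (size_t i = 0; i < phnum; i++) {
        if (phdrs[i].p_type == PT_LOAD && (phdrs[i].p_flags & PF_X))
            return (int)i;
    }
    return -1;
}

/* Grows `text` by `extra` bytes in its program header and returns
 * the virtual address at which the appended code would start.
 * Only the header is updated; moving file data is left to the caller. */
Elf32_Addr extend_segment(Elf32_Phdr *text, Elf32_Word extra)
{
    Elf32_Addr inject_addr = text->p_vaddr + text->p_filesz;
    text->p_filesz += extra;
    text->p_memsz  += extra;
    return inject_addr;
}
```

The returned address is the load address that would then be used when computing relocations for the injected object's code.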

#6


0  

Interesting thread. I have another concrete example of why this makes sense.

I am playing with building a binary runtime encryption tool which should work on already compiled programs. What I would like to do is this:

1) Encrypt certain sections of an ELF file (.text and such)

2) Relink the ELF file with my decryption routines and an __attribute__((constructor)) function that calls the decryption on the encrypted sections

That way this will work with any programs without them knowing.

I haven't found an easy way of doing this, so I may have to split the elf apart and add stuff to it myself.

#7


0  

Assuming the source code for the first executable is available and it is compiled with a linker script that allocates space for later object file(s), there is a relatively simple solution. Since I am currently working on an ARM project, the examples below are compiled with the GNU ARM cross-compiler.

Primary source code file, hello.c

#include <stdio.h>

int main ()
{

   return 0;
}

is built with a simple linker script allocating space for an object to be embedded later:

SECTIONS
{
    .text :
    {
        KEEP (*(embed)) ;

        *(.text .text*) ;
    }
}

Like:

arm-none-eabi-gcc -nostartfiles -Ttest.ld -o hello hello.c
readelf -s hello

Num:    Value  Size Type    Bind   Vis      Ndx Name
 0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
 1: 00000000     0 SECTION LOCAL  DEFAULT    1 
 2: 00000000     0 SECTION LOCAL  DEFAULT    2 
 3: 00000000     0 SECTION LOCAL  DEFAULT    3 
 4: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
 5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
 6: 00000000     0 FILE    LOCAL  DEFAULT  ABS 
 7: 00000000    28 FUNC    GLOBAL DEFAULT    1 main

Now let's compile the object to be embedded, whose source is in embed.c:

void func1()
{
   /* Something useful here */
}

Recompile with the same linker script, this time inserting the new symbols:

arm-none-eabi-gcc -c embed.c
arm-none-eabi-gcc -nostartfiles -Ttest.ld -o new_hello hello embed.o

See the results:

readelf -s new_hello
Num:    Value  Size Type    Bind   Vis      Ndx Name
 0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
 1: 00000000     0 SECTION LOCAL  DEFAULT    1 
 2: 00000000     0 SECTION LOCAL  DEFAULT    2 
 3: 00000000     0 SECTION LOCAL  DEFAULT    3 
 4: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
 5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
 6: 00000000     0 FILE    LOCAL  DEFAULT  ABS 
 7: 00000000     0 FILE    LOCAL  DEFAULT  ABS embed.c
 8: 0000001c     0 NOTYPE  LOCAL  DEFAULT    1 $a
 9: 00000000     0 FILE    LOCAL  DEFAULT  ABS 
10: 0000001c    20 FUNC    GLOBAL DEFAULT    1 func1
11: 00000000    28 FUNC    GLOBAL DEFAULT    1 main
