Building a control-flow graph from objdump output

I'm attempting to build a control-flow graph from the assembly returned by a call to objdump -d. Currently the best method I've come up with is to put each line of the output into a linked list and split out the memory address, opcode, and operands for each line. I'm splitting them by relying on the regular layout of objdump output (the memory address runs from character 2 to character 7 of the string that represents each line).
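
As a rough illustration of the parsing step, here is a minimal sketch in Python (not the C implementation described above). It assumes the usual GNU objdump -d layout, where the address, the raw bytes, and the instruction text are separated by tab characters. Splitting on tabs avoids depending on fixed character columns, which shift when addresses have a different number of digits. The function name `parse_line` is made up for this example.

```python
def parse_line(line):
    """Return (address, mnemonic, operands) for an instruction line, else None.

    Assumes tab-separated fields: "  401136:<TAB>55 ...<TAB>push   %rbp".
    Section headers, symbol labels, blank lines, and byte-only continuation
    lines have fewer than three fields and are skipped.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        return None
    try:
        addr = int(parts[0].strip().rstrip(":"), 16)
    except ValueError:
        return None
    fields = parts[2].split(None, 1)  # mnemonic, then the rest as operands
    if not fields:
        return None
    mnemonic = fields[0]
    operands = fields[1].strip() if len(fields) > 1 else ""
    return addr, mnemonic, operands
```

Instruction prefixes such as `rep` or `lock` would show up as the mnemonic with this split, so a real parser would need to handle them separately.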

Once this is done I start the actual CFG construction. Each node in the CFG holds a starting and ending memory address, a pointer to the previous basic block, and pointers to any child basic blocks. I then go through the objdump results and compare each opcode against an array of all control-flow opcodes in x86_64. If the opcode is a control-flow one, I record its address as the end of the basic block and, depending on the opcode, add either two child pointers (conditional opcode) or one (call or return).
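
One thing worth checking in a scheme like this is that a basic block also has to start at every jump target, not only after a control-flow instruction. Otherwise a jump can land in the middle of an existing block. The sketch below, in Python for brevity, shows the classic "leaders" approach under some loud assumptions: the input is an address-sorted list of `(address, mnemonic, operands)` tuples in AT&T syntax, the conditional-jump set is only a partial list, calls are treated as falling through to the next instruction, and indirect jumps simply get no target edge. All names are hypothetical.

```python
import re

COND_JUMPS = {"je", "jne", "jz", "jnz", "jg", "jge", "jl", "jle",
              "ja", "jae", "jb", "jbe", "js", "jns"}  # partial list only
RETURNS = {"ret", "retq"}
TARGET_RE = re.compile(r"^([0-9a-f]+)\b")  # e.g. "401150 <main+0x1a>"

def direct_target(operands):
    """Return the jump target address, or None for indirect operands."""
    m = TARGET_RE.match(operands)
    return int(m.group(1), 16) if m else None

def build_blocks(insns):
    """Map block start -> {"end": last insn address, "succ": [block starts]}."""
    addrs = [a for a, _, _ in insns]
    known = set(addrs)
    # Pass 1: leaders are the first instruction, every direct jump target,
    # and every instruction that follows a jump or return.
    leaders = {addrs[0]}
    for i, (addr, mn, ops) in enumerate(insns):
        is_jump = mn == "jmp" or mn in COND_JUMPS
        if is_jump or mn in RETURNS:
            if i + 1 < len(insns):
                leaders.add(addrs[i + 1])
            target = direct_target(ops) if is_jump else None
            if target in known:
                leaders.add(target)
    # Pass 2: cut the instruction stream at leaders and add successor edges.
    blocks = {}
    start = None
    for i, (addr, mn, ops) in enumerate(insns):
        if addr in leaders:
            start = addr
            blocks[start] = {"end": addr, "succ": []}
        blocks[start]["end"] = addr
        nxt = addrs[i + 1] if i + 1 < len(insns) else None
        if nxt is not None and nxt not in leaders:
            continue  # not the last instruction of this block
        succ = blocks[start]["succ"]
        if mn in RETURNS:
            pass  # no intraprocedural successor
        elif mn == "jmp":
            target = direct_target(ops)
            if target in known:
                succ.append(target)
        elif mn in COND_JUMPS:
            target = direct_target(ops)
            if target in known:
                succ.append(target)  # taken branch
            if nxt is not None:
                succ.append(nxt)     # fall-through
        elif nxt is not None:
            succ.append(nxt)  # block ends only because nxt is a jump target
    return blocks
```

Under these assumptions a conditional jump yields two successors, an unconditional direct jump one, and a return none.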

I'm in the process of implementing this in C, and it seems like it will work but feels very tenuous. Does anyone have any suggestions, or anything that I'm not taking into account?

Thanks for taking the time to read this!

edit:

The idea is to use it to compare stack traces of system calls generated by DynamoRIO against the expected CFG for a target binary, and I'm hoping that building it like this will facilitate that. I haven't re-used what's available because A) I hadn't really thought about it and B) I need to get the graph into a usable data structure so I can do path comparisons. I'm going to take a look at some of the utilities on the page you linked to; thanks for pointing me in the right direction. Thanks for your comments, I really appreciate it!
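
To make "path comparison" concrete, here is a small Python sketch of one possible check. It assumes the observed trace has already been reduced to a sequence of basic-block start addresses (how to get that from DynamoRIO is not shown here) and that the CFG is a dictionary from block start to its set of successor starts. Both the function name and the data layout are assumptions for illustration.

```python
def first_unexpected_edge(succ, trace):
    """Return the first (src, dst) step in trace that is not a CFG edge.

    succ:  dict mapping block start -> set of successor block starts
    trace: list of block start addresses in the order they were observed
    Returns None when every consecutive pair follows an edge in succ.
    """
    for src, dst in zip(trace, trace[1:]):
        if dst not in succ.get(src, ()):
            return (src, dst)
    return None
```

A trace that only follows known edges returns `None`; anything else returns the first transition the static CFG did not predict, which may be either an anomaly or a gap in the graph such as an unresolved indirect branch.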

2 Solutions

#1

You should use an IL (intermediate language) that was designed for program analysis. There are a few.

The DynInst project (dyninst.org) has a lifter that can translate from ELF binaries into CFGs for functions/programs (or it did the last time I looked). DynInst is written in C++.

BinNavi uses the output from IDA (the Interactive Disassembler) to build an IL out of the control flow graphs that IDA identifies. I would also recommend getting a copy of IDA; it will let you spot-check CFGs visually. Once you have a program in BinNavi you can get its IL representation of a function/CFG.

Function pointers are just the start of your troubles when statically identifying the control flow graph. Jump tables (the kind compilers generate for switch statements in some cases, and that are written by hand in others) throw a wrench in as well. Every code analysis framework I know of handles them with a very heuristics-heavy approach. Then you have exceptions and exception handling, and also self-modifying code.

Good luck! You're getting a lot of information out of the DynamoRIO trace already, so I suggest you use as much information from that trace as you can...

#2

I found your question because I was looking for the same thing. I found nothing, so I wrote a simple Python script for this and put it on GitHub: https://github.com/zestrada/playground/blob/master/objdump_cfg/objdump_to_cfg.py

Note that I have some heuristics to deal with functions that never return, the gcc stack protector on 32-bit x86, etc. You may or may not want such things.

I treat indirect calls similarly to how you do (basically, there is a node in the graph that acts as the source when returning from an indirect call).
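
The following is only my reading of that idea, sketched in Python and not taken from the linked script. It assumes AT&T-syntax objdump output, where indirect call operands start with `*` (for example `*%rax`), and an address-sorted list of `(address, mnemonic, operands)` tuples. The `INDIRECT` placeholder and the function name are hypothetical.

```python
INDIRECT = "INDIRECT"  # hypothetical placeholder node for unknown callees

def indirect_call_edges(insns):
    """Return edges that route every indirect call through one shared node.

    Each indirect call site gets an edge into INDIRECT, and INDIRECT gets an
    edge to the instruction after the call, so it acts as the source of the
    return edge. Direct calls are left for the normal CFG construction.
    """
    edges = []
    for i, (addr, mnemonic, operands) in enumerate(insns):
        # "call" and "callq" both match; "*" marks an indirect operand.
        if mnemonic.startswith("call") and operands.startswith("*"):
            edges.append((addr, INDIRECT))
            if i + 1 < len(insns):
                edges.append((INDIRECT, insns[i + 1][0]))
    return edges
```

The trade-off is that a single shared node keeps the graph small but makes every indirect call look like it can return to any indirect return site, so path checks through it are necessarily loose.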

Hopefully this is helpful for anyone looking to do similar analysis with similar restrictions.
