Getting a control flow graph from ANSI C code

Time: 2023-02-02 20:23:06

I'm building a tool for testing ANSI C applications. It simply loads the code, shows the control flow graph, runs a test, and marks all vertices that were hit. I'm trying to build the CFG myself by parsing the code. Unfortunately, it gets messed up when the code is nested. GCC can produce the CFG of compiled code. I could write a parser for its output, but I need line numbers for setting breakpoints. Is there a way to get line numbers when outputting the control flow graph with -fdump-tree-cfg or -fdump-tree-vcg?

3 answers

#1


15  

For the control flow graph of a C program, you could look at existing Python parsers for C.

Call graphs are closely related to control flow graphs. Several approaches exist for creating call graphs (function dependencies) for C code, and they may help you make progress with control flow graph generation. Ways to create dependency graphs in C:

  • Using cflow:

    • cflow + pycflow2dot + dot (GPL, BSD). cflow is robust because it can handle code that does not compile, e.g. code with missing includes. If preprocessor directives are heavily used, it may need the --cpp option to preprocess the code.

    • cflow + cflow2dot + dot (GPL v2, GPL v3, Eclipse Public License (EPL) v1) (note that cflow2dot needs some path fixing before it works)

    • cflow + cflow2dot.bash (GPL v2, ?)

    • cflow + cflow2vcg (GPL v2, GPL v2)

    • enhanced cflow (GPL v2) with a list to exclude symbols from the graph

  • Using cscope:

    • cscope (BSD)
    • cscope + callgraphviz + dot + xdot
    • cscope + vim CCTree (C Call-Tree Explorer)
    • cscope + ccglue
    • cscope + CodeQuery for C, C++, Python & Java
    • cscope + Python html producer
    • cscope + calltree.sh

  • ncc (cflow-like)

  • KCachegrind (KDE dependency viewer)

  • Calltree

The following tools unfortunately require that the code be compilable, because they depend on output from gcc:

  • CodeViz (GPL v2) (weak point: needs compilable source, because it uses gcc to dump cdepn files)

  • gcc + egypt + dot (GPL v*, Perl = GPL | Artistic license, EPL v1). egypt uses gcc to produce RTL, so it fails for any buggy source code, or even when you just want to focus on a single file from a larger project. It is therefore not very useful compared to the more robust cflow-based toolchains. Note that egypt by default has good support for excluding library calls from the graph, which makes it cleaner.

Also, file dependency graphs for C/C++ can be created with crowfood.

#2


7  

After some more research, I found that it is not hard to get line numbers for nodes. Just append the lineno modifier to either dump option, i.e. use -fdump-tree-cfg-lineno or -fdump-tree-vcg-lineno. It took me some time to check whether those numbers are reliable. For a graph in VCG format, the label of each node contains two numbers: the line numbers of the start and end of the code portion represented by that node.
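
Once each node carries a start and an end line, marking the vertices that a test run hit becomes a range lookup. The following is a minimal Python sketch under stated assumptions: the node names, the (start, end) ranges, and the list of hit lines are all made-up sample data standing in for values read from the VCG labels and collected from breakpoints. It does not parse GCC output itself, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical CFG nodes: name -> (first_line, last_line), inclusive.
# In practice these ranges would come from the two numbers in each VCG node label.
NodeRanges = Dict[str, Tuple[int, int]]


def mark_hit_nodes(nodes: NodeRanges, hit_lines: Iterable[int]) -> Set[str]:
    """Return the names of all nodes whose line range contains a hit line."""
    hits = set(hit_lines)
    return {
        name
        for name, (start, end) in nodes.items()
        if any(start <= line <= end for line in hits)
    }


def breakpoint_lines(nodes: NodeRanges) -> Dict[str, int]:
    """Pick one breakpoint line per node: the first line of its range."""
    return {name: start for name, (start, _end) in nodes.items()}


if __name__ == "__main__":
    # Made-up ranges for a function containing a single if/else.
    nodes = {"bb2": (3, 5), "bb3": (6, 7), "bb4": (9, 10), "bb5": (12, 12)}
    # Made-up lines where a debugger breakpoint fired during a test run.
    hit = [3, 6, 12]
    print(sorted(mark_hit_nodes(nodes, hit)))
    print(breakpoint_lines(nodes))
```

Setting one breakpoint on the first line of each node keeps the number of breakpoints equal to the number of vertices. Whether the first line is always a valid breakpoint location is an assumption that would need checking against the debugger in use.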

#3


1  

Dynamic analysis methods

In this answer I describe a few dynamic analysis methods.

Dynamic methods actually run the program to determine the call graph.

The opposite of dynamic methods are static methods, which try to determine the call graph from the source alone, without running the program.

Advantages of dynamic methods:

  • they catch function pointers and virtual C++ calls, which are present in large numbers in any non-trivial software.

Disadvantages of dynamic methods:

  • you have to run the program, which might be slow or require a setup that you don't have, e.g. cross-compilation

  • only functions that were actually called will show up. E.g., some functions may or may not be called depending on the command line arguments.

KCachegrind

https://kcachegrind.github.io/html/Home.html

Test program:

int f2(int i) { return i + 2; }
int f1(int i) { return f2(2) + i + 1; }
int f0(int i) { return f1(1) + f2(2); }
int pointed(int i) { return i; }
int not_called(int i) { return 0; }

int main(int argc, char **argv) {
    int (*f)(int);
    f0(1);
    f1(1);
    f = pointed;
    if (argc == 1)
        f(1);
    if (argc == 2)
        not_called(1);
    return 0;
}

Usage:

sudo apt-get install -y kcachegrind valgrind

# Compile the program as usual, no special flags.
gcc -ggdb3 -O0 -o main -std=c99 main.c

# Generate a callgrind.out.<PID> file.
valgrind --tool=callgrind ./main

# Open a GUI tool to visualize callgrind data.
kcachegrind callgrind.out.1234

You are now left inside an awesome GUI program that contains a lot of interesting performance data.

On the bottom right, select the "Call graph" tab. This shows an interactive call graph that correlates to performance metrics in other windows as you click the functions.

To export the graph, right click it and select "Export Graph". The exported PNG looks like this:

[Image: exported call graph]

From that we can see that:

  • the root node is _start, which is the actual ELF entry point and contains glibc initialization boilerplate

  • f0, f1 and f2 are called as expected from one another

  • pointed is also shown, even though we called it through a function pointer. It would not have been called if we had passed a command line argument.

  • not_called is not shown because it did not get called in this run: we did not pass an extra command line argument.

The cool thing about valgrind is that it does not require any special compilation options.

Therefore, you could use it even if you don't have the source code, only the executable.

valgrind manages to do that by running your code through a lightweight "virtual machine".

Tested on Ubuntu 18.04.
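
In the KCachegrind usage above, 1234 stands in for the process ID that valgrind puts in the output file name, so the name changes on every run. A small Python helper can pick the newest file instead of typing the PID by hand. This is only a sketch: the function name newest_callgrind_output is made up for illustration, and it assumes the output files sit in the directory where valgrind was run.

```python
from pathlib import Path
from typing import Optional


def newest_callgrind_output(directory: str = ".") -> Optional[Path]:
    """Return the most recently modified callgrind.out.<PID> file, if any."""
    candidates = list(Path(directory).glob("callgrind.out.*"))
    if not candidates:
        return None
    # The modification time is used as a proxy for "latest run".
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Prints the path to pass to kcachegrind, or None if no run was recorded.
    print(newest_callgrind_output())
```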

gcc -finstrument-functions + etrace

https://github.com/elcritch/etrace

-finstrument-functions adds callbacks; etrace parses the ELF file and implements all callbacks.

Unfortunately, I couldn't get it working: Why doesn't `-finstrument-functions` work for me?

The claimed output format is:

\-- main
|   \-- Crumble_make_apple_crumble
|   |   \-- Crumble_buy_stuff
|   |   |   \-- Crumble_buy
|   |   |   \-- Crumble_buy
|   |   |   \-- Crumble_buy
|   |   |   \-- Crumble_buy
|   |   |   \-- Crumble_buy
|   |   \-- Crumble_prepare_apples
|   |   |   \-- Crumble_skin_and_dice
|   |   \-- Crumble_mix
|   |   \-- Crumble_finalize
|   |   |   \-- Crumble_put
|   |   |   \-- Crumble_put
|   |   \-- Crumble_cook
|   |   |   \-- Crumble_put
|   |   |   \-- Crumble_bake

This is likely the most efficient method besides specific hardware tracing support, but it has the downside that you have to recompile the code.
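
An indented tree like the claimed etrace output above can be turned into caller/callee edges with a small stack-based parser. The Python sketch below assumes the layout is exactly as shown: each nesting level is drawn as "|   " (four characters) and each entry is introduced by "\-- ". The function names parse_call_tree and to_dot are hypothetical, and the format has not been checked against real etrace output.

```python
from typing import List, Tuple

MARKER = "\\-- "   # the literal text "\-- " that precedes each function name
INDENT_WIDTH = 4   # assumed: each nesting level is drawn as "|   "


def parse_call_tree(text: str) -> List[Tuple[str, str]]:
    """Turn an indented call tree into a list of (caller, callee) edges."""
    edges: List[Tuple[str, str]] = []
    stack: List[str] = []  # stack[d] is the function currently active at depth d
    for line in text.splitlines():
        pos = line.find(MARKER)
        if pos < 0:
            continue  # skip lines that are not tree entries
        depth = pos // INDENT_WIDTH
        name = line[pos + len(MARKER):].strip()
        del stack[depth:]          # unwind to the parent of this entry
        if stack:
            edge = (stack[-1], name)
            if edge not in edges:  # repeated calls yield a single edge
                edges.append(edge)
        stack.append(name)
    return edges


def to_dot(edges: List[Tuple[str, str]]) -> str:
    """Render the edges as a Graphviz DOT digraph."""
    body = "\n".join(f'    "{a}" -> "{b}";' for a, b in edges)
    return "digraph calls {\n" + body + "\n}"


if __name__ == "__main__":
    sample = "\n".join([
        "\\-- main",
        "|   \\-- Crumble_make_apple_crumble",
        "|   |   \\-- Crumble_buy_stuff",
        "|   |   |   \\-- Crumble_buy",
        "|   |   |   \\-- Crumble_buy",
        "|   |   \\-- Crumble_mix",
    ])
    print(to_dot(parse_call_tree(sample)))
```

The DOT text can then be fed to dot, as in the cflow-based toolchains from the first answer. Repeated calls are collapsed into one edge here; keeping a count per edge instead would be a small change if call frequency matters.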
