GDB pretty-printing of a recursive structure with Python

I'm not very familiar with Python, and I am just discovering GDB python scripting capabilities; the motivation of my question is to enhance the GDB printing of values inside the MELT monitor which will later be connected to GCC MELT. But here is a simpler variant.

My system is Linux/Debian/Sid/x86-64. The GCC compiler is 4.8.2; the GDB debugger is 7.6.2; its Python is 3.3.

I want to debug a C program with a "discriminated union" type:

// file tiny.c in the public domain by Basile Starynkevitch
// compile with gcc -g3 -Wall -std=c99 tiny.c -o tiny
// debug with gdb tiny
// under gdb: source tiny-gdb.py
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef union my_un myval_t;
enum tag_en {
  tag_none,
  tag_int,
  tag_string,
  tag_sequence
};
struct boxint_st;
struct boxstring_st;
struct boxsequence_st;
union my_un {
  void* ptr;
  enum tag_en *ptag;
  struct boxint_st *pint;
  struct boxstring_st *pstr;
  struct boxsequence_st *pseq;
};

struct boxint_st {
  enum tag_en tag;      // for tag_int
  int ival;
};
struct boxstring_st {
  enum tag_en tag;      // for tag_string
  char strval[];        // a zero-terminated C string 
};
struct boxsequence_st {
  enum tag_en tag;      // for tag_sequence
  unsigned slen;
  myval_t valtab[];     // of length slen
};


int main (int argc, char **argv) {
  printf ("start %s, argc=%d", argv[0], argc);
  struct boxint_st *iv42 = malloc (sizeof (struct boxint_st));
  iv42->tag = tag_int;
  iv42->ival = 42;
  struct boxstring_st *istrhello =
    malloc (sizeof (struct boxstring_st) + sizeof ("hello") + 1);
  istrhello->tag = tag_string;
  strcpy (istrhello->strval, "hello");
  struct boxsequence_st *iseq3 =
    malloc (sizeof (struct boxsequence_st) + 3 * sizeof (myval_t));
  iseq3->tag = tag_sequence;
  iseq3->slen = 3;
  iseq3->valtab[0] = (myval_t)iv42;
  iseq3->valtab[1] = (myval_t)istrhello;
  iseq3->valtab[2] = (myval_t)NULL;
  printf ("before %s:%d gdb print iseq3\n", __FILE__, __LINE__);
}

Here is my Python file, to be read under gdb:

# file tiny-gdb.py in the public domain by Basile Starynkevitch
## see also tiny.c file
class my_val_Printer:
    """pretty prints a my_val"""
    def __init__ (self, val):
        self.val = val
    def to_string (self):
        outs = "my_val@" + self.val['ptr']
        mytag = self.val['ptag'].dereference();
        if (mytag):
            outs = outs + mytag.to_string()
    def display_hint (self):
        return 'my_val'

def my_val_lookup(val):
    lookup = val.type.tag
    if (lookup == None):
        return None
    if lookup == "my_val":
        return my_val_Printer(val)
    return None

I'm stuck with the following basic questions.

1. How do I install my pretty printer in Python under GDB? (I see several ways in the documentation, and I can't choose the appropriate one.)
2. How do I ensure that GDB pretty-prints both union my_un and its typedef-ed synonym myval_t the same way?
3. How should the pretty printer detect NULL pointers?
4. How can my pretty printer recurse for struct boxsequence_st? This means detecting that the pointer is non-nil, then dereferencing its ptag, comparing that tag to tag_sequence, and pretty printing the valtab flexible array member.
5. How do I keep the pretty printing from recursing too deeply?

1 Solution

#1

I don't have enough experience with the gdb Python API to call this an answer; consider it some research notes from a fellow developer. The code attached below is quite crude and ugly, too. However, it does work with gdb-7.4 and python-2.7.3. An example debugging run:

$ gcc -Wall -g3 tiny.c -o tiny
$ gdb tiny
(gdb) b 58
(gdb) run
(gdb) print iseq3
$1 = (struct boxsequence_st *) 0x602050
(gdb) print iv42
$2 = (struct boxint_st *) 0x602010
(gdb) print istrhello
$3 = (struct boxstring_st *) 0x602030

All of the above are bog-standard pretty-printed outputs. My reasoning is that I often want to see what the pointers are, so I didn't want to override those. However, dereferencing the pointers uses the pretty-printer shown further below:

(gdb) print *iseq3
$4 = (struct boxsequence_st)(3) = {(struct boxint_st)42, (struct boxstring_st)"hello"(5), NULL}
(gdb) print *iv42
$5 = (struct boxint_st)42
(gdb) print *istrhello
$6 = (struct boxstring_st)"hello"(5)
(gdb) set print array
(gdb) print *iseq3
$7 = (struct boxsequence_st)(3) = {
  (struct boxint_st)42,
  (struct boxstring_st)"hello"(5),
  NULL
}
(gdb) info auto-load
Loaded  Script                                                                 
Yes     /home/.../tiny-gdb.py

The last line shows that when debugging tiny, tiny-gdb.py in the same directory gets loaded automatically (although you can disable this, I do believe this is the default behaviour).

The tiny-gdb.py file used for the above:

def deref(reference):
    target = reference.dereference()
    if str(target.address) == '0x0':
        return 'NULL'
    else:
        return target

class cstringprinter:
    def __init__(self, value, maxlen=4096):
        try:
            ends = gdb.selected_inferior().search_memory(value.address, maxlen, b'\0')
            if ends is not None:
                maxlen = ends - int(str(value.address), 16)
                self.size = str(maxlen)
            else:
                self.size = '%s+' % str(maxlen)
            self.data = bytearray(gdb.selected_inferior().read_memory(value.address, maxlen))
        except:
            self.data = None
    def to_string(self):
        if self.data is None:
            return 'NULL'
        else:
            return '\"%s\"(%s)' % (str(self.data).encode('string_escape').replace('"', '\\"').replace("'", "\\\\'"), self.size)

class boxintprinter:
    def __init__(self, value):
        self.value = value.cast(gdb.lookup_type('struct boxint_st'))
    def to_string(self):
        return '(struct boxint_st)%s' % str(self.value['ival'])

class boxstringprinter:
    def __init__(self, value):
        self.value = value.cast(gdb.lookup_type('struct boxstring_st'))
    def to_string(self):
        return '(struct boxstring_st)%s' % (self.value['strval'])

class boxsequenceprinter:
    def __init__(self, value):
        self.value = value.cast(gdb.lookup_type('struct boxsequence_st'))
    def display_hint(self):
        return 'array'
    def to_string(self):
        return '(struct boxsequence_st)(%s)' % str(self.value['slen'])
    def children(self):
        value = self.value
        tag = str(value['tag'])
        count = int(str(value['slen']))
        result = []
        if tag == 'tag_none':
            for i in xrange(0, count):
                result.append( ( '#%d' % i, deref(value['valtab'][i]['ptag']) ))
        elif tag == 'tag_int':
            for i in xrange(0, count):
                result.append( ( '#%d' % i, deref(value['valtab'][i]['pint']) ))
        elif tag == 'tag_string':
            for i in xrange(0, count):
                result.append( ( '#%d' % i, deref(value['valtab'][i]['pstr']) ))
        elif tag == 'tag_sequence':
            for i in xrange(0, count):
                result.append( ( '#%d' % i, deref(value['valtab'][i]['pseq']) ))
        return result

def typefilter(value):
    "Pick a pretty-printer for 'value'."
    typename = str(value.type.strip_typedefs().unqualified())

    if typename == 'char []':
        return cstringprinter(value)

    if (typename == 'struct boxint_st' or
        typename == 'struct boxstring_st' or
        typename == 'struct boxsequence_st'):
        tag = str(value['tag'])
        if tag == 'tag_int':
            return boxintprinter(value)
        if tag == 'tag_string':
            return boxstringprinter(value)
        if tag == 'tag_sequence':
            return boxsequenceprinter(value)

    return None

gdb.pretty_printers.append(typefilter)

The reasoning behind my choices is as follows:

How to install pretty-printers to gdb?

There are two parts to this question: where to install the Python files, and how to hook the pretty-printers to gdb.

Because the pretty-printer selection cannot rely on the inferred type alone, but has to peek into the actual data fields, you cannot use the regular expression matching functions. Instead, I chose to add my own pretty-printer selector function, typefilter(), to the global pretty-printers list, as described in the documentation. I did not implement the enable/disable functionality, because I believe it is easier to just load/not load the relevant Python script instead.

(typefilter() gets called once for every value gdb is about to print, unless some other pretty-printer has already accepted it.)
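
The registration step described above can be sketched as follows. This is only a sketch under stated assumptions: register_once is a hypothetical helper (not part of the gdb API), and it relies on gdb.pretty_printers being an ordinary Python list of lookup functions, so a plain list stands in for it here. Replacing an entry that has the same function name avoids stacking stale copies of typefilter() when the script is sourced more than once.

```python
def register_once(printers, lookup):
    """Install lookup in printers, replacing any entry with the same name.

    Inside gdb, printers would be gdb.pretty_printers; here it can be any
    plain list (assumption: the gdb list behaves like a normal Python list).
    Returns True if lookup was appended, False if it replaced an old entry.
    """
    name = getattr(lookup, "__name__", None)
    for index, existing in enumerate(printers):
        if getattr(existing, "__name__", None) == name:
            # A copy from an earlier load is present: swap it for the new one.
            printers[index] = lookup
            return False
    printers.append(lookup)
    return True
```

Inside gdb, the last line of tiny-gdb.py would then read register_once(gdb.pretty_printers, typefilter) instead of the bare append.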

The file location issue is more complicated. For application-specific pretty-printers, putting them into a single Python script file sounds sensible, but for a library, some splitting seems to be in order. The documentation recommends packaging the functions into a Python module, so that a simple python import module enables the pretty-printer. Fortunately, Python packaging is quite straightforward. If you add import gdb at the top of tiny-gdb.py and save it as /usr/lib/pythonX.Y/tiny.py, where X.Y is the Python version used, you only need to run python import tiny in gdb to enable the pretty-printer.

Of course, properly packaging the pretty-printer is a very good idea, especially if you intend to distribute it, but it does pretty much boil down to adding some variables et cetera to the beginning of the script, assuming you keep it as a single file. For more complex pretty-printers, using a directory layout might be a good idea.

If you have a value val, then val.type is the gdb.Type object describing its type; converting it to string yields a human-readable type name.

val.type.strip_typedefs() yields the actual type with all typedefs stripped. I even added .unqualified(), so that all const/volatile/etc. type qualifiers are removed.

NULL pointer detection is a bit tricky.

The best way I found was to examine the stringified .address member of the target gdb.Value object, and see if it is "0x0".

To make life easier, I wrote a simple deref() function, which tries to dereference a pointer. If the pointer is (void *)0, it returns the string "NULL"; otherwise it returns the target gdb.Value object.
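
An alternative to comparing address strings, offered as a sketch rather than as the method used above: in gdb builds with Python 3, a gdb.Value holding a pointer can, as far as I know, be converted with int(), so the null test can compare against 0 directly. The names is_null and deref_or_null are hypothetical; the code only assumes an object that supports int() and dereference().

```python
def is_null(pointer):
    """Return True when a pointer-like value converts to the integer 0."""
    # Assumption: int() works on the value, as it should for a gdb.Value
    # of pointer type when gdb is built against Python 3.
    return int(pointer) == 0

def deref_or_null(pointer):
    """Return the string 'NULL' for a null pointer, else the pointed-to value."""
    if is_null(pointer):
        return 'NULL'
    return pointer.dereference()
```

Inside a pretty-printer, deref_or_null(value['valtab'][i]['pseq']) would then play the role of deref() above, with the test done on the pointer before it is dereferenced.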

The way I use deref() is based on the fact that "array" type pretty-printers yield a list of 2-tuples, where the first item is the name string, and the second item is either a gdb.Value object, or a string. This list is returned by the children() method of the pretty-printer object.

Handling "discriminated union" types would be much easier, if you had a separate type for the generic entity. That is, if you had

```
struct box_st {
    enum tag_en tag;
};
```
and it was used everywhere the tag value is still uncertain, with the specific structure types used only where their tag value is fixed, type inference would be much simpler.

As it is, in tiny.c the struct box*_st types can be used interchangeably. (More precisely, the type alone does not tell us which tag value the object carries.)
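
Because the tag, not the static type, decides the layout, the selection can be sketched as a table lookup. TAG_TO_TYPE and actual_type_name are hypothetical names; the sketch only assumes that str(value['tag']) yields the enumerator name, which is what typefilter() above relies on.

```python
# Map each enumerator of enum tag_en to the struct that carries that tag.
TAG_TO_TYPE = {
    'tag_int': 'struct boxint_st',
    'tag_string': 'struct boxstring_st',
    'tag_sequence': 'struct boxsequence_st',
}

def actual_type_name(value):
    """Return the struct type name implied by value's tag field, or None."""
    # value only needs to support value['tag']; in gdb it would be a gdb.Value.
    return TAG_TO_TYPE.get(str(value['tag']))
```

In gdb, the returned name could be passed to gdb.lookup_type() and the value cast to that type, as the printer classes above do; a None result means no printer applies.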

The sequence case is actually quite simple, because valtab[] can be treated simply as an array of void pointers. The sequence tag is used to pick the correct union member. In fact, if valtab[] were simply a void pointer array, then gdb.Value.cast(gdb.lookup_type()) or gdb.Value.reinterpret_cast(gdb.lookup_type()) could be used to change each pointer type as necessary, just like I do for the boxed structure types.

Recursion limits?

You can use the @ operator in the print command to specify how many elements are printed, but that does not help with nesting.

If you add iseq3->valtab[2] = (myval_t)iseq3; to tiny.c, you get a self-referential, infinitely recursive sequence. gdb does print it nicely, especially with set print array, but it does not notice or care about the recursion.
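
One way to bound the output, sketched here on plain Python data rather than on gdb.Value objects: pass a remaining-depth counter down the recursion and remember which sequences are currently being printed. The function render and its parameters are hypothetical; Python lists stand in for struct boxsequence_st, and id() stands in for the target address a real printer would use.

```python
def render(value, max_depth=3, _active=None):
    """Render a nested value as text, bounding depth and breaking cycles."""
    if _active is None:
        _active = set()
    if value is None:
        return 'NULL'
    if not isinstance(value, list):
        return repr(value)
    if id(value) in _active:
        # This sequence is already being printed further up: stop here.
        return '<cycle>'
    if max_depth <= 0:
        # Depth budget used up: show only that a sequence is present.
        return '[...]'
    _active.add(id(value))
    try:
        parts = [render(item, max_depth - 1, _active) for item in value]
    finally:
        _active.discard(id(value))
    return '[' + ', '.join(parts) + ']'
```

In an actual pretty-printer the depth counter would probably have to live in module-level state, since gdb itself drives the recursion into children().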

In my opinion, you might wish to write a gdb command in addition to a pretty-printer for deeply nested or recursive data structures. During my testing, I wrote a command that uses Graphviz to draw binary tree structures directly from within gdb; I'm absolutely convinced it beats plain text output.

Added: If you save the following as /usr/lib/pythonX.Y/tree.py:

import subprocess
import gdb

def pretty(value, field, otherwise=''):
    try:
        if str(value[field].type) == 'char []':
            data = str(gdb.selected_inferior().read_memory(value[field].address, 64))
            try:
                size = data.index("\0")
                return '\\"%s\\"' % data[0:size].encode('string_escape').replace('"', '\\"').replace("'", "\\'")
            except:
                return '\\"%s\\"..' % data.encode('string_escape').replace('"', '\\"').replace("'", "\\'")
        else:
            return str(value[field])
    except:
        return otherwise

class tee:
    def __init__(self, cmd, filename):
        self.file = open(filename, 'wb')
        gdb.write("Saving DOT to '%s'.\n" % filename)
        self.cmd = cmd
    def __del__(self):
        if self.file is not None:
            self.file.flush()
            self.file.close()
            self.file = None
    def __call__(self, arg):
        self.cmd(arg)
        if self.file is not None:
            self.file.write(arg)

def do_dot(value, output, visited, source, leg, label, left, right):
    if value.type.code != gdb.TYPE_CODE_PTR:
        return
    target = value.dereference()

    target_addr = int(str(target.address), 16)
    if target_addr == 0:
        return

    if target_addr in visited:
        if source is not None:
            path='%s.%s' % (source, target_addr)
            if path not in visited:
                visited.add(path)
                output('\t"%s" -> "%s" [ taillabel="%s" ];\n' % (source, target_addr, leg))
        return

    visited.add(target_addr)

    if source is not None:
        path='%s.%s' % (source, target_addr)
        if path not in visited:
            visited.add(path)
            output('\t"%s" -> "%s" [ taillabel="%s" ];\n' % (source, target_addr, leg))

    if label is None:
        output('\t"%s" [ label="%s" ];\n' % (target_addr, target_addr))
    elif "," in label:
        lab = ''
        for one in label.split(","):
            cur = pretty(target, one, '')
            if len(cur) > 0:
                if len(lab) > 0:
                    lab = '|'.join((lab,cur))
                else:
                    lab = cur
        output('\t"%s" [ shape=record, label="{%s}" ];\n' % (target_addr, lab))
    else:
        output('\t"%s" [ label="%s" ];\n' % (target_addr, pretty(target, label, target_addr)))

    if left is not None:
        try:
            target_left = target[left]
            do_dot(target_left, output, visited, target_addr, left, label, left, right)
        except:
            pass

    if right is not None:
        try:
            target_right = target[right]
            do_dot(target_right, output, visited, target_addr, right, label, left, right)
        except:
            pass

class Tree(gdb.Command):

    def __init__(self):
        super(Tree, self).__init__('tree', gdb.COMMAND_DATA, gdb.COMPLETE_SYMBOL, False)

    def do_invoke(self, name, filename, left, right, label, cmd, arg):
        try:
            node = gdb.selected_frame().read_var(name)
        except:
            gdb.write('No symbol "%s" in current context.\n' % str(name))
            return
        if len(arg) < 1:
            cmdlist = [ cmd ]
        else:
            cmdlist = [ cmd, arg ]
        sub = subprocess.Popen(cmdlist, bufsize=16384, stdin=subprocess.PIPE, stdout=None, stderr=None)
        if filename is None:
            output = sub.stdin.write
        else:
            output = tee(sub.stdin.write, filename)
        output('digraph {\n')
        output('\ttitle = "%s";\n' % name)
        if len(label) < 1: label = None
        if len(left)  < 1: left  = None
        if len(right) < 1: right = None
        visited = set((0,))
        do_dot(node, output, visited, None, None, label, left, right)
        output('}\n')
        sub.communicate()
        sub.wait()

    def help(self):
        gdb.write('Usage: tree [OPTIONS] variable\n')
        gdb.write('Options:\n')
        gdb.write('   left=name          Name member pointing to left child\n')
        gdb.write('   right=name         Name right child pointer\n')
        gdb.write('   label=name[,name]  Define node fields\n')
        gdb.write('   cmd=dot arg=-Tx11  Specify the command (and one option)\n')
        gdb.write('   dot=filename.dot   Save .dot to a file\n')
        gdb.write('Suggestions:\n')
        gdb.write('   tree cmd=neato variable\n')

    def invoke(self, argument, from_tty):
        args = argument.split()
        if len(args) < 1:
            self.help()
            return
        num = 0
        cfg = { 'left':'left', 'right':'right', 'label':'value', 'cmd':'dot', 'arg':'-Tx11', 'dot':None }
        for arg in args[0:]:
            if '=' in arg:
                key, val = arg.split('=', 1)
                cfg[key] = val
            else:
                num += 1
                self.do_invoke(arg, cfg['dot'], cfg['left'], cfg['right'], cfg['label'], cfg['cmd'], cfg['arg'])
        if num < 1:
            self.help()

Tree()

you can use it in gdb:

(gdb) python import tree
(gdb) tree
Usage: tree [OPTIONS] variable
Options:
   left=name          Name member pointing to left child
   right=name         Name right child pointer
   label=name[,name]  Define node fields
   cmd=dot arg=-Tx11  Specify the command (and one option)
   dot=filename.dot   Save .dot to a file
Suggestions:
   tree cmd=neato variable

If you have e.g.

struct node {
    struct node *le;
    struct node *gt;
    long         key;
    char         val[];
};

struct node *sometree;

and you have an X11 connection (local or remote) and Graphviz installed, you can use

(gdb) tree left=le right=gt label=key,val sometree

to view the tree structure. Because it keeps track of already visited nodes (in a Python set), it is not fazed by recursive structures.
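
The visited-set idea can be shown in isolation. The sketch below is a simplified model, not the tree command itself: nodes are plain Python dicts, id() replaces the target address, and to_dot is a hypothetical name. It emits every edge but describes each node only once, so a cycle cannot make it loop forever.

```python
def to_dot(root, left='le', right='gt', label='key'):
    """Return Graphviz DOT text for a possibly cyclic graph of dict nodes."""
    lines = ['digraph {']
    visited = set()          # ids of nodes that have already been described
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None or id(node) in visited:
            continue
        visited.add(id(node))
        lines.append('\t"%d" [ label="%s" ];' % (id(node), node[label]))
        for leg in (left, right):
            child = node.get(leg)
            if child is not None:
                # The edge is always drawn, even when the child was seen before.
                lines.append('\t"%d" -> "%d" [ taillabel="%s" ];'
                             % (id(node), id(child), leg))
                stack.append(child)
    lines.append('}')
    return '\n'.join(lines)
```

The returned text could be piped to dot or neato in the same way the tree command feeds its subprocess.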

I probably should have cleaned up my Python snippets before posting, but no matter. Please consider these only initial test versions; use at your own risk. :)
