How do I check if gcc is performing tail-recursion optimization?

How do I tell if gcc (more specifically, g++) is optimizing tail recursion in a particular function? (Because it's come up a few times: I don't want to test if gcc can optimize tail recursion in general. I want to know if it optimizes my tail recursive function.)

If your answer is "look at the generated assembler", I'd like to know exactly what I'm looking for, and whether or not I could write a simple program that examines the assembler to see if there's optimization.

PS. I know this appears as part of the question "Which, if any, C++ compilers do tail-recursion optimization?" from 5 months ago. However, I don't think this part of that question was answered satisfactorily. (The answer there was "The easiest way to check if the compiler did the optimization (that I know of) is perform a call that would otherwise result in a stack overflow – or looking at the assembly output.")

8 answers

#1

Let's use the example code from the other question. Compile it, but tell gcc not to assemble:

gcc -std=c99 -S -O2 test.c

Now let's look at the _atoi function in the resultant test.s file (gcc 4.0.1 on Mac OS 10.5):

        .text
        .align 4,0x90
_atoi:
        pushl   %ebp
        testl   %eax, %eax
        movl    %esp, %ebp
        movl    %eax, %ecx
        je      L3
        .align 4,0x90
L5:
        movzbl  (%ecx), %eax
        testb   %al, %al
        je      L3
        leal    (%edx,%edx,4), %edx
        movsbl  %al,%eax
        incl    %ecx
        leal    -48(%eax,%edx,2), %edx
        jne     L5
        .align 4,0x90
L3:
        leave
        movl    %edx, %eax
        ret

The compiler has performed tail-call optimization on this function. We can tell because there is no call instruction in that code whereas the original C code clearly had a function call. Furthermore, we can see the jne L5 instruction, which jumps backward in the function, indicating a loop when there was clearly no loop in the C code. If you recompile with optimization turned off, you'll see a line that says call _atoi, and you also won't see any backward jumps.

Whether you can automate this is another matter. The specifics of the assembler code will depend on the code you're compiling.
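
As a rough illustration of what such automation might look like, here is a minimal sketch that scans assembler text for a call instruction mentioning a given symbol. The function name has_call_to is hypothetical, and the match is purely textual: it assumes the mnemonic contains "call" (as on x86) and it would be fooled by labels or comments that happen to contain that word. A leftover call also does not prove the function is unoptimized, so treat the result as a hint only.

```c
#include <string.h>

/* Returns 1 if any line of asm_text contains "call" followed somewhere on the
   same line by symbol, and 0 otherwise. Textual heuristic only. */
int has_call_to(const char *asm_text, const char *symbol)
{
    size_t slen = strlen(symbol);
    const char *line = asm_text;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *after_call = NULL;

        /* Find the first occurrence of "call" on this line. */
        for (size_t i = 0; i + 4 <= len; i++) {
            if (strncmp(line + i, "call", 4) == 0) {
                after_call = line + i + 4;
                break;
            }
        }

        /* If there was one, look for the symbol in the rest of the line. */
        if (after_call) {
            size_t rest = len - (size_t)(after_call - line);
            for (size_t i = 0; i + slen <= rest; i++) {
                if (strncmp(after_call + i, symbol, slen) == 0)
                    return 1;
            }
        }

        line = end ? end + 1 : line + len;
    }
    return 0;
}
```

You would feed it the contents of test.s (or just the lines belonging to one function) together with the function's assembler-level name, such as "_atoi" in the listing above.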

You could discover it programmatically, I think. Make the function print out the current value of the stack pointer (register ESP on x86). If the function prints the same value for the first call as it does for the recursive call, then the compiler has performed the tail-call optimization. This idea requires modifying the function you hope to observe, though, and that might affect how the compiler chooses to optimize the function. If the test succeeds (prints the same ESP value both times), then I think it's reasonable to assume that the optimization would also be performed without your instrumentation, but if the test fails, we won't know whether the failure was due to the addition of the instrumentation code.

#2

EDIT: My original post also prevented GCC from actually doing tail-call elimination. I've added some additional trickiness below that fools GCC into doing tail-call elimination anyway.

Expanding on Steven's answer, you can programmatically check to see if you have the same stack frame:

#include <stdio.h>

// We need to get a reference to the stack without spooking GCC into turning
// off tail-call elimination
int oracle2(void) { 
    char oracle; int oracle2 = (int)&oracle; return oracle2; 
}

void myCoolFunction(params, ..., int tailRecursionCheck) {
    int oracle = oracle2();
    if( tailRecursionCheck && tailRecursionCheck != oracle ) {
        printf("GCC did not optimize this call.\n");
    }
    // ... more code ...
    // The return is significant... GCC won't eliminate the call otherwise
    return myCoolFunction( ..., oracle);
}

int main(int argc, char *argv[]) {
    myCoolFunction(..., 0);
    return 0;
}

When calling the function non-recursively, pass in 0 as the check parameter. Otherwise pass in oracle. If a tail-recursive call that should have been eliminated was not, you'll be informed at runtime.

When testing this out, it looks like my version of GCC does not optimize the first tail call, but the remaining tail calls are optimized. Interesting.
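
For anyone who wants something that compiles as-is, here is a minimal sketch of the same idea. It assumes GCC's __builtin_frame_address builtin instead of the oracle helper, and the names sum_to, outer_frame and frame_moved are made up for the illustration. As with the template above, the instrumentation itself might change what the optimizer decides to do, so a "moved" result is not conclusive.

```c
#include <stdint.h>

static uintptr_t outer_frame;  /* frame address seen by the outermost call */
static int frame_moved;        /* set to 1 if a deeper call ran in another frame */

/* Hypothetical function under test: adds 1..n into acc using a tail call.
   depth is 0 for the outermost call and grows by one per recursive call. */
long sum_to(long n, long acc, int depth)
{
    /* GCC builtin (assumed available): address of the current frame. */
    uintptr_t here = (uintptr_t)__builtin_frame_address(0);

    if (depth == 0) {
        outer_frame = here;
        frame_moved = 0;
    } else if (here != outer_frame) {
        frame_moved = 1;
    }

    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n, depth + 1);
}
```

Call sum_to(10, 0, 0) and then inspect frame_moved: if it is still 0, every recursive step ran in the same frame as the first call.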

#3

Look at the generated assembly code and see if it uses a call or jmp instruction for the recursive call on x86 (for other architectures, look up the corresponding instructions). You can use nm and objdump to get just the assembly corresponding to your function. Consider the following function:

int fact(int n)
{
  return n <= 1 ? 1 : n * fact(n-1);
}

Compile as

gcc fact.c -c -o fact.o -O2

Then, to test if it's using tail recursion:

# get starting address and size of function fact from nm
ADDR=$(nm --print-size --radix=d fact.o | grep ' fact$' | cut -d ' ' -f 1,2)
# strip leading 0's to avoid being interpreted by objdump as octal addresses
STARTADDR=$(echo $ADDR | cut -d ' ' -f 1 | sed 's/^0*\(.\)/\1/')
SIZE=$(echo $ADDR | cut -d ' ' -f 2 | sed 's/^0*//')
STOPADDR=$(( $STARTADDR + $SIZE ))

# now disassemble the function and look for an instruction of the form
# call addr <fact+offset>
if objdump --disassemble fact.o --start-address=$STARTADDR --stop-address=$STOPADDR | \
    grep -qE 'call +[0-9a-f]+ <fact\+'
then
    echo "fact is NOT tail recursive"
else
    echo "fact is tail recursive"
fi

When run on the above function, this script prints "fact is tail recursive". When compiled with -O3 instead of -O2, this curiously prints "fact is NOT tail recursive".

Note that this might yield false negatives, as ephemient pointed out in his comment. This script will only yield the right answer if the function contains no recursive calls to itself at all, and it also doesn't detect sibling recursion (e.g. where A() calls B() which calls A()). I can't think of a more robust method at the moment that doesn't involve having a human look at the generated assembly, but at least you can use this script to easily grab the assembly corresponding to a particular function within an object file.
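
One thing worth keeping in mind when reading the output: fact as written above is not in tail form, because the multiplication happens after the recursive call returns. GCC may still turn it into a loop, but if you want a function whose recursive call really is the last operation, an accumulator version is the usual shape. This is only a sketch for comparison; fact_acc and fact_tail are names I made up.

```c
/* Accumulator form: the recursive call is the last thing the function does,
   so nothing remains to be computed after it returns. */
int fact_acc(int n, int acc)
{
    return n <= 1 ? acc : fact_acc(n - 1, n * acc);
}

/* Wrapper with the same signature as fact() above. */
int fact_tail(int n)
{
    return fact_acc(n, 1);
}
```

Running the same nm/objdump check against fact_acc instead of fact lets you compare the two forms at the optimization level you care about.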

#4

Expanding on PolyThinker's answer, here's a concrete example.

int foo(int a, int b) {
    if (a && b)
        return foo(a - 1, b - 1);
    return a + b;
}

i686-pc-linux-gnu-gcc-4.3.2 -Os -fno-optimize-sibling-calls output:

00000000 <foo>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 55 08                mov    0x8(%ebp),%edx
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   85 d2                   test   %edx,%edx
   b:   74 16                   je     23 <foo+0x23>
   d:   85 c0                   test   %eax,%eax
   f:   74 12                   je     23 <foo+0x23>
  11:   51                      push   %ecx
  12:   48                      dec    %eax
  13:   51                      push   %ecx
  14:   50                      push   %eax
  15:   8d 42 ff                lea    -0x1(%edx),%eax
  18:   50                      push   %eax
  19:   e8 fc ff ff ff          call   1a <foo+0x1a>
  1e:   83 c4 10                add    $0x10,%esp
  21:   eb 02                   jmp    25 <foo+0x25>
  23:   01 d0                   add    %edx,%eax
  25:   c9                      leave
  26:   c3                      ret

i686-pc-linux-gnu-gcc-4.3.2 -Os output:

00000000 <foo>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 55 08                mov    0x8(%ebp),%edx
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   85 d2                   test   %edx,%edx
   b:   74 08                   je     15 <foo+0x15>
   d:   85 c0                   test   %eax,%eax
   f:   74 04                   je     15 <foo+0x15>
  11:   48                      dec    %eax
  12:   4a                      dec    %edx
  13:   eb f4                   jmp    9 <foo+0x9>
  15:   5d                      pop    %ebp
  16:   01 d0                   add    %edx,%eax
  18:   c3                      ret

In the first case, <foo+0x11>-<foo+0x1d> pushes arguments for a function call, while in the second case, <foo+0x11>-<foo+0x14> modifies the variables and jmps to the same function, somewhere after the preamble. That's what you want to look for.

I don't think you can do this programmatically; there's too much possible variation. The "meat" of the function may be closer to or further away from the start, and you can't distinguish that jmp from a loop or conditional without looking at it. It might be a conditional jump instead of a jmp. gcc might leave a call in for some cases but apply sibling call optimization to other cases.

FYI, gcc's "sibling calls" is slightly more general than tail-recursive calls -- effectively, any function call where re-using the same stack frame is okay is potentially a sibling call.
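
To make the distinction concrete, here is a small sketch of two functions that call each other in tail position. Neither call is self-recursive, yet both are candidates for sibling call optimization, so a check that only looks for a call back to the same symbol would not tell you much here. The names is_even and is_odd are just an illustration, and whether GCC actually emits jumps depends on the target and flags.

```c
int is_odd(unsigned n);

/* Tail call to a different function with the same signature. */
int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

/* And the tail call back, completing the mutual recursion. */
int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```

Compiling this with and without -fno-optimize-sibling-calls and comparing the disassembly shows the same call-versus-jmp difference as in the foo listings above, assuming the optimization is applied.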

[edit]

As an example of when just looking for a self-recursive call will mislead you,

int bar(int n) {
    if (n == 0)
        return bar(bar(1));
    if (n % 2)
        return n;
    return bar(n / 2);
}

GCC will apply sibling call optimization to two out of the three bar calls. I'd still call it tail-call-optimized, since that single unoptimized call never goes further than a single level, even though you'll find a call <bar+..> in the generated assembly.

#5

I am way too lazy to look at a disassembly. Try this:

void so(long l)
{
    ++l;
    so(l);
}
int main(int argc, char ** argv)
{
    so(0);
    return 0;
}

Compile and run this program. If it runs forever, the tail recursion was optimized away. If it blows the stack, it wasn't.

EDIT: Sorry, I read too quickly; the OP wants to know if his particular function has its tail recursion optimized away. OK...

...the principle is still the same - if the tail-recursion is being optimized away, then the stack frame will remain the same. You should be able to use the backtrace function to capture the stack frames from within your function, and determine if they are growing or not. If tail recursion is being optimized away, you will have only one return pointer in the buffer.
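
A minimal sketch of that idea, assuming the backtrace function from <execinfo.h> (a glibc extension that some other platforms also provide). The function count_down and the variables are hypothetical. The buffer is static on purpose, since taking the address of a local might by itself stop the compiler from eliminating the tail call. How accurate the reported depth is depends on the platform's unwinding support, so compare the numbers rather than trusting their absolute values.

```c
#include <execinfo.h>

#define MAX_FRAMES 64

static void *frames[MAX_FRAMES];          /* static: no local address escapes */
static int shallowest = MAX_FRAMES + 1;   /* smallest backtrace depth observed */
static int deepest;                       /* largest backtrace depth observed */

/* Hypothetical function under test: counts n down to zero with a tail call,
   recording how many return addresses backtrace() reports at each step. */
long count_down(long n)
{
    int depth = backtrace(frames, MAX_FRAMES);

    if (depth < shallowest)
        shallowest = depth;
    if (depth > deepest)
        deepest = depth;

    if (n == 0)
        return 0;
    return count_down(n - 1);
}
```

After calling count_down(20), a deepest value equal to shallowest suggests the frames are not piling up; a difference of about 20 suggests each recursive step got its own frame.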

#6

Another way I checked this is:

Compile your code with 'gcc -O2'.

Start 'gdb'.

Place a breakpoint in the function you are expecting to be tail-recursion optimized/eliminated.

Run your code.

If the tail call has been eliminated, the breakpoint will be hit only once or never.

#7

A simple method: Build a simple tail-recursion program, compile it, and disassemble it to see if it is optimized.

Just realized that you already had that in your question. If you know how to read assembly, it's quite easy to tell. Recursive functions will call themselves (with "call label") from within the function body, and a loop will be just "jmp label".

#8

You could craft input data that, without the optimization, would make the function recurse deeply enough to overflow the stack, and then see whether the overflow happens. Of course, this is not trivial, and sometimes inputs big enough to do so will make the function run for an intolerably long time.
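
As a sketch of what a crafted input can look like in the simplest case, take a function whose recursion depth equals its argument. The name drain is hypothetical; substitute your own function and an input that drives it equally deep.

```c
/* Hypothetical function under test: recurses exactly n times, with the
   recursive call in tail position, and returns the number of steps taken. */
unsigned long drain(unsigned long n, unsigned long steps)
{
    if (n == 0)
        return steps;
    return drain(n - 1, steps + 1);
}
```

A small argument such as drain(1000, 0) works either way. Something like drain(100000000UL, 0) should only finish if the recursion has been turned into a loop, assuming a default-sized stack; otherwise expect a crash rather than a result.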
