How to generate a core dump in Linux on a segmentation fault?

Date: 2021-08-17 11:14:22

I have a process in Linux that's getting a segmentation fault. How can I tell it to generate a core dump when it fails?

9 Answers

#1


211  

This depends on which shell you are using. In bash, the ulimit command controls several settings related to program execution, including whether programs may dump core. If you type

ulimit -c unlimited

bash will allow programs started from that shell to write core files of any size. You can give a numeric limit instead of unlimited if you want (bash interprets the value in 1024-byte blocks, so 53248 is about 52 MB), but in practice this shouldn't be necessary, since the size of core files will probably never be an issue for you.

In tcsh, you'd type

limit coredumpsize unlimited
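
As a minimal sketch of the bash case, the helper below raises the soft core limit as far as the hard limit permits and prints the result. The function name enable_core_dumps is invented for this illustration; it relies only on the shell's ulimit builtin, and the change applies to the current shell and the processes started from it.

```shell
# Hypothetical helper (the name is made up for this sketch).
enable_core_dumps() {
    hard=$(ulimit -H -c)      # hard limit: "unlimited" or a block count
    ulimit -S -c "$hard"      # the soft limit may be raised up to the hard limit
    ulimit -S -c              # print the soft limit now in effect
}

enable_core_dumps
```

If this prints 0, the hard limit itself is zero, and an unprivileged user cannot raise it from within the shell.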

#2


45  

As explained above, the real question being asked here is how to enable core dumps on a system where they are not enabled. That question is answered here.

If you've come here hoping to learn how to generate a core dump for a hung process, the answer is

gcore <pid>

If gcore is not available on your system, use

kill -ABRT <pid>

Don't use kill -SEGV, as that will often invoke a signal handler, making it harder to diagnose the stuck process.
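
The fallback described above could be wrapped up roughly as follows. This is only a sketch: the function name dump_hung_process is hypothetical, and it assumes that gcore, when installed, is the one shipped with gdb and writes core.<pid> into the current directory.

```shell
# Hypothetical helper: dump core for a stuck process identified by PID.
dump_hung_process() {
    pid=$1
    if command -v gcore >/dev/null 2>&1; then
        gcore "$pid"          # writes core.<pid>; the process keeps running
    else
        kill -ABRT "$pid"     # the process terminates; a core file appears only if dumps are enabled
    fi
}
```

Call it as dump_hung_process <pid>. Note the difference between the two branches: gcore leaves the process alive, while kill -ABRT ends it.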

#3


22  

What I did in the end was attach gdb to the process before it crashed. When it got the segfault, I ran the generate-core-file command, which forced generation of a core dump.

#4


19  

Maybe you could do it this way. This program demonstrates how to trap a segmentation fault and shell out to a debugger (this is the original code used under AIX), which prints the stack trace up to the point of the segmentation fault. You will need to change the command string built by sprintf to use gdb on Linux.

#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <stdarg.h>
#include <unistd.h>  /* getpid() */

static void signal_handler(int);
static void dumpstack(void);
static void cleanup(void);
void init_signals(void);
void panic(const char *, ...);

struct sigaction sigact;
char *progname;

int main(int argc, char **argv) {
    /* Null pointer, read through a volatile so the compiler keeps the faulting store. */
    char *volatile s = NULL;
    (void)argc;
    progname = argv[0];
    atexit(cleanup);
    init_signals();
    printf("About to seg fault by assigning zero to *s\n");
    *s = 0;
    sigemptyset(&sigact.sa_mask);  /* not reached: the handler exits first */
    return 0;
}

void init_signals(void) {
    sigact.sa_handler = signal_handler;
    sigemptyset(&sigact.sa_mask);
    sigact.sa_flags = 0;
    sigaction(SIGINT, &sigact, (struct sigaction *)NULL);

    sigaddset(&sigact.sa_mask, SIGSEGV);
    sigaction(SIGSEGV, &sigact, (struct sigaction *)NULL);

    sigaddset(&sigact.sa_mask, SIGBUS);
    sigaction(SIGBUS, &sigact, (struct sigaction *)NULL);

    sigaddset(&sigact.sa_mask, SIGQUIT);
    sigaction(SIGQUIT, &sigact, (struct sigaction *)NULL);

    sigaddset(&sigact.sa_mask, SIGHUP);
    sigaction(SIGHUP, &sigact, (struct sigaction *)NULL);

    /* SIGKILL cannot be caught, so no handler is installed for it. */
}

/* Note: panic() and system() are not async-signal-safe; acceptable for a demo only. */
static void signal_handler(int sig) {
    if (sig == SIGHUP) panic("FATAL: Program hung up\n");
    if (sig == SIGSEGV || sig == SIGBUS) {
        dumpstack();
        panic("FATAL: %s Fault. Logged StackTrace\n", (sig == SIGSEGV) ? "Segmentation" : "Bus");
    }
    if (sig == SIGQUIT) panic("QUIT signal ended program\n");
    /* SIGINT is deliberately ignored. */
}

void panic(const char *fmt, ...) {
    char buf[128];
    va_list argptr;
    va_start(argptr, fmt);
    vsnprintf(buf, sizeof buf, fmt, argptr);  /* bounded, unlike vsprintf */
    va_end(argptr);
    fputs(buf, stderr);  /* never pass buf as the format string */
    /* Exiting here means the kernel does not write a core file for this fault. */
    exit(EXIT_FAILURE);
}

static void dumpstack(void) {
    /* Got this routine from http://www.whitefang.com/unix/faq_toc.html
    ** Section 6.5. Modified to redirect to file to prevent clutter
    */
    char dbx[160];

    /* AIX original. On Linux, change dbx to gdb; an untested possibility is
    ** "gdb -batch -ex bt -p %d > %s.dump" */
    snprintf(dbx, sizeof dbx, "echo 'where\ndetach' | dbx -a %d > %s.dump", (int)getpid(), progname);

    system(dbx);
}

void cleanup(void) {
    sigemptyset(&sigact.sa_mask);
    /* Do any cleaning up chores here */
}

You may additionally have to add a parameter to get gdb to dump the core, as shown in this blog.

#5


17  

To check where core dumps are generated, run:

sysctl kernel.core_pattern

In the pattern, %e is the process name and %t is the time of the dump. You can change the pattern in /etc/sysctl.conf and reload it with sysctl -p.

If core files are not generated (test with sleep 10 & followed by killall -SIGSEGV sleep), check the limits with ulimit -a.

If your core file size is limited, run:

ulimit -c unlimited

to make it unlimited.

Then test again. If the core dump succeeds, you will see "(core dumped)" after the segmentation fault message, as below:

Segmentation fault: 11 (core dumped)
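
The test described above can be scripted so that it touches only a process it started itself, rather than every sleep on the machine. This is a sketch; the function name segv_test is invented here. In bash, a process killed by a signal reports exit status 128 plus the signal number, and SIGSEGV is signal 11 on Linux, so the function prints 139 whether or not a core file was written.

```shell
# Hypothetical helper: crash a throwaway process with SIGSEGV and print its exit status.
segv_test() {
    sleep 10 >/dev/null 2>&1 &
    pid=$!
    kill -SEGV "$pid"                      # deliver the signal to our own child only
    status=0
    wait "$pid" 2>/dev/null || status=$?   # 128 + 11 for a SIGSEGV death
    printf '%s\n' "$status"
}

segv_test
```

Whether a core file then exists depends on the core limit and on kernel.core_pattern, so check the location reported by sysctl kernel.core_pattern after running it.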


Ubuntu

On Ubuntu, core dumps are usually handled by apport, which stores them in /var/crash/ in a different format. However, apport is not enabled by default in stable releases. Read more on the Ubuntu wiki.

It uses core_pattern to pipe the core dump directly into apport:

$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c

So even if core files are disabled by ulimit, apport will still capture the crash (How do I enable or disable Apport?).
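
To tell at a glance which of these cases applies on a given machine, a small classifier like the one below may help. It is a sketch: describe_core_pattern is a made-up name, and the three cases follow the rules in man core (a leading | means the dump is piped to a program, a leading / means an absolute path, anything else is relative to the crashing process's working directory).

```shell
# Hypothetical helper: explain where a core_pattern value sends core dumps.
describe_core_pattern() {
    # Use the argument if given, otherwise read the live kernel setting.
    pattern=${1-$(cat /proc/sys/kernel/core_pattern)}
    case $pattern in
        '|'*) printf 'piped to program: %s\n' "${pattern#?}" ;;
        /*)   printf 'absolute path: %s\n' "$pattern" ;;
        *)    printf 'relative to working directory: %s\n' "$pattern" ;;
    esac
}
```

Run without arguments it reports on the current system; on an Ubuntu machine with apport active, the first branch is the one that matches.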


macOS

For macOS, see: How to generate core dumps in Mac OS X?

#6


15  

Several more things can influence whether a core dump is generated. I encountered these:

  • The directory for the dump must be writable. By default this is the current directory of the process, but that can be changed by setting /proc/sys/kernel/core_pattern.
  • Under some conditions, the kernel value in /proc/sys/fs/suid_dumpable may prevent the core from being generated.

More situations that can prevent generation are described in the man page; try man core.
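
The two conditions above, plus the core limit, can be checked in one pass. The following is a rough sketch with an invented function name, check_core_prereqs; it takes the directory where the core is expected (defaulting to the current one) and only reports values, leaving interpretation to man core.

```shell
# Hypothetical helper: print the settings that commonly block core dumps.
check_core_prereqs() {
    dir=${1:-.}
    printf 'soft core limit: %s\n' "$(ulimit -S -c)"
    if [ -w "$dir" ]; then
        printf 'directory writable: yes\n'
    else
        printf 'directory writable: no\n'
    fi
    # 0 is the traditional setting: processes that changed credentials are not dumped.
    if [ -r /proc/sys/fs/suid_dumpable ]; then
        printf 'suid_dumpable: %s\n' "$(cat /proc/sys/fs/suid_dumpable)"
    fi
}
```

The directory check is only meaningful when core_pattern is a plain file name; with a piped pattern the receiving program decides where the dump goes.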

#7


8  

To activate core dumps, do the following:

  1. In /etc/profile, comment out the line:

    # ulimit -S -c 0 > /dev/null 2>&1
    
  2. In /etc/security/limits.conf, comment out the line:

    *               soft    core            0
    
  3. Run limit coredumpsize unlimited (a csh/tcsh builtin) and check the result with limit:

    # limit coredumpsize unlimited
    # limit
    cputime      unlimited
    filesize     unlimited
    datasize     unlimited
    stacksize    10240 kbytes
    coredumpsize unlimited
    memoryuse    unlimited
    vmemoryuse   unlimited
    descriptors  1024
    memorylocked 32 kbytes
    maxproc      528383
    #
    
  4. To check whether the core file gets written, you can kill the relevant process with kill -s SEGV <PID> (this should not be needed; it is only a check in case no core file gets written):

    # kill -s SEGV <PID>
    

Once the core file has been written, make sure to deactivate the core dump settings again in the relevant places (steps 1, 2, and 3)!

#8


6  

For Ubuntu 14.04

  1. Check whether core dumps are enabled:

    ulimit -a
    
  2. One of the lines should be:

    core file size          (blocks, -c) unlimited
    
  3. If not:

    Open ~/.bashrc (for example with gedit ~/.bashrc), add ulimit -c unlimited to the end of the file, save it, and restart the terminal.

  4. Build your application with debug information:

    In the Makefile, add -O0 -g to the compiler flags.

  5. Run the application that crashes (a core dump file named ‘core’ should be created next to the application_name file):

    ./application_name
    
  6. Load the core file in gdb:

    gdb application_name core
    

#9


4  

By default you will get a core file. Check that the current directory of the process is writable, or no core file will be created.
