How can malloc() cause a SIGSEGV?

Date: 2022-09-06 16:23:58

I have an odd bug in my program: malloc() appears to be causing a SIGSEGV, which, as far as I understand it, makes no sense. I am using a library called simclist for dynamic lists.

Here is a struct that is referenced later:

typedef struct {
    int msgid;
    int status;
    void* udata;
    list_t queue;
} msg_t;

And here is the code:

msg_t* msg = (msg_t*) malloc( sizeof( msg_t ) );

msg->msgid = msgid;
msg->status = MSG_STAT_NEW;
msg->udata = udata;
list_init( &msg->queue );

The program fails inside list_init. Here is the code for list_init:

/* list initialization */
int list_init(list_t *restrict l) {
    if (l == NULL) return -1;

    srandom((unsigned long)time(NULL));

    l->numels = 0;

    /* head/tail sentinels and mid pointer */
    l->head_sentinel = (struct list_entry_s *)malloc(sizeof(struct list_entry_s));
    l->tail_sentinel = (struct list_entry_s *)malloc(sizeof(struct list_entry_s));
    l->head_sentinel->next = l->tail_sentinel;
    l->tail_sentinel->prev = l->head_sentinel;
    l->head_sentinel->prev = l->tail_sentinel->next = l->mid = NULL;
    l->head_sentinel->data = l->tail_sentinel->data = NULL;

    /* iteration attributes */
    l->iter_active = 0;
    l->iter_pos = 0;
    l->iter_curentry = NULL;

    /* free-list attributes */
    l->spareels = (struct list_entry_s **)malloc(SIMCLIST_MAX_SPARE_ELEMS * sizeof(struct list_entry_s *));
    l->spareelsnum = 0;

#ifdef SIMCLIST_WITH_THREADS
    l->threadcount = 0;
#endif

    list_attributes_setdefaults(l);

    assert(list_repOk(l));
    assert(list_attrOk(l));

    return 0;
}

According to the stack trace, the line l->spareels = (struct list_entry_s **)malloc(SIMCLIST_MAX_SPARE_ELEMS * sizeof(struct list_entry_s *)); is where the SIGSEGV is raised. I am using gdb/nemiver for debugging but am at a loss. The first time this function is called it works fine, but it always fails the second time. How can malloc() cause a SIGSEGV?

This is the stack trace:

#0  ?? () at :0
#1  malloc () at :0
#2  list_init (l=0x104f290) at src/simclist.c:205
#3  msg_new (msg_switch=0x1050dc0, msgid=8, udata=0x0) at src/msg_switch.c:218
#4  exread (sockfd=8, conn_info=0x104e0e0) at src/zimr-proxy/main.c:504
#5  zfd_select (tv_sec=0) at src/zfildes.c:124
#6  main (argc=3, argv=0x7fffcabe44f8) at src/zimr-proxy/main.c:210

Any help or insight is greatly appreciated!

6 answers

#1


25  

malloc can segfault when, for example, the heap is corrupted. Check that you are not writing beyond the bounds of any previous allocation.
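To make this failure mode concrete, here is a minimal sketch of the classic off-by-one that corrupts the heap, next to a corrected version. The helper name dup_string is hypothetical and not part of the question's code; the buggy pattern appears only in a comment so that the block stays safe to run.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Buggy pattern (do not use): the buffer is one byte too small, so strcpy()
 * writes the terminating '\0' past the end of the allocation:
 *
 *     char *copy = malloc(strlen(src));
 *     strcpy(copy, src);
 *
 * That stray byte may land on the allocator's bookkeeping, and a later,
 * unrelated malloc() or free() may be the call that crashes.
 */

/* Hypothetical helper: returns a heap copy of src, or NULL on failure. */
char *dup_string(const char *src) {
    if (src == NULL) return NULL;
    size_t len = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL) return NULL;
    memcpy(copy, src, len);         /* copies exactly len bytes, never more */
    return copy;
}
```

The caller owns the returned buffer and releases it with free(). The point of the sketch is that the overflow and the crash can be far apart, which would fit a backtrace that ends inside malloc().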

#2


16  

The memory violation probably occurs in another part of your code. If you are on Linux, you should definitely try valgrind. I would never trust my own C programs unless they pass valgrind.

EDIT: Another useful tool is Electric Fence. Glibc also provides the MALLOC_CHECK_ environment variable to help debug memory problems. These two methods do not slow the program down as much as valgrind does.

#3


12  

You have probably corrupted your heap somewhere before this call, either through a buffer overflow or by calling free with a pointer that was not allocated by malloc (or that was already freed).

If the internal data structures used by malloc get corrupted this way, malloc works with invalid data and might crash.
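One common defensive habit against the "already freed" case is to reset a pointer as soon as it is released. The following is a minimal sketch; free_and_clear is a hypothetical helper, typed for char * purely for illustration.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: frees *pp and resets it to NULL. free(NULL) is a
 * no-op, so calling this twice on the same variable is harmless.
 */
void free_and_clear(char **pp) {
    if (pp == NULL) return;
    free(*pp);
    *pp = NULL;
}
```

This only protects the one variable passed in. Other copies of the same pointer still dangle after the call, so a tool such as valgrind remains the more reliable way to find a double free.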

#4


4  

There are a myriad ways of triggering a core dump from malloc() (and realloc() and calloc()). These include:

  • Buffer overflow: writing beyond the end of the allocated space (trampling control information that malloc() was keeping there).
  • Buffer underflow: writing before the start of the allocated space (trampling control information that malloc() was keeping there).
  • Freeing memory that was not allocated by malloc(). In a mixed C and C++ program, that would include freeing memory allocated in C++ by new.
  • Freeing a pointer that points part way through a memory block allocated by malloc(), which is a special case of the previous one.
  • Freeing a pointer that was already freed: the notorious 'double free'.

Using a diagnostic version of malloc(), or enabling diagnostics in your system's standard version, may help identify some of these problems. For example, it may be able to detect small underflows and overflows (because it allocates extra space to provide a buffer zone around the space that you requested), and it can probably detect attempts to free memory that was not allocated, that was already freed, or through pointers part way into the allocated space, because it stores that information separately from the allocated space. The cost is that the debugging version takes more space. A really good allocator will be able to record the stack trace and line numbers to tell you where the allocation occurred in your code, or where the first free occurred.

#5


1  

You should try to debug this code in isolation to see whether the problem is actually located where the segfault is generated. (I suspect that it is not.)

This means:

#1: Compile the code with -O0 (and -g, so that debug information is emitted) to make sure that gdb gets correct line number information.

#2: Write a unit test that calls this part of the code.
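Such an isolated test might look roughly like the sketch below. Because simclist itself is not included here, the block uses a stand-in type and functions (toy_list_t, toy_list_init, toy_list_destroy, all hypothetical) that only imitate the sentinel setup shown in the question; in the real project the test would call the library's own list_init and its matching cleanup function instead.

```c
#include <stdlib.h>

/* Stand-in for the real list type: just enough structure to exercise. */
struct toy_entry { struct toy_entry *next, *prev; };
typedef struct {
    struct toy_entry *head, *tail;
    unsigned numels;
} toy_list_t;

/* Stand-in initializer: allocates head/tail sentinels and links them. */
int toy_list_init(toy_list_t *l) {
    if (l == NULL) return -1;
    l->head = malloc(sizeof *l->head);
    l->tail = malloc(sizeof *l->tail);
    if (l->head == NULL || l->tail == NULL) {
        free(l->head);
        free(l->tail);
        l->head = l->tail = NULL;
        return -1;
    }
    l->head->prev = l->tail->next = NULL;
    l->head->next = l->tail;
    l->tail->prev = l->head;
    l->numels = 0;
    return 0;
}

/* Stand-in cleanup: releases both sentinels. */
void toy_list_destroy(toy_list_t *l) {
    free(l->head);
    free(l->tail);
    l->head = l->tail = NULL;
}

/*
 * Isolated test: initialize many lists back to back, mirroring the
 * "works the first time, fails the second" symptom. Returns the number
 * of failed checks.
 */
int test_repeated_init(int rounds) {
    int failures = 0;
    for (int i = 0; i < rounds; i++) {
        toy_list_t l;
        if (toy_list_init(&l) != 0) { failures++; continue; }
        if (l.head->next != l.tail || l.tail->prev != l.head) failures++;
        if (l.numels != 0) failures++;
        toy_list_destroy(&l);
    }
    return failures;
}
```

If the equivalent test against the real library passes when run on its own, that points away from list_init and toward heap damage done elsewhere before the failing call.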

My guess is that the code will work correctly when used separately. You can then test your other modules in the same way, until you find out what causes the bug.

Using Valgrind, as others have suggested, is also a very good idea.

#6


0  

The code is problematic. If malloc returns NULL, your code does not handle that case. It simply assumes that memory has been allocated when it actually has not been. This can cause memory corruption.
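A minimal sketch of the missing check, applied to the msg_t allocation from the question. To keep the block standalone, the list_t queue member is left out, the value of MSG_STAT_NEW is assumed, and msg_alloc is a hypothetical name.

```c
#include <stdlib.h>

#define MSG_STAT_NEW 1   /* assumed value; the real constant is not shown */

/* Reduced msg_t: the list_t queue member is omitted to keep this standalone. */
typedef struct {
    int   msgid;
    int   status;
    void *udata;
} msg_t;

/* Hypothetical constructor: returns NULL rather than writing through a
 * failed allocation. */
msg_t *msg_alloc(int msgid, void *udata) {
    msg_t *msg = malloc(sizeof *msg);
    if (msg == NULL) return NULL;   /* caller must handle the failure */
    msg->msgid  = msgid;
    msg->status = MSG_STAT_NEW;
    msg->udata  = udata;
    return msg;
}
```

The three malloc calls inside list_init would need the same treatment. Note that this guards against allocation failure only; it does not explain a crash inside malloc itself.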
