mmap, msync and Linux process termination

Date: 2022-09-06 18:46:37

I want to use mmap to implement persistence of certain portions of program state in a C program running under Linux by associating a fixed-size struct with a well known file name using mmap() with the MAP_SHARED flag set. For performance reasons, I would prefer not to call msync() at all, and no other programs will be accessing this file. When my program terminates and is restarted, it will map the same file again and do some processing on it to recover the state that it was in before the termination. My question is this: if I never call msync() on the file descriptor, will the kernel guarantee that all updates to the memory will get written to disk and be subsequently recoverable even if my process is terminated with SIGKILL? Also, will there be general system overhead from the kernel periodically writing the pages to disk even if my program never calls msync()?

EDIT: I've settled the question of whether the data is written, but I'm still not sure whether this approach will cause unexpected system load compared with handling the problem with open()/write()/fsync() and accepting the risk that some data might be lost if the process is hit by KILL/SEGV/ABRT/etc. I've added a 'linux-kernel' tag in the hope that some knowledgeable person might chime in.
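
For reference, the setup described in the question can be sketched roughly as follows. This is only a minimal illustration under assumptions: the `app_state` struct, its fields, and the `map_state` helper are hypothetical names and not taken from the original program.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-size state record; the fields are illustrative only. */
typedef struct {
  uint32_t generation;
  char payload[124];
} app_state;

/* Map `path` as a shared, writable app_state, creating and sizing the file
 * on first use. Returns NULL on failure (errno is set by the failing call). */
app_state *map_state(const char *path) {
  int fd = open(path, O_RDWR | O_CREAT, 0600);
  if (fd < 0) return NULL;
  /* Size the file to hold exactly one record; newly added bytes read as zero,
   * and an existing file of the same size keeps its contents. */
  if (ftruncate(fd, sizeof(app_state)) < 0) {
    close(fd);
    return NULL;
  }
  void *p = mmap(NULL, sizeof(app_state), PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  return p == MAP_FAILED ? NULL : (app_state *)p;
}
```

Calling `map_state` with the same path on each start should return the bytes the previous run left in the file, as long as the struct layout has not changed between builds.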

6 Answers

#1


16  

I found a comment from Linus Torvalds that answers this question: http://www.realworldtech.com/forum/?threadid=113923&curpostid=114068

The mapped pages are part of the filesystem cache, which means that even if the user process that modified a page dies, the page is still managed by the kernel. Since all concurrent accesses to that file go through the kernel, other processes will be served from that cache. Some old Linux kernels behaved differently, which is why some kernel documents still say to force an msync.

EDIT: Thanks to RobH for correcting the link.
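
To illustrate the point about the page cache outliving the writer, here is a small sketch in which a child process stores into a MAP_SHARED mapping and is then killed with SIGKILL, after which the parent reads the file back with an ordinary pread(2). The function name `write_in_doomed_child` and the 64-byte `REGION_SIZE` are made up for this example, and it shows only that the data survives the process, not that it has reached the disk.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE 64 /* arbitrary size chosen for the illustration */

/* Fork a child that stores `msg` through a MAP_SHARED mapping of `path` and
 * is then killed with SIGKILL; the parent reads the file back with pread(2).
 * Returns 0 if the child died from SIGKILL and REGION_SIZE bytes were read. */
int write_in_doomed_child(const char *path, const char *msg,
                          char out[REGION_SIZE]) {
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return -1;
  if (ftruncate(fd, REGION_SIZE) < 0) { close(fd); return -1; }

  pid_t pid = fork();
  if (pid < 0) { close(fd); return -1; }
  if (pid == 0) {
    char *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) _exit(1);
    strncpy(p, msg, REGION_SIZE - 1); /* plain store: no msync, no munmap */
    kill(getpid(), SIGKILL);
    _exit(2); /* not reached */
  }

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) { close(fd); return -1; }
  int killed = WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
  ssize_t n = pread(fd, out, REGION_SIZE, 0); /* served from the page cache */
  close(fd);
  return (killed && n == REGION_SIZE) ? 0 : -1;
}
```

On a Linux system I would expect `out` to hold the message afterwards, since the parent's read goes through the same cached pages the child wrote into.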

#2


13  

I decided to be less lazy and settle definitively, by writing some code, whether the data is written to disk. The answer is that it will be written.

Here is a program that kills itself abruptly after writing some data to an mmap'd file:

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

typedef struct {
  char data[100];
  uint16_t count;
} state_data;

const char *test_data = "test";

int main(int argc, const char *argv[]) {
  /* Create (or truncate) the backing file. */
  int fd = open("test.mm", O_RDWR|O_CREAT|O_TRUNC, (mode_t)0700);
  if (fd < 0) {
    perror("Unable to open file 'test.mm'");
    exit(1);
  }
  /* Grow the file to the size of the struct before mapping it. */
  size_t data_length = sizeof(state_data);
  if (ftruncate(fd, data_length) < 0) {
    perror("Unable to truncate file 'test.mm'");
    exit(1);
  }
  state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd, 0);
  if (MAP_FAILED == data) {
    perror("Unable to mmap file 'test.mm'");
    close(fd);
    exit(1);
  }
  memset(data, 0, data_length);
  /* Copy "test" plus its terminating NUL (5 bytes), counting as we go. */
  for (data->count = 0; data->count < 5; ++data->count) {
    data->data[data->count] = test_data[data->count];
  }
  /* Die abruptly: no msync, no munmap, no close. */
  kill(getpid(), SIGKILL);
}

Here is a program that validates the resulting file after the previous program is dead:

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>

typedef struct {
  char data[100];
  uint16_t count;
} state_data;

const char *test_data = "test";

int main(int argc, const char *argv[]) {
  int fd = open("test.mm", O_RDONLY);
  if (fd < 0) {
    perror("Unable to open file 'test.mm'");
    exit(1);
  }
  size_t data_length = sizeof(state_data);
  state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ, MAP_SHARED|MAP_POPULATE, fd, 0);
  if (MAP_FAILED == data) {
    perror("Unable to mmap file 'test.mm'");
    close(fd);
    exit(1);
  }
  assert(5 == data->count);
  unsigned index;
  for (index = 0; index < 4; ++index) {
    assert(test_data[index] == data->data[index]);
  }
  printf("Validated\n");
}

#3


9  

I found something adding to my confusion:

munmap does not affect the object that was mapped; that is, the call to munmap does not cause the contents of the mapped region to be written to the disk file. The updating of the disk file for a MAP_SHARED region happens automatically by the kernel's virtual memory algorithm as we store into the memory-mapped region.

This is excerpted from Advanced Programming in the UNIX® Environment.

From the Linux manpage:

MAP_SHARED Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file. The file may not actually be updated until msync(2) or munmap(2) are called.

The two seem contradictory. Is APUE wrong?

#4


4  

I did not find a very precise answer to your question, so I decided to add one more:

  1. First, about losing data: both write() and mmap/memcpy write to the page cache, and the OS syncs dirty pages to the underlying storage in the background according to its writeback settings. For example, Linux has vm.dirty_expire_centisecs, which determines how old dirty data must be before it is eligible to be flushed to disk, and vm.dirty_writeback_centisecs, which sets how often the kernel flusher threads wake up to write it out. Even if your process dies after the write call has succeeded (or after the store into the mapping has completed), the data is not lost, because it is already in kernel pages that will eventually be written to storage. The only case in which you would lose data is if the OS itself crashes (kernel panic, power loss, etc.). The way to make absolutely sure your data has reached storage is to call fsync, or msync for mmapped regions.
  2. About the system load concern: yes, calling msync/fsync for each request will reduce your throughput drastically, so do that only if you have to. Remember that you are really only protecting against losing data on OS crashes, which I would assume are rare and probably something most applications can live with. One common optimization is to issue a sync at regular intervals, say once per second, to get a good balance.

#5


1  

Either the Linux manpage information is incorrect or Linux is horribly non-conformant. msync is not supposed to have anything to do with whether the changes are committed to the logical state of the file, or whether other processes using mmap or read to access the file see the changes; it's purely an analogue of fsync and should be treated as a no-op except for the purposes of ensuring data integrity in the event of power failure or other hardware-level failure.
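
As a rough check of the visibility claim on Linux, the sketch below stores into a MAP_SHARED mapping and immediately reads the same bytes back with pread(2), with no msync in between. The helper name `store_is_visible_to_read` and the 256-byte buffer limit are assumptions for this example only, and the result says nothing about durability across a power failure.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Store `msg` through a MAP_SHARED mapping of `path`, then read the same
 * bytes back with pread(2) while the mapping is still live and without any
 * msync. Returns 1 if pread sees the store, 0 if not, -1 on error. */
int store_is_visible_to_read(const char *path, const char *msg) {
  size_t len = strlen(msg) + 1; /* include the terminating NUL */
  char buf[256];
  if (len > sizeof buf) return -1;
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return -1;
  if (ftruncate(fd, (off_t)len) < 0) { close(fd); return -1; }
  char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) { close(fd); return -1; }
  memcpy(p, msg, len); /* the only "write" is this store into the mapping */
  ssize_t n = pread(fd, buf, len, 0);
  int seen = (n == (ssize_t)len) && memcmp(buf, msg, len) == 0;
  munmap(p, len);
  close(fd);
  return seen;
}
```

A return value of 1 is consistent with treating msync purely as a durability call, in the same way as fsync, rather than as something needed for other readers to see the change.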

#6


-1  

According to the manpage,

The file may not actually be updated until msync(2) or munmap() is called.

So you will need to make sure you call munmap() prior to exiting at the very least.
