How to deal with portability issues in a binary file format

I'm designing a binary file format to store strings (without the terminating null, to save space) and binary data.

i. What is the best way to deal with little- and big-endian systems?

i.a. Would converting everything to network byte order and back with ntohl()/htonl() work?

ii. Will the packed structures be the same size on x86, x64, and ARM?

iii. Are there any inherent weaknesses with this approach?

struct __attribute__((packed)) Header {
    uint8_t magic;
    uint8_t flags;
};

struct __attribute__((packed)) Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day;
    uint8_t month;
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
    uint8_t type;
};

Tester code I'm using to develop the format:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <strings.h>
#include <stdint.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

struct __attribute__((packed)) Header {
    uint8_t magic;
    uint8_t flags;
};

struct __attribute__((packed)) Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day;
    uint8_t month;
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
    uint8_t type;
};

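int main(void)
{
    /* The mode must be octal; 0644 keeps the file readable and writable
       by its owner, so the file can be reopened on later runs. */
    int fd = open("test.dat", O_RDWR|O_APPEND|O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }
    struct Header header = {1, 0};
    write(fd, &header, sizeof(header));
    char msg[] = "BINARY";
    struct Record record = {strlen(msg), 0, 0, 0, 0, 0, 0, 0};
    write(fd, &record, sizeof(record));
    write(fd, msg, record.length);   /* payload, no terminating null */
    close(fd);

    /* Reopen read-only and print the first record's payload. */
    fd = open("test.dat", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }
    read(fd, &header, sizeof(struct Header));
    read(fd, &record, sizeof(struct Record));
    uint64_t len = record.length;    /* keep the full 64-bit length */
    char c;
    while (len != 0 && read(fd, &c, 1) == 1) {
        len--;
        printf("%c", c);
    }
    close(fd);
    return 0;
}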

1 Answer

#1

i. Defining the file to be in one order and converting to and from "internal" order, if necessary, when reading/writing (perhaps with ntohl and the like) is, in my opinion, the best approach.
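As a rough sketch of that idea, assuming a POSIX system that provides htonl() and ntohl() in <arpa/inet.h>, a 32-bit field could be converted on its way to and from a byte buffer like this. The helper names store_be32 and load_be32 are made up for illustration. Note that POSIX only defines the 16-bit and 32-bit conversions, so the 64-bit length field in the question would need separate handling.

```c
#include <arpa/inet.h>  /* htonl(), ntohl(): POSIX, not ISO C */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit value into buf in network
   (big-endian) order, whatever the host byte order is. */
void store_be32(unsigned char buf[4], uint32_t value)
{
    uint32_t wire = htonl(value);      /* host order -> network order */
    memcpy(buf, &wire, sizeof wire);
}

/* Hypothetical helper: load a 32-bit network-order value from buf
   and return it in host order. */
uint32_t load_be32(const unsigned char buf[4])
{
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);                /* network order -> host order */
}
```

Going through memcpy rather than casting the buffer pointer avoids alignment assumptions about where the field sits in the buffer.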

ii. I do not trust packed structures. They might work for this approach for those platforms, but there are no guarantees.
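If you do keep the packed structs, one way to at least detect a layout mismatch is to check it at compile time. The following is a minimal sketch, assuming a C11 compiler that accepts the GCC/Clang `__attribute__((packed))` syntax used in the question. The expected sizes and offsets (2 and 20 bytes) are simply what the field widths add up to. This catches a wrong layout on a given platform, but it does nothing about byte order.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct __attribute__((packed)) Header {
    uint8_t magic;
    uint8_t flags;
};

struct __attribute__((packed)) Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day;
    uint8_t month;
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
    uint8_t type;
};

/* Fail the build if the compiler lays the structs out differently
   from the intended on-disk layout. */
_Static_assert(sizeof(struct Header) == 2,  "Header must be 2 bytes");
_Static_assert(sizeof(struct Record) == 20, "Record must be 20 bytes");
_Static_assert(offsetof(struct Record, crc) == 8,   "crc at offset 8");
_Static_assert(offsetof(struct Record, year) == 12, "year at offset 12");
_Static_assert(offsetof(struct Record, type) == 19, "type at offset 19");
```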

iii. Reading and writing binary files using fread and fwrite (or read and write, as in your code) on whole structs is, again in my opinion, an inherently weak approach. You maximize the likelihood that you will be bitten by word size problems, padding and alignment problems, and byte order problems.

What I like to do is write little functions like get16() and put32() that read and write a byte at a time and so are inherently insensitive to word size and byte order difficulties. Then I write straightforward putHeader and getRecord functions (and the like) in terms of these.

unsigned int get16(FILE *fp)
{
    unsigned int r;
    r = getc(fp);
    r = (r << 8) | getc(fp);
    return r;
}

void put32(unsigned long int x, FILE *fp)
{
    putc((int)((x >> 24) & 0xff), fp);
    putc((int)((x >> 16) & 0xff), fp);
    putc((int)((x >> 8) & 0xff), fp);
    putc((int)(x & 0xff), fp);
}

[P.S. As @Olaf correctly points out in one of the comments, in production code you'd need handling for EOF and error in these functions. I've left those out for simplicity of presentation.]
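For what it's worth, here is one way the error handling and a record-level pair of functions might look. This is only a sketch under assumptions: the helpers put_be and get_be and the 0 / -1 return convention are my own choices for illustration, the file is assumed to be big-endian, and the in-memory struct is an ordinary unpacked one because the file layout no longer depends on it.

```c
#include <stdint.h>
#include <stdio.h>

/* In-memory record; its padding does not matter because fields are
   written one byte at a time. */
struct Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day, month, hour, minute, second, type;
};

/* Write the low nbytes of x, most significant byte first.
   Returns 0 on success, -1 on a write error. */
static int put_be(uint64_t x, int nbytes, FILE *fp)
{
    for (int shift = 8 * (nbytes - 1); shift >= 0; shift -= 8)
        if (putc((int)((x >> shift) & 0xff), fp) == EOF)
            return -1;
    return 0;
}

/* Read an nbytes-wide big-endian integer into *out.
   Returns 0 on success, -1 on EOF or a read error. */
static int get_be(uint64_t *out, int nbytes, FILE *fp)
{
    uint64_t r = 0;
    for (int i = 0; i < nbytes; i++) {
        int c = getc(fp);
        if (c == EOF)
            return -1;
        r = (r << 8) | (uint64_t)c;
    }
    *out = r;
    return 0;
}

/* Write one record as 20 bytes. Returns 0 on success, -1 on error. */
int putRecord(const struct Record *rec, FILE *fp)
{
    if (put_be(rec->length, 8, fp) || put_be(rec->crc, 4, fp) ||
        put_be(rec->year, 2, fp)   || put_be(rec->day, 1, fp) ||
        put_be(rec->month, 1, fp)  || put_be(rec->hour, 1, fp) ||
        put_be(rec->minute, 1, fp) || put_be(rec->second, 1, fp) ||
        put_be(rec->type, 1, fp))
        return -1;
    return 0;
}

/* Read one record. Returns 0 on success, -1 on EOF or error, in which
   case *rec is left unchanged. */
int getRecord(struct Record *rec, FILE *fp)
{
    static const int width[9] = {8, 4, 2, 1, 1, 1, 1, 1, 1};
    uint64_t v[9];
    for (int i = 0; i < 9; i++)
        if (get_be(&v[i], width[i], fp) != 0)
            return -1;
    rec->length = v[0];
    rec->crc    = (uint32_t)v[1];
    rec->year   = (uint16_t)v[2];
    rec->day    = (uint8_t)v[3];
    rec->month  = (uint8_t)v[4];
    rec->hour   = (uint8_t)v[5];
    rec->minute = (uint8_t)v[6];
    rec->second = (uint8_t)v[7];
    rec->type   = (uint8_t)v[8];
    return 0;
}
```

A caller would open the file in binary mode ("wb" / "rb"), call putRecord or getRecord, and then read or write the payload of rec.length bytes separately.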
