Should network packet payload data be aligned on proper boundaries?

Date: 2022-03-23 05:11:07

If you have the following class as a network packet payload:

class Payload
{
    char field0;
    int  field1;
    char field2;
    int  field3;
};

Does using a class like Payload leave the recipient of the data susceptible to alignment issues when receiving the data over a socket? I would think that the class would need either to be reordered or to have padding added to ensure alignment.

Either reorder:

class Payload
{
    int  field1;
    int  field3;
    char field0;
    char field2;
};

or add padding:

class Payload
{
    char  field0;
    char  pad0[3];
    int   field1;
    char  field2;
    char  pad1[3];
    int   field3;
};

If reordering doesn't make sense for some reason, I would think adding the padding would be preferred since it would avoid alignment issues even though it would increase the size of the class.
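
One thing worth checking before choosing: for a layout like the original Payload, the compiler normally inserts this padding on its own, so the explicitly padded version is usually no larger. The following is a minimal sketch for checking that on a given toolchain. The struct names are made up for illustration, and it assumes a typical ABI where int is 4 bytes with 4-byte alignment.

```cpp
#include <cstddef>  // offsetof

// Hypothetical names for the three layouts discussed above.
struct PayloadOriginal  { char field0; int field1; char field2; int field3; };
struct PayloadReordered { int field1; int field3; char field0; char field2; };
struct PayloadPadded {
    char field0; char pad0[3];
    int  field1;
    char field2; char pad1[3];
    int  field3;
};

// Assumption: int is 4 bytes and 4-byte aligned. This holds on most current
// desktop and server ABIs but is not guaranteed by the language.
static_assert(sizeof(int) == 4 && alignof(int) == 4, "sketch assumes a 4-byte int");

// The compiler already pads the original layout, so the explicit pad
// members only document what is there; they do not grow the struct.
static_assert(offsetof(PayloadOriginal, field1) == offsetof(PayloadPadded, field1),
              "field1 sits at the same offset either way");
static_assert(sizeof(PayloadOriginal) == sizeof(PayloadPadded),
              "explicit padding does not change the size");

// Reordering removes most of the padding.
static_assert(sizeof(PayloadReordered) < sizeof(PayloadOriginal),
              "the reordered layout is smaller");
```

If any of these assertions fails to compile, the assumptions do not hold for that compiler and target, which is itself a hint that the in-memory layout should not be used as the wire format.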

What is your experience with such alignment issues in network data?

6 answers

#1


4  

You should look into Google Protocol Buffers, or Boost.Serialization as another poster said.

If you want to roll your own, please do it right.

If you use types from stdint.h (i.e. uint32_t, int8_t, etc.) and make sure every variable has "native alignment", meaning its address is evenly divisible by its size (int8_ts can go anywhere, uint16_ts go on even addresses, uint32_ts go on addresses divisible by 4), you won't have to worry about alignment or packing.
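
As a rough illustration of that rule, here is a sketch of a struct laid out with native alignment and checked at compile time. The struct name and the length field are made up for the example; field0 through field3 mirror the question.

```cpp
#include <cstddef>   // offsetof
#include <stdint.h>  // fixed-width integer types

// Hypothetical wire struct: members are ordered so that each one sits at an
// offset that is a multiple of its own size, leaving no room for hidden padding.
struct WirePayload {
    uint32_t field1;   // offset 0
    uint32_t field3;   // offset 4
    uint16_t length;   // offset 8 (made-up field for illustration)
    uint8_t  field0;   // offset 10
    uint8_t  field2;   // offset 11
};

// Compile-time checks in the spirit of the validating parser described below.
static_assert(offsetof(WirePayload, field3) % sizeof(uint32_t) == 0,
              "uint32_t members must be on 4-byte boundaries");
static_assert(offsetof(WirePayload, length) % sizeof(uint16_t) == 0,
              "uint16_t members must be on even offsets");
static_assert(sizeof(WirePayload) == 12,
              "no padding expected: 4 + 4 + 2 + 1 + 1 bytes");
```

This only settles alignment and packing; both ends still have to agree on byte order.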

At a previous job we had all structures sent over our databus (ethernet or CANbus or byteflight or serial ports) defined in XML. There was a parser that would validate alignment on the variables within the structures (alerting you if someone wrote bad XML), and then generate header files for various platforms and languages to send and receive the structures. This worked really well for us: we never had to worry about hand-writing code to do message parsing or packing, and it was guaranteed that no platform would have stupid little coding errors. Some of our datalink layers were pretty bandwidth constrained, so we implemented things like bitfields, with the parser generating the proper code for each platform. We also had enumerations, which was very nice (you'd be surprised how easy it is for a human to screw up coding bitfields on enumerations by hand).

Unless you need to worry about it running on 8051s and HC11s with C, or over data link layers that are very bandwidth constrained, you are not going to come up with something better than protocol buffers; you'll just spend a lot of time trying to be on par with them.

#2


8  

Correct: blindly ignoring alignment can cause problems, even on the same operating system, if two components were compiled with different compilers or different compiler versions.

It is better to...
1) Pass your data through some sort of serialization process.
2) Or pass each of your primitives individually, while still paying attention to byte ordering (endianness).
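
For option 2, the receiving side might look something like the sketch below. The helper names readU32 and readU16 are hypothetical, and it assumes a POSIX system for ntohl and ntohs. Copying the bytes into a local variable first means the value can start at any offset in the received buffer without a misaligned access.

```cpp
#include <arpa/inet.h>  // ntohl, ntohs (POSIX; on Windows these come from <winsock2.h>)
#include <cassert>
#include <cstddef>
#include <cstring>      // std::memcpy
#include <stdint.h>

// Reads a 32-bit value sent in network (big-endian) order, starting at any
// offset. memcpy into a local avoids dereferencing a misaligned uint32_t*.
inline uint32_t readU32(const unsigned char* buf, std::size_t bufLen, std::size_t offset) {
    assert(offset + sizeof(uint32_t) <= bufLen);  // caller must stay inside the buffer
    uint32_t networkValue;
    std::memcpy(&networkValue, buf + offset, sizeof networkValue);
    return ntohl(networkValue);  // convert to the host's byte order
}

// Same idea for a 16-bit value.
inline uint16_t readU16(const unsigned char* buf, std::size_t bufLen, std::size_t offset) {
    assert(offset + sizeof(uint16_t) <= bufLen);
    uint16_t networkValue;
    std::memcpy(&networkValue, buf + offset, sizeof networkValue);
    return ntohs(networkValue);
}
```

The sender does the mirror image with htonl and htons before writing each primitive.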

A good place to start would be Boost Serialization.

#3


4  

Today we use packed structures that are overlaid directly over the binary packet in memory, and I am ruing the day that I decided to do that. The only way that we have gotten this to work is by:

  1. carefully defining bit-width specific types based on the compilation environment (typedef unsigned int uint32_t)
  2. inserting the appropriate compiler-specific pragmas to specify tight packing of structure members
  3. requiring that everything is in one byte order (use network or big-endian ordering)
  4. carefully writing both the server and client code

If you are just starting out, I would advise you to skip the whole mess of trying to represent what's on the wire with structures. Just serialize each primitive element separately. If you choose not to use an existing library like Boost Serialization or a middleware like TIBCO, then save yourself a lot of headaches by writing an abstraction around a binary buffer that hides the details of your serialization method. Aim for an interface like:

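#include <stddef.h>
#include <stdint.h>
#include <string>
#include <vector>

class ByteBuffer {
public:
    // Copies numBytes bytes starting at bytes into the internal buffer.
    ByteBuffer(uint8_t const* bytes, size_t numBytes) {
        buffer_.assign(bytes, bytes + numBytes);
    }
    // Append a value to the end of the buffer.
    void encode8Bits(uint8_t n);
    void encode16Bits(uint16_t n);
    void encode32Bits(uint32_t n);
    void encode64Bits(uint64_t n);
    //...
    // Replace a value that was already written at the given offset.
    void overwrite8BitsAt(unsigned offset, uint8_t n);
    void overwrite16BitsAt(unsigned offset, uint16_t n);
    //...
    void encodeString(std::string const& s);
    void encodeString(std::wstring const& s);

    // Read a value back from the given offset.
    uint8_t decode8BitsFrom(unsigned offset) const;
    uint16_t decode16BitsFrom(unsigned offset) const;
    //...
private:
    std::vector<uint8_t> buffer_;
};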

Then each of your packet classes would have a method to serialize to a ByteBuffer or be deserialized from a ByteBuffer and offset. This is one of those things that I absolutely wish that I could go back in time and correct. I cannot count the number of times that I have spent time debugging an issue that was caused by forgetting to swap bytes or not packing a struct.
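
To make that concrete, here is a sketch of what such a packet class could look like for the Payload from the question. PayloadPacket, decode32BitsFrom and size are hypothetical names, and the ByteBuffer shown is a trimmed-down stand-in for the interface above with only the calls the example needs.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#include <vector>

// Trimmed-down stand-in for the ByteBuffer interface: always big-endian.
class ByteBuffer {
public:
    void encode8Bits(uint8_t n) { buffer_.push_back(n); }
    void encode32Bits(uint32_t n) {
        for (int shift = 24; shift >= 0; shift -= 8)   // most significant byte first
            encode8Bits(uint8_t((n >> shift) & 0xff));
    }
    uint8_t decode8BitsFrom(std::size_t offset) const {
        assert(offset < buffer_.size());
        return buffer_[offset];
    }
    uint32_t decode32BitsFrom(std::size_t offset) const {
        assert(offset + 4 <= buffer_.size());
        uint32_t n = 0;
        for (std::size_t i = 0; i < 4; ++i)
            n = (n << 8) | buffer_[offset + i];
        return n;
    }
    std::size_t size() const { return buffer_.size(); }
private:
    std::vector<uint8_t> buffer_;
};

// Hypothetical packet class. On the wire it is 1 + 4 + 1 + 4 = 10 bytes,
// whatever padding the compiler uses for the in-memory struct.
struct PayloadPacket {
    uint8_t field0;
    int32_t field1;
    uint8_t field2;
    int32_t field3;

    void serialize(ByteBuffer& out) const {
        out.encode8Bits(field0);
        out.encode32Bits(static_cast<uint32_t>(field1));
        out.encode8Bits(field2);
        out.encode32Bits(static_cast<uint32_t>(field3));
    }

    static PayloadPacket deserialize(const ByteBuffer& in, std::size_t offset) {
        PayloadPacket p;
        p.field0 = in.decode8BitsFrom(offset);
        // The unsigned-to-signed casts assume two's complement integers.
        p.field1 = static_cast<int32_t>(in.decode32BitsFrom(offset + 1));
        p.field2 = in.decode8BitsFrom(offset + 5);
        p.field3 = static_cast<int32_t>(in.decode32BitsFrom(offset + 6));
        return p;
    }
};
```

The wire offsets (0, 1, 5, 6) appear in exactly one place, so alignment and padding of the in-memory struct never leak into the protocol.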

The other trap to avoid is using a union to represent bytes or memcpying to an unsigned char buffer to extract bytes. If you always use Big-Endian on the wire, then you can use simple code to write the bytes to the buffer and not worry about the htonl stuff:

void ByteBuffer::encode8Bits(uint8_t n) {
    buffer_.push_back(n);
}
void ByteBuffer::encode16Bits(uint16_t n) {
    encode8Bits(uint8_t((n & 0xff00) >> 8));
    encode8Bits(uint8_t((n & 0x00ff)     ));
}
void ByteBuffer::encode32Bits(uint32_t n) {
    encode16Bits(uint16_t((n & 0xffff0000) >> 16));
    encode16Bits(uint16_t((n & 0x0000ffff)      ));
}
void ByteBuffer::encode64Bits(uint64_t n) {
    encode32Bits(uint32_t((n & 0xffffffff00000000) >> 32));
    encode32Bits(uint32_t((n & 0x00000000ffffffff)      ));
}

This remains nicely platform agnostic since the numerical representation is always logically Big-Endian. This code also lends itself very nicely to using templates based on the size of the primitive type (think encode<sizeof(val)>((unsigned char const*)&val))... not so pretty, but very, very easy to write and maintain.
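
One possible shape for the template idea is sketched below. It is a variant rather than the exact call hinted at above: the hypothetical helpers encodeBytes and encode write into a plain std::vector and shift the value instead of reading its bytes through a pointer, so the output stays big-endian on any host.

```cpp
#include <cstddef>
#include <stdint.h>
#include <vector>

// Appends the low N bytes of value to out, most significant byte first.
template <std::size_t N>
void encodeBytes(std::vector<uint8_t>& out, uint64_t value) {
    static_assert(N >= 1 && N <= 8, "between 1 and 8 bytes supported");
    for (std::size_t i = N; i-- > 0; )
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xff));
}

// Convenience wrapper that picks N from the type of the argument.
// Intended for integer types of up to 64 bits.
template <typename T>
void encode(std::vector<uint8_t>& out, T value) {
    static_assert(sizeof(T) <= sizeof(uint64_t), "at most 64 bits supported");
    encodeBytes<sizeof(T)>(out, static_cast<uint64_t>(value));
}
```

With this, encode(out, uint16_t(0x1234)) appends the bytes 0x12 and 0x34, and the same call works for 8-, 32- and 64-bit values without separate functions.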

#4


2  

My experience is that the following approaches are to be preferred (in order of preference):

  1. Use a high-level framework like Tibco, CORBA, DCOM or whatever that will manage all these issues for you.

  2. Write your own libraries on both sides of the connection that are aware of packing, byte order and other issues.

  3. Communicate only using string data.

Trying to send raw binary data without any mediation will almost certainly cause lots of problems.

#5


1  

You practically can't use a class or structure for this if you want any sort of portability. In your example, the ints may be 32-bit or 64-bit depending on your system. You're most likely using a little-endian machine, but the older Apple Macs are big-endian. The compiler is free to pad as it likes, too.

In general you'll need a method that writes each field to the buffer a byte at a time, after ensuring you get the byte order right with htonl or htons (or a 64-bit equivalent where your platform provides one).
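
A minimal sketch of that approach follows, assuming a POSIX system for htonl. The helper names appendU32 and encodePayload are hypothetical, and the fields use fixed-width types so both ends agree on their sizes.

```cpp
#include <arpa/inet.h>  // htonl (POSIX; <winsock2.h> on Windows)
#include <stdint.h>
#include <vector>

// Appends a 32-bit value in network (big-endian) byte order.
// After htonl the in-memory bytes are already in wire order, so they can be
// copied out one at a time regardless of the host's endianness.
inline void appendU32(std::vector<unsigned char>& out, uint32_t hostValue) {
    const uint32_t wire = htonl(hostValue);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&wire);
    out.insert(out.end(), p, p + sizeof wire);
}

// Hypothetical encoder for the Payload from the question: each field is
// written individually, so compiler padding never reaches the wire.
inline std::vector<unsigned char> encodePayload(uint8_t field0, int32_t field1,
                                                uint8_t field2, int32_t field3) {
    std::vector<unsigned char> out;
    out.push_back(field0);
    appendU32(out, static_cast<uint32_t>(field1));
    out.push_back(field2);
    appendU32(out, static_cast<uint32_t>(field3));
    return out;  // always 10 bytes: 1 + 4 + 1 + 4
}
```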

#6


1  

If you don't have natural alignment in the structures, compilers will usually insert padding so that alignment is proper. If, however, you use pragmas to "pack" the structures (remove the padding), there can be very harmful side effects. On PowerPCs, non-aligned floats generate an exception. If you're working on an embedded system that doesn't handle that exception, you'll get a reset. If there is a routine to handle that interrupt, it can DRASTICALLY slow down your code, because it'll use a software routine to work around the misalignment, which will silently cripple your performance.
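
To illustrate, here is a sketch of a packed struct with a misaligned float, plus one way to read that field without touching it in place. PackedSample and readReading are made-up names, #pragma pack is a compiler extension (accepted by GCC, Clang and MSVC) rather than standard C++, and byte order is ignored to keep the focus on alignment.

```cpp
#include <cstddef>   // offsetof
#include <cstring>   // std::memcpy
#include <stdint.h>

#pragma pack(push, 1)
struct PackedSample {
    uint8_t tag;
    float   reading;   // lands at offset 1, so it is misaligned
};
#pragma pack(pop)

// Assumes a 4-byte float, which is the case on mainstream platforms.
static_assert(sizeof(PackedSample) == 5, "packing removed the 3 padding bytes");

// Copies the raw bytes of the field out of a received packet into a properly
// aligned local, so the float is never loaded from its misaligned address.
inline float readReading(const unsigned char* packet) {
    float aligned;
    std::memcpy(&aligned, packet + offsetof(PackedSample, reading), sizeof aligned);
    return aligned;
}
```

Whether a direct access to the packed member traps, gets fixed up in software, or simply works depends on the CPU and compiler, which is why copying the bytes out is the safer habit.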
