Is there a performance penalty for accessing an array of 32-bit integers on x86-64?

Date: 2022-09-01 10:01:45

Sorry if the question sounds stupid. I'm only vaguely aware of data-alignment issues and have never done any 64-bit programming. I'm working on some 32-bit x86 code right now that frequently accesses an array of int. Sometimes one 32-bit integer is read; sometimes two or more are. At some point I'd like to port the code to 64-bit. What I'm not sure about is whether I should declare this array as int or long int. I would rather keep the integer width the same so I don't have to worry about differences, but I'm somewhat worried that reading or writing at an address that isn't aligned to the natural word size might be slow.

4 answers

#1


7  

Misalignment penalties only occur when the load or store crosses an alignment boundary. The boundary is usually the smaller of:

  • The natural word size of the hardware (32-bit or 64-bit*).
  • The size of the data type.

If you're loading a 4-byte word on a 64-bit (8-byte) architecture, it does not need to be 8-byte aligned; it only needs to be 4-byte aligned.

Likewise, if you're loading a 1-byte char on any machine, it doesn't need to be aligned at all.

*Note that SIMD vectors can imply a larger natural word size. For example, 16-byte SSE loads and stores still require 16-byte alignment on both x86 and x64, unless you use explicitly unaligned loads/stores (such as MOVUPS or MOVDQU).

So, in short: no, you don't have to worry about data alignment. The language and the compiler try hard to keep you from having to worry about it.

So just stick with whatever datatype makes the most sense for you.

#2


3  

64-bit x86 CPUs are still heavily optimized for manipulating 32-bit values efficiently. Even on 64-bit operating systems, accessing 32-bit values is at least as fast as accessing 64-bit values. In practice it is often faster, because an array of 32-bit values takes half the cache space and memory bandwidth of an equally long array of 64-bit values.

#3


1  

There is a lot of good information available here: Performance 32 bit vs. 64 bit arithmetic

There is even more information at https://superuser.com/questions/56540/32-bit-vs-64-bit-systems, where the answer reports that the worst slowdown it had seen was 5% (measured for a whole application, not for individual operations).

The short answer is no, you won't take a performance hit.

#4


1  

Whenever you access a memory location that isn't already cached, the entire cache line containing it (typically 64 bytes on x86) is read into the L1 cache, and any subsequent access to anything in that line is as fast as possible. Unless your 32-bit access crosses a cache line (which it can't if it is 4-byte aligned, because the line size is a multiple of 4 bytes), it will be as fast as a 64-bit access.