Is it faster to use an array or bit access for multiple boolean values?

Date: 2022-12-05 10:00:05

1) On a 32-bit CPU, is it faster to access an array of 32 boolean values or to access the 32 bits within one word? (Assume we want to check the value of the Nth element and can use either a bit-mask (Nth bit is set) or the integer N as an array index.)
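
To make the two options concrete, here is a minimal C sketch of what is being compared. The function names are made up for illustration, and the array variant assumes one `bool` per flag (the answers also discuss a full word per flag).

```c
#include <stdbool.h>
#include <stdint.h>

/* Option A: 32 flags packed into one 32-bit word.
 * Tests the Nth bit with a mask; n must be in the range 0..31. */
static bool flag_from_word(uint32_t word, unsigned n)
{
    return (word & (UINT32_C(1) << n)) != 0;
}

/* Option B: one array element per flag.
 * Uses N directly as the array index; n must be in the range 0..31. */
static bool flag_from_array(const bool flags[32], unsigned n)
{
    return flags[n];
}
```

Both return the same answer for the same logical flag; the question is only which one costs less to evaluate.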

It seems to me that the array would be faster because all common computer architectures natively work at the word level (32 bits, 64 bits, etc., processed in parallel) and accessing the sub-word bits takes extra work.

I know different compilers will represent things differently, but it seems that the underlying hardware architecture would dictate the answer. Or does the answer depend on the language and compiler?

And, 2) Is the speed answer reversed if this array represents a state that I pass between client and server? This question came to mind when reading the question "How use bit/bit-operator to control object state?"

P.S. Yes, I could write code to test this myself, but then the SO community wouldn't get to play along!

6 answers

#1


For question #1: Yes, on most 32-bit platforms, an array of boolean values should be faster, because you will just be loading each 32-bit-aligned value in the array and testing it against 0. If you use a single word, you will have all that work plus the overhead of bit-fiddling.

For question #2: Again, yes, since sending data over a network is significantly slower than operating on data in the CPU and main memory, the overhead of sending even one word will strongly outweigh any performance gain or loss you get by aligning words or bit fiddling.
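
If the network cost dominates, one option is to keep whichever form is convenient in memory and convert only at the boundary. The sketch below assumes that element i of the array maps to bit i of the word. `pack_flags` and `unpack_flags` are hypothetical helper names, and byte order on the wire is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack 32 bools into one word: array element i becomes bit i. */
uint32_t pack_flags(const bool flags[32])
{
    uint32_t word = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (flags[i]) {
            word |= UINT32_C(1) << i;
        }
    }
    return word;
}

/* Reverse of pack_flags: bit i of the word becomes array element i. */
void unpack_flags(uint32_t word, bool flags[32])
{
    for (unsigned i = 0; i < 32; i++) {
        flags[i] = ((word >> i) & 1u) != 0;
    }
}
```

The sender would call `pack_flags` just before writing the word to the socket, and the receiver would call `unpack_flags` right after reading it.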

#2


Bear in mind that a theoretically faster solution that doesn't fit into a cache line might be slower than a theoretically slower one that does, depending on a whole host of things. If this is actually something that needs to be fast, as determined by profiling, test both ways and see. If it doesn't, do whatever looks like cleaner code, which is probably the array.

#3


It depends on the compiler and the access patterns and the platform. Raymond Chen has an excellent cost-benefit analysis: http://blogs.msdn.com/oldnewthing/archive/2008/11/26/9143050.aspx .

Even on non-x86 platforms the use of bits can be prohibitive, as at least one PPC platform out there uses microcoded instructions to perform a variable shift, which can do nasty things with other hardware threads.

So it can be a win, but you need to understand the context in which it will be good and bad. (Which is a general thing anyway.)

#4


This is the code generated by 0 != (value & (1 << index)) to test a bit:

00401000  mov         eax,1 
00401005  shl         eax,cl 
00401007  and         eax,1 

And this by values[index] to test a bool[]:

00401000  movzx       eax,byte ptr [ecx+eax]

I can't figure out how to put a loop around it that doesn't get optimized away, so I'll vote bool[].
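
One common way to keep such a loop alive, which is not what this answer did, is to read the data through a `volatile`-qualified pointer so the compiler has to perform every load. A rough sketch follows. The function names are invented, and `volatile` also blocks optimizations that real code would get, so any timings taken around these calls (for example with `clock()` from `<time.h>`) are indicative at best.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit-test loop: re-reads *value on every iteration because it is volatile,
 * and cycles the tested bit index through 0..31. Returns how many tests hit. */
unsigned count_hits_word(const volatile uint32_t *value, unsigned iterations)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < iterations; i++) {
        unsigned index = i & 31u;
        hits += (*value & (UINT32_C(1) << index)) != 0;
    }
    return hits;
}

/* Array loop: one volatile byte-sized load per iteration, same index pattern. */
unsigned count_hits_array(const volatile bool *values, unsigned iterations)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < iterations; i++) {
        hits += values[i & 31u];
    }
    return hits;
}
```

Returning the hit count and actually using it in the caller gives the compiler one more reason not to discard the work.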

#5


If you are going to check more than one value at a time, doing it in parallel will obviously be faster. If you're only checking one value, it's probably the same.
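
As a rough illustration of the "more than one value at a time" case, assuming the flags live in a single 32-bit word: several flags can be tested with one AND and one compare, whereas the array form needs one load per flag. The helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every bit selected by mask is set in word. */
bool all_set(uint32_t word, uint32_t mask)
{
    return (word & mask) == mask;
}

/* True if at least one bit selected by mask is set in word. */
bool any_set(uint32_t word, uint32_t mask)
{
    return (word & mask) != 0;
}

/* Array counterpart of all_set: checks flags[idx[0]] .. flags[idx[n-1]],
 * touching one array element per requested flag. */
bool all_set_array(const bool flags[32], const unsigned *idx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!flags[idx[i]]) {
            return false;
        }
    }
    return true;
}
```

For example, `all_set(word, 0x3u)` checks flags 0 and 1 together in a single operation.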

If you need a better answer than that, write some tests and get back to us.

#6


I think a byte array is probably better than a full-word array for simple random access.

It will give better cache locality than using the full word size, and I don't think byte access is any slower on most/all common architectures.
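
To put numbers on the locality argument, here is a small sketch comparing the memory footprint of the three layouts for 32 flags. The type names are invented for illustration, and the sizes in the comments assume 8-bit bytes and no struct padding, which holds on common platforms.

```c
#include <stdint.h>

enum { FLAG_COUNT = 32 };

/* One full 32-bit word per flag: 32 * 4 = 128 bytes. */
typedef struct { uint32_t flags[FLAG_COUNT]; } WordFlags;

/* One byte per flag: 32 bytes. */
typedef struct { uint8_t flags[FLAG_COUNT]; } ByteFlags;

/* One bit per flag, all in a single word: 4 bytes. */
typedef struct { uint32_t bits; } PackedFlags;
```

With a 64-byte cache line (an assumption, though a typical figure), the word-per-flag layout spans at least two lines, while the byte-per-flag layout fits in one or, if the array happens to be misaligned, two.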
