Most efficient way to do bitwise operations on a C array

Time: 2021-09-22 20:14:05

I have a C array like:

char byte_array[10];

And another one that acts as a mask:

char byte_mask[10];

I would like to get another array that is the result of combining the first one with the second using a bitwise operation on each byte.

What's the most efficient way to do this?

Thanks for your answers.

3 solutions

#1


13  

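/* result_array is assumed to be declared like the others: char result_array[10]; */
for (int i = 10; i-- > 0; )
    result_array[i] = byte_array[i] & byte_mask[i];
  • Going backwards may help cache behavior on some processors, although modern hardware prefetchers generally handle forward and backward sequential access equally well.
  • Folding the decrement into the comparison lets the loop test against zero, which can save an instruction on some architectures; an optimizing compiler often produces equivalent code for either loop direction.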
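A self-contained version of the byte loop above, as a minimal sketch: the function name mask_bytes and its parameters are hypothetical, not from the answer. It uses unsigned char so the bytes behave as plain 0..255 values when promoted to int for the & operation.

```c
#include <stddef.h>

/* Hypothetical helper: result[i] = src[i] & mask[i] for i in [0, n),
   walking the arrays from the last byte down to the first. */
void mask_bytes(unsigned char *result, const unsigned char *src,
                const unsigned char *mask, size_t n)
{
    for (size_t i = n; i-- > 0; )
        result[i] = (unsigned char)(src[i] & mask[i]);
}
```

For example, masking the bytes {0xFF, 0x0F} with {0xF0, 0xFF} yields {0xF0, 0x0F}.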

This works for any array on any processor. If you know your arrays are word-aligned, however, it is usually faster to cast them to a wider type and apply the same operation to a whole word at a time.

For example, suppose the arrays are n = 16 bytes long instead of 10. Then this should be noticeably faster:

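#include <stdint.h>  /* uint32_t */
/* Assumes n = 16 and that all three arrays are 4-byte aligned. Accessing a
   char array through a uint32_t pointer also breaks strict aliasing, so
   compilers that enforce it (e.g. GCC at -O2) may need -fno-strict-aliasing. */
uint32_t *input32  = (uint32_t *)byte_array;
uint32_t *mask32   = (uint32_t *)byte_mask;
uint32_t *result32 = (uint32_t *)result_array;
for (int i = 4; i-- > 0; )   /* 16 bytes / 4 bytes per word */
    result32[i] = input32[i] & mask32[i];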

(uint32_t comes from <stdint.h>. If n is not a multiple of 4, or the arrays are not 4-byte aligned, you need to handle the leftover bytes at the beginning and/or end separately so that the 32-bit accesses stay aligned.)
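One way to handle arbitrary lengths and alignment, sketched under the assumption that the compiler turns fixed-size memcpy calls into single loads and stores (GCC and Clang typically do when optimizing): copy each 8-byte chunk into a uint64_t, AND it, copy it back, and finish the remaining n % 8 bytes one at a time. memcpy sidesteps both the alignment requirement and the strict-aliasing problem of the pointer casts above, and byte order does not matter because AND works bit by bit. The function name mask_words is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: result[i] = src[i] & mask[i] for i in [0, n).
   Works for any n and any alignment; no pointer casts are needed.
   result may be the same buffer as src (in-place masking). */
void mask_words(unsigned char *result, const unsigned char *src,
                const unsigned char *mask, size_t n)
{
    size_t i = 0;

    /* Main loop: 8 bytes at a time through a uint64_t. */
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t a, b;
        memcpy(&a, src + i, sizeof a);
        memcpy(&b, mask + i, sizeof b);
        a &= b;
        memcpy(result + i, &a, sizeof a);
    }

    /* Tail: the last n % 8 bytes, one at a time. */
    for (; i < n; i++)
        result[i] = (unsigned char)(src[i] & mask[i]);
}
```

With n = 10, the first 8 bytes go through one uint64_t and the last 2 through the byte loop.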

Variation: the question specifically asks for the results to go in a separate array, but modifying the input array in place is likely to be faster, since the loop then touches two arrays instead of three.
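The in-place variant simply writes the result back into the input. A minimal sketch with a hypothetical name, mask_in_place, using the plain byte loop for clarity; whether it actually beats the separate-output version on your machine is worth measuring.

```c
#include <stddef.h>

/* Hypothetical helper: data[i] &= mask[i] for i in [0, n), overwriting data. */
void mask_in_place(unsigned char *data, const unsigned char *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] &= mask[i];
}
```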

#2


5  

If you want it to be faster, make sure that byte_array has a length that is a multiple of the word size (4 bytes for a typical 32-bit unsigned int, 8 if you switch to a 64-bit type such as uint64_t), and then:

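#include <assert.h>
#include <stdint.h>
char byte_array[12];   /* length must be a multiple of sizeof(unsigned int) */
char byte_mask[12];
/* Check alignment; the parentheses matter because == binds tighter than & */
assert(((uintptr_t)byte_array % sizeof(unsigned int)) == 0);
assert(((uintptr_t)byte_mask % sizeof(unsigned int)) == 0);
for (size_t i = 0; i < sizeof byte_array / sizeof(unsigned int); i++) {
  ((unsigned int *)byte_array)[i] &= ((unsigned int *)byte_mask)[i];
}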

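This is usually much faster than working byte by byte. Be aware that a plain char array is not guaranteed to be aligned for unsigned int, and accessing it through an unsigned int pointer technically breaks C's strict-aliasing rule; C11 _Alignas(unsigned int) fixes the alignment, and copying words with memcpy avoids both problems.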

(Note that this is in-place mutation; if you want to keep the original byte_array also, then you obviously need to store the results in another array instead.)
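Another way to get the alignment that answer #2 asserts, without casting char arrays: declare the storage as unsigned int and take an unsigned char pointer view whenever you need bytes. C allows any object to be accessed through unsigned char, so this avoids both the misalignment risk and the strict-aliasing problem. A minimal sketch; N_BYTES, the array names, and the helper functions are hypothetical.

```c
#include <stddef.h>

#define N_BYTES 12  /* assumed length of the byte data */
/* Round up so the word arrays cover every byte. */
#define N_WORDS ((N_BYTES + sizeof(unsigned int) - 1) / sizeof(unsigned int))

/* Word storage is correctly aligned by construction. */
static unsigned int data_store[N_WORDS];
static unsigned int mask_store[N_WORDS];

/* Byte views: reading or writing any object through unsigned char * is allowed. */
unsigned char *data_view(void) { return (unsigned char *)data_store; }
unsigned char *mask_view(void) { return (unsigned char *)mask_store; }

/* In-place AND, one unsigned int at a time; no alignment asserts needed. */
void apply_mask(void)
{
    for (size_t i = 0; i < N_WORDS; i++)
        data_store[i] &= mask_store[i];
}
```

Fill the data through data_view() and mask_view(), then call apply_mask(); the bytes seen through data_view() hold the masked result.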

#3


1  

#define CHAR_ARRAY_SIZE    (10)
/* Round up so that int_array covers every byte of byte_array */
#define INT_ARRAY_SIZE     ((CHAR_ARRAY_SIZE + sizeof (unsigned int) - 1) / sizeof (unsigned int))

typedef union arr_tag_ {

    char          byte_array [CHAR_ARRAY_SIZE];
    unsigned int  int_array [INT_ARRAY_SIZE]; 

} arr_tag;

Now use int_array for the masking. This should work on both 32-bit and 64-bit processors, because the sizes are derived from sizeof (unsigned int) rather than hard-coded.

arr_tag arr_src, arr_result, arr_mask;

for (int i = 0; i < INT_ARRAY_SIZE; i ++) {
    arr_result.int_array [i] = arr_src.int_array[i] & arr_mask.int_array [i];
}

Try this; the code also reads cleanly.
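Put together, answer #3 might look like the following self-contained sketch, assuming C99 or later. It rounds INT_ARRAY_SIZE up with ceiling division and zeroes each union before filling it, so the bytes of int_array that lie beyond byte_array hold defined values. Reading one union member after writing another is permitted in C (though not in C++). The helpers arr_mask and arr_init are hypothetical additions, and byte_array is switched to unsigned char.

```c
#include <stddef.h>
#include <string.h>

#define CHAR_ARRAY_SIZE 10
/* Round up so the word view covers every byte. */
#define INT_ARRAY_SIZE ((CHAR_ARRAY_SIZE + sizeof(unsigned int) - 1) / sizeof(unsigned int))

typedef union arr_tag_ {
    unsigned char byte_array[CHAR_ARRAY_SIZE];
    unsigned int  int_array[INT_ARRAY_SIZE];
} arr_tag;

/* Hypothetical helper: result = src & mask over the whole union, word by word.
   Bytes past CHAR_ARRAY_SIZE are ANDed too, so zero them first (see arr_init). */
void arr_mask(arr_tag *result, const arr_tag *src, const arr_tag *mask)
{
    for (size_t i = 0; i < INT_ARRAY_SIZE; i++)
        result->int_array[i] = src->int_array[i] & mask->int_array[i];
}

/* Hypothetical helper: zero the whole union, then copy up to CHAR_ARRAY_SIZE bytes in. */
void arr_init(arr_tag *a, const unsigned char *bytes, size_t n)
{
    memset(a, 0, sizeof *a);
    memcpy(a->byte_array, bytes, n < CHAR_ARRAY_SIZE ? n : CHAR_ARRAY_SIZE);
}
```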