I have a C array like:
char byte_array[10];
And another one that acts as a mask:
char byte_mask[10];
I would like to get a third array that is the result of combining the first one with the second one using a bitwise operation on each byte.
What's the most efficient way to do this?
Thanks for your answers.
3 Solutions
#1
13
for ( i = 10 ; i-- > 0 ; )
    result_array[i] = byte_array[i] & byte_mask[i];
- Going backwards pre-loads processor cache-lines.
- Including the decrement in the compare can save some instructions.
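For reference, the byte-wise loop can be wrapped in a small function. This is only a sketch: the name and_bytes and its parameters are made up for illustration, and it uses unsigned char rather than char so the bitwise result does not depend on whether plain char is signed.

```c
#include <stddef.h>

/* Hypothetical helper (not from the answer): stores src[i] & mask[i]
   into dst[i] for each of the n bytes. */
void and_bytes(unsigned char *dst, const unsigned char *src,
               const unsigned char *mask, size_t n)
{
    /* Same countdown idiom as the loop above: test i, then decrement. */
    for (size_t i = n; i-- > 0; )
        dst[i] = src[i] & mask[i];
}
```

Calling and_bytes(result_array, byte_array, byte_mask, 10) would correspond to the question's arrays, assuming they are declared as unsigned char.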
This will work for all arrays and processors. However, if you know your arrays are word-aligned, a faster method is to cast to a larger type and do the same calculation.
For example, let's say n=16 instead of n=10. Then this would be much faster:
uint32_t* input32 = (uint32_t*)byte_array;
uint32_t* mask32 = (uint32_t*)byte_mask;
uint32_t* result32 = (uint32_t*)result_array;
for ( i = 4 ; i-- > 0 ; )
    result32[i] = input32[i] & mask32[i];
(Of course you need a proper type for uint32_t, e.g. from <stdint.h>, and if n is not a multiple of 4 you need to clean up the beginning and/or ending so that the 32-bit stuff is aligned.)
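One way to handle the cleanup without relying on alignment is to go through memcpy for the 32-bit part and finish the leftover bytes one at a time. The following is a sketch under my own assumptions, not the answer's code: the name and_words is hypothetical, and whether it beats the plain byte loop depends on the compiler, which often vectorizes the simple loop anyway.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: ANDs n bytes, four at a time where possible. */
void and_words(unsigned char *dst, const unsigned char *src,
               const unsigned char *mask, size_t n)
{
    size_t i = 0;

    /* Bulk part: copying into uint32_t locals sidesteps alignment and
       aliasing problems that a pointer cast could run into. */
    for (; i + 4 <= n; i += 4) {
        uint32_t a, b;
        memcpy(&a, src + i, 4);
        memcpy(&b, mask + i, 4);
        a &= b;
        memcpy(dst + i, &a, 4);
    }

    /* Tail part: the remaining 0 to 3 bytes, one at a time. */
    for (; i < n; i++)
        dst[i] = src[i] & mask[i];
}
```

With n=10 this does two 32-bit operations and then two single-byte operations.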
Variation: The question specifically calls for the results to be placed in a separate array; however, it would almost certainly be faster to modify the input array in-place.
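A minimal sketch of the in-place variation, assuming the original contents of the input array are no longer needed (the name and_bytes_inplace is made up for illustration):

```c
#include <stddef.h>

/* Hypothetical helper: overwrites data[i] with data[i] & mask[i]. */
void and_bytes_inplace(unsigned char *data, const unsigned char *mask,
                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] &= mask[i];
}
```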
#2
5
If you want to make it faster, make sure that byte_array has a length that is a multiple of 4 (8 on 64-bit machines), and then:
char byte_array[12];
char byte_mask[12];
/* Checks for proper alignment (needs <assert.h> and <stdint.h>).
   The extra parentheses are required because == binds tighter than &. */
assert(((uintptr_t)(void *)byte_array & 3) == 0);
assert(((uintptr_t)(void *)byte_mask & 3) == 0);
/* Assumes unsigned int is 4 bytes wide */
for (i = 0; i < (10+3)/4; i++) {
    ((unsigned int *)(byte_array))[i] &= ((unsigned int *)(byte_mask))[i];
}
This is much faster than doing it byte by byte.
(Note that this is in-place mutation; if you want to keep the original byte_array also, then you obviously need to store the results in another array instead.)
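A note on the alignment check: in C, == binds tighter than &, so an expression written as x & 3 == 0 is parsed as x & (3 == 0), which is always 0. A small helper makes the intended test explicit. This is a sketch; the name is_aligned4 is hypothetical, and it assumes the usual case where the low bits of a pointer converted to uintptr_t reflect its address.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p sits on a 4-byte boundary. */
int is_aligned4(const void *p)
{
    /* Mask first, then compare; the parentheses are what make this work. */
    return ((uintptr_t)p & 3u) == 0;
}
```

It could be used as assert(is_aligned4(byte_array)); before casting to a wider type.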
#3
1
#define CHAR_ARRAY_SIZE (10)
/* Enough unsigned ints to cover CHAR_ARRAY_SIZE bytes */
#define INT_ARRAY_SIZE ((CHAR_ARRAY_SIZE / sizeof (unsigned int)) + 1)
typedef union _arr_tag_ {
    char byte_array [CHAR_ARRAY_SIZE];
    unsigned int int_array [INT_ARRAY_SIZE];
} arr_tag;
Now use int_array for masking. This might work for both 32-bit and 64-bit processors.
arr_tag arr_src, arr_result, arr_mask;
for (int i = 0; i < INT_ARRAY_SIZE; i ++) {
    arr_result.int_array [i] = arr_src.int_array[i] & arr_mask.int_array [i];
}
Try this; the code might also look cleaner.
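Put together, the union approach might look like the sketch below. A few things here are my assumptions rather than the answer's: the helper name and_union, the use of unsigned char, and a size macro that rounds up instead of always adding one element. The union guarantees that the byte view is suitably aligned for unsigned int access.

```c
#include <stddef.h>

#define CHAR_ARRAY_SIZE 10
/* Unsigned ints needed to cover CHAR_ARRAY_SIZE bytes, rounded up. */
#define INT_ARRAY_SIZE \
    ((CHAR_ARRAY_SIZE + sizeof(unsigned int) - 1) / sizeof(unsigned int))

typedef union {
    unsigned char byte_array[CHAR_ARRAY_SIZE];
    unsigned int  int_array[INT_ARRAY_SIZE];
} arr_tag;

/* Hypothetical helper: masks a whole union one unsigned int at a time. */
void and_union(arr_tag *result, const arr_tag *src, const arr_tag *mask)
{
    for (size_t i = 0; i < INT_ARRAY_SIZE; i++)
        result->int_array[i] = src->int_array[i] & mask->int_array[i];
}
```

Fill src.byte_array and mask.byte_array, call and_union, then read the answer from result.byte_array. The bytes past CHAR_ARRAY_SIZE are padding and get masked too, which is harmless as long as the unions were zero-initialized.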