Converting binary data to printable hex

Time: 2023-01-08 15:46:23

In this thread, someone commented that the following code should only be used in 'toy' projects. Unfortunately, he hasn't come back to say why it's not of production quality, so I was hoping someone in the community could either assure me the code is OK (because I quite like it) or identify what is wrong.

#include <cassert>
#include <cstddef>

// Encode each input byte as two uppercase hex digits.
template< class T1, class T2>
void hexascii( T1& out, const T2& in )
{
    out.resize( in.size() * 2 );
    const char hexDigits[] = {'0', '1', '2', '3', '4', '5', '6', '7','8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    typename T1::iterator outit = out.begin();
    for( typename T2::const_iterator it = in.begin(); it != in.end(); ++it )
    {
        *outit++ = hexDigits[*it >> 4];
        *outit++ = hexDigits[*it & 0xF];
    }
}

// Decode pairs of hex digits back into bytes.
template<class T1, class T2>
void asciihex( T1& out, const T2& in )
{
    size_t size = in.size();
    assert( !(size % 2) );

    out.resize( size / 2 );
    typename T1::iterator outit = out.begin();
    for( typename T2::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
        *outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) + 
                    (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f);
    }
}

Edit: Thanks for your help, guys; you've made some big improvements. I've written functions in the two styles suggested in your answers. Some rough testing suggests the second method is marginally faster than the first, but IMO this is outweighed by the improved readability of the first.

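#include <cstdio>
#include <string>

// dassert is presumably the poster's own assertion macro; its definition is not shown.
template<class T1>
void asciihex2( T1& out, const std::string& in )
{
    dassert( sizeof(typename T1::value_type)==1 );
    size_t size = in.size();
    assert( !(size % 2) );
    out.resize( size / 2 );
    typename T1::iterator outit = out.begin();
    for( size_t i = 0; i < in.size(); i += 2 )
    {
        unsigned int tmp;  // %X expects a pointer to unsigned int
        sscanf( in.c_str() + i, "%02X", &tmp );
        *outit++ = tmp;
    }
}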

// Table-driven variant. The table has 0x20 entries so that every index
// produced by the & 0x1f mask is in bounds, even for non-hex input.
template<class T1>
void asciihex3( T1& out, const std::string& in )
{
    dassert( sizeof(typename T1::value_type)==1 );
    size_t size = in.size();
    assert( !(size % 2) );
    out.resize( size / 2 );
    typename T1::iterator outit = out.begin();
    const char hexDigits[0x20] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                                  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                                  0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F};
    for( std::string::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
        *outit = (hexDigits[(*it - 0x30) & 0x1f] << 4) +
                  hexDigits[((*(it+1) - 0x30) & 0x1f)];
    }
}

Some of the assumptions surrounding this code:

1. The functions are not intended to be generic. They are used in an anonymous namespace to translate data for a specific class.
2. The templating is required because two separate container types are in use: one is std::vector, and the other is a similar byte-array container from a third-party library.
3. The purpose is to convert binary data of indeterminate length into strings and back again (0x1234abcd <-> "1234abcd").
4. assert traps errors in both debug and release modes.
5. By the time these functions are called, the size of the string will already have been checked. assert is used to terminate processing if something serious has gone wrong.
6. It needs some commenting.

Any other ideas are appreciated.

9 solutions

#1


It seems like a lot of templated code to achieve very little, given that you have direct hex conversion in the standard C scanf and printf functions. Why bother?
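
Here is a minimal sketch of the printf route. It assumes the binary data is held in a std::vector<unsigned char>, and the function name toHexWithPrintf is made up for illustration. Each byte goes through std::snprintf with the %02X conversion, which zero-pads to two uppercase digits:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: encode each byte as two uppercase hex digits via snprintf.
std::string toHexWithPrintf(const std::vector<unsigned char>& bytes)
{
    std::string out;
    out.reserve(bytes.size() * 2);
    char buf[3];  // two hex digits plus the terminating '\0'
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(b));
        out.append(buf, 2);
    }
    return out;
}
```

The reverse direction is essentially what asciihex2 in the question's edit already does with sscanf. A formatted-output call per byte is likely slower than a table lookup, which is consistent with the poster's own rough timing.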

#2


My main comment about it is that it's very difficult to read.

Especially:

*outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) + 
            (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f);

It would take my brain a little while to grok that, and annoy me if I inherited the code.
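
One way to address the readability complaint is to give the digit-to-value step a name of its own. This sketch assumes ASCII input, where 'A' to 'F' and 'a' to 'f' are contiguous. hexDigitValue and fromHexReadable are hypothetical names. Like the original, the sketch does no real error handling:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
// Assumes ASCII, where 'A'..'F' and 'a'..'f' are contiguous.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decode pairs of hex digits. Like the original, this assumes the input is
// well formed: a bad digit yields -1 and therefore a meaningless byte, and
// an odd trailing character is ignored.
std::vector<unsigned char> fromHexReadable(const std::string& in)
{
    std::vector<unsigned char> out;
    out.reserve(in.size() / 2);
    for (std::string::size_type i = 0; i + 1 < in.size(); i += 2) {
        int high = hexDigitValue(in[i]);
        int low  = hexDigitValue(in[i + 1]);
        out.push_back(static_cast<unsigned char>(high * 16 + low));
    }
    return out;
}
```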

#3


What is it supposed to do? There is no well-known accepted meaning of hexascii or asciihex, so the names should change.

[edit] Converting from binary to hex notation should usually not be called ascii..., as ASCII is a 7-bit format.

#4


I don't really object to it. It's generic (within limits), and it uses const and references where needed, etc. It lacks a bit of documentation, and the *outit assignment in asciihex is not quite clear at first sight.

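resize unnecessarily initializes the output's elements. Use reserve instead, and append the digits with push_back or std::back_inserter, because reserve only allocates and does not create elements you could write through an iterator.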
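
Here is a sketch of the reserve() alternative. It assumes a std::vector<unsigned char> input and a std::string output, and toHexReserved is an illustrative name. Because reserve() only allocates, the digits are appended with push_back():

```cpp
#include <string>
#include <vector>

// Encode with reserve() + push_back(), so the output is never filled with
// placeholder characters first. Assumes one-byte (unsigned char) input.
std::string toHexReserved(const std::vector<unsigned char>& in)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 2);   // one allocation; size() is still 0 here
    for (unsigned char b : in) {
        out.push_back(hexDigits[b >> 4]);
        out.push_back(hexDigits[b & 0x0F]);
    }
    return out;
}
```

For a std::string, the zero-fill done by resize() is cheap, so any measurable gain may well be small.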

Maybe the genericity is somewhat too flexible: you can feed the algorithms any data type you like, while they only make sense for bytes and hex digits (not, e.g., a vector of doubles).

And indeed, it may be a bit overkill, given the presence of good library functions.

#5


What's wrong with

*outit = hexDigits[*it]

Why can't these two functions share a common list of hexDigits and eliminate the complex (and slow) calculation of an ASCII character?
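
Here is a sketch of sharing one digit table between the two directions. It assumes uppercase-only hex and byte vectors, and kHexDigits, encodeShared and decodeShared are hypothetical names. Decoding searches the same table with std::string::find, which also rejects any character not in the table:

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace {
// One table shared by both directions (hypothetical name).
const std::string kHexDigits = "0123456789ABCDEF";
}

std::string encodeShared(const std::vector<unsigned char>& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char b : in) {
        out += kHexDigits[b >> 4];
        out += kHexDigits[b & 0x0F];
    }
    return out;
}

// Looks each digit up in the same table. Only uppercase digits are accepted;
// odd length or a character missing from the table makes it return false.
bool decodeShared(const std::string& in, std::vector<unsigned char>& out)
{
    if (in.size() % 2 != 0) return false;
    out.clear();
    out.reserve(in.size() / 2);
    for (std::size_t i = 0; i < in.size(); i += 2) {
        std::size_t high = kHexDigits.find(in[i]);
        std::size_t low  = kHexDigits.find(in[i + 1]);
        if (high == std::string::npos || low == std::string::npos) return false;
        out.push_back(static_cast<unsigned char>(high * 16 + low));
    }
    return true;
}
```

The linear find over 16 characters trades a little speed for having a single source of truth. A 256-entry reverse table would be the faster option if profiling showed it mattered.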

#6


  • Code has assert statements instead of proper handling of an error condition (and if your assert is turned off, the code may blow up)

  • The for loop has a dangerous double increment of the iterator (it += 2), especially in case your assert did not fire. What happens when your iterator is already at the end and you ++ it?

  • Code is templated, but what you're doing is simply converting characters to numbers or the other way round. It's cargo cult programming. You hope that the blessings of template programming will come upon you by using templates. You even tagged this as a template question although the template aspect is completely irrelevant in your functions.

  • The *outit= line is too complicated.

  • The code reinvents the wheel, in a big way.
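
Here is a sketch of what proper error handling and a non-template interface could look like. It assumes hex text always arrives as a std::string and bytes leave as a std::vector<unsigned char>. hexToBytes and nibble are hypothetical names. Bad input throws std::invalid_argument, so the checks do not disappear when NDEBUG compiles assert out:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace {
// Value of one hex digit, or -1 if c is not one (assumes ASCII).
int nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}
}

// Plain (non-template) decoder that reports bad input with exceptions.
std::vector<unsigned char> hexToBytes(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hex string has odd length");

    std::vector<unsigned char> bytes;
    bytes.reserve(hex.size() / 2);
    // i += 2 cannot overshoot: the size was checked to be even above.
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int high = nibble(hex[i]);
        int low  = nibble(hex[i + 1]);
        if (high < 0 || low < 0)
            throw std::invalid_argument("non-hex character in input");
        bytes.push_back(static_cast<unsigned char>((high << 4) | low));
    }
    return bytes;
}
```

If the third-party byte container from the question must also be supported, one option is to convert at that boundary rather than templating the whole algorithm.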

#7


Some problems that I see:

This will work well if it is only used with an input container that stores unsigned 8-bit types, e.g. unsigned char. (If plain char is signed, bytes above 0x7F produce a negative lookup index.) The following code will also fail if used with a 32-bit type whose value after the right shift is greater than 15. I recommend always using a mask to ensure that the lookup index stays within range.

*outit++ = hexDigits[*it >> 4];

What is the expected behavior if you pass in a container of unsigned longs? For this to be generic, it should probably also be able to handle the conversion of 32-bit numbers to hex strings.
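
Here is a sketch of the masking advice applied to integers wider than a byte. It assumes unsigned input, and toHexFixedWidth is an illustrative name. Every nibble is masked with 0xF before the table lookup, so the index never exceeds 15, whatever the width of T:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical helper: format any unsigned integer as exactly
// 2 * sizeof(T) uppercase hex digits, most significant nibble first.
template <class T>
std::string toHexFixedWidth(T value)
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    static const char hexDigits[] = "0123456789ABCDEF";
    const std::size_t digits = 2 * sizeof(T);
    std::string out(digits, '0');
    for (std::size_t i = 0; i < digits; ++i) {
        out[digits - 1 - i] = hexDigits[value & 0xF];  // mask keeps index in 0..15
        value >>= 4;
    }
    return out;
}
```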

This only works when the input is a container. What if I just want to convert a single byte? One suggestion is to refactor the code into a core function that converts a single byte (hex=>ascii and ascii=>hex), and then provide additional functions that use this core function to convert containers of bytes, etc.
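
Here is a sketch of the suggested split into a single-byte core plus a container wrapper. byteToHex, hexToByte and bytesToHex are hypothetical names. hexToByte assumes its arguments are already known to be valid ASCII hex digits:

```cpp
#include <array>
#include <string>
#include <vector>

// Core: one byte -> two uppercase hex digits.
std::array<char, 2> byteToHex(unsigned char b)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    return {{hexDigits[b >> 4], hexDigits[b & 0x0F]}};
}

// Core: two hex digits -> one byte. No validation: assumes ASCII and
// that both characters are hex digits.
unsigned char hexToByte(char high, char low)
{
    auto value = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return c - 'A' + 10;  // 'A'..'F'
    };
    return static_cast<unsigned char>((value(high) << 4) | value(low));
}

// Container wrapper built on the single-byte core.
std::string bytesToHex(const std::vector<unsigned char>& bytes)
{
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        std::array<char, 2> pair = byteToHex(b);
        out.append(pair.begin(), pair.end());
    }
    return out;
}
```

A decoding wrapper would follow the same pattern, calling hexToByte on each pair of characters.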

In asciihex(), the behavior is undefined if the size of the input container is not divisible by 2. The use of:

it != in.end(); it += 2

is dangerous. If the container size is not divisible by 2, the increment by two advances the iterator past the end of the container, so the comparison against end() never becomes true (and advancing past end() is already undefined behavior). The assert call offers some protection, but assert can be compiled out (it often is in release builds, via NDEBUG). It would be much better to make this check an if statement.
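
Here is a sketch of a loop that cannot run past end(). It assumes a std::vector<unsigned char> output, and decodeHexRange is a hypothetical name. The loop advances one character at a time and checks for end() before reading the second digit of each pair, so odd-length input is reported with an ordinary if rather than an assert:

```cpp
#include <string>
#include <vector>

// Hypothetical decoder over an iterator range. Only ++ and one dereference
// per position are used, so it does not need random-access iterators.
// On failure it returns false; out may then hold a partial result.
template <class InputIt>
bool decodeHexRange(InputIt first, InputIt last, std::vector<unsigned char>& out)
{
    auto value = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    out.clear();
    while (first != last) {
        int high = value(*first);
        ++first;
        if (first == last) return false;        // odd number of digits
        int low = value(*first);
        ++first;
        if (high < 0 || low < 0) return false;  // non-hex character
        out.push_back(static_cast<unsigned char>((high << 4) | low));
    }
    return true;
}
```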

#8


Problems I spot:

hexascii does not check that sizeof(T2::value_type) == 1.

hexascii dereferences the iterator it twice per element, and asciihex even more often. There's no reason for this, as you can store the result. This means you can't use an istream_iterator.

asciihex needs a random-access iterator as input, because (it+1) and (it+=2) are used. The algorithm could work with a forward iterator if it used only (++it).
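
Here is a sketch that combines these three points, assuming C++11. encodeHex is a hypothetical name. A static_assert rejects wider element types at compile time, each element is dereferenced exactly once, and only ++ is applied to the input iterator. That means input iterators such as std::istream_iterator<char> work as well as container iterators:

```cpp
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical generic encoder: input iterators in, output iterator out.
template <class InputIt, class OutputIt>
OutputIt encodeHex(InputIt first, InputIt last, OutputIt dest)
{
    using Value = typename std::iterator_traits<InputIt>::value_type;
    static_assert(sizeof(Value) == 1, "expects one-byte elements");
    static const char hexDigits[] = "0123456789ABCDEF";
    for (; first != last; ++first) {
        // Read once; the cast stops a negative plain char from
        // producing a negative table index.
        unsigned char b = static_cast<unsigned char>(*first);
        *dest++ = hexDigits[b >> 4];
        *dest++ = hexDigits[b & 0x0F];
    }
    return dest;
}
```

Usage: `encodeHex(bytes.begin(), bytes.end(), std::back_inserter(text));` appends to a std::string named text without any resize() or reserve().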

(*it > '9' ? *it - 0x07 : *it) - 0x30 can be simplified to *it - (*it > '9' ? 0x37 : 0x30), so only one unconditional subtraction is left. Still, an array lookup would be more efficient. Subtract 0x30: '0' becomes 0, 'A' becomes 0x11 and 'a' becomes 0x31. Mask with 0x1f to make it case-insensitive, and you can do the resulting lookup in a char[0x20] without any risk of overflow. Non-hex characters will just give you weird values.
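
Here is a sketch of the lookup described above, assuming ASCII input. hexPairToByte and decodeWithTable are hypothetical names. The table has 0x20 entries, so after the & 0x1f mask every index is in range. As the answer says, invalid characters simply decode to meaningless values:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Subtract '0' (0x30), mask with 0x1f, and index a 32-entry table.
// '0'..'9' map to 0x00..0x09; 'A'/'a'..'F'/'f' map to 0x11..0x16.
unsigned char hexPairToByte(char high, char low)
{
    static const unsigned char table[0x20] = {
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9,   // 0x00..0x09: '0'..'9'
        0, 0, 0, 0, 0, 0, 0,            // 0x0A..0x10: unused
        10, 11, 12, 13, 14, 15          // 0x11..0x16: 'A'..'F' and 'a'..'f'
        // 0x17..0x1F: zero-initialised
    };
    unsigned hi = (static_cast<unsigned char>(high) - 0x30u) & 0x1Fu;
    unsigned lo = (static_cast<unsigned char>(low) - 0x30u) & 0x1Fu;
    return static_cast<unsigned char>((table[hi] << 4) | table[lo]);
}

// Decodes whole pairs only; an odd trailing character is ignored.
std::vector<unsigned char> decodeWithTable(const std::string& hex)
{
    std::vector<unsigned char> out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out.push_back(hexPairToByte(hex[i], hex[i + 1]));
    return out;
}
```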

#9


The reason I would consider it toy code is that there is no error checking.

I could pass it two vectors and it would happily try to do something, making a complete mess and generating random gibberish.
