In C#, what is the most efficient way to convert a string containing binary data to a byte array?

Time: 2022-09-23 17:53:07

While there are 100 ways to solve the conversion problem, I am focusing on performance.


Given that the string only contains binary data, what is the fastest method, in terms of performance, of converting that data to a byte[] (not char[]) in C#?

Clarification: This is not ASCII data, rather binary data that happens to be in a string.


4 solutions

#1


3  

I'm not sure ASCIIEncoding.GetBytes is going to do it, because it only supports the range 0x0000 to 0x007F.


You say the string contains only bytes. But a .NET string is a sequence of chars, and one char is 2 bytes (because .NET stores strings as UTF-16). So there are two possible situations for storing the bytes 0x42 and 0x98:

  1. The string was an ANSI string that contained the bytes and was converted to a Unicode string, so each byte became its own char. (The string is stored as 0x0042 and 0x0098; in little-endian memory order that is 0x42 0x00 0x98 0x00.)

  2. The string is really a byte array that you typecast to, or simply received as, a string, so the two bytes 0x42 0x98 share a single char. (The string is stored as 0x9842.)

In the first situation the result of ASCIIEncoding.GetBytes would be 0x42 and 0x3F (ASCII for "B?"). The second situation would result in a single 0x3F (ASCII for "?"). This is logical, because the chars are outside the valid ASCII range and the encoder does not know what to do with those values.
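To make the two outcomes concrete, here is a minimal sketch of what the ASCII encoder does with both strings. The class and member names (AsciiLossDemo, EncodeAscii, Situation1, Situation2) are made up for illustration; only Encoding.ASCII.GetBytes is the real API.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class AsciiLossDemo
{
    // Situation 1: two chars, 0x0042 and 0x0098.
    public static readonly string Situation1 = "\u0042\u0098";

    // Situation 2: one char, 0x9842.
    public static readonly string Situation2 = "\u9842";

    // Encodes with ASCII. Chars above 0x7F cannot be represented,
    // so the default fallback replaces each of them with 0x3F ('?').
    public static byte[] EncodeAscii(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }
}
```

Calling AsciiLossDemo.EncodeAscii on the first string gives two bytes (0x42, 0x3F), and on the second gives one byte (0x3F). Either way the original 0x98 is gone, so this route cannot recover the binary data.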

So I'm wondering: why are the bytes in a string in the first place?

  • Maybe it contains bytes encoded as text (for instance Base64)?

  • Maybe you should start with a char array or a byte array?
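If the string turns out to be Base64, the conversion is a decode rather than a raw copy. A small sketch, assuming the payload is standard Base64; the wrapper class Base64Demo and its method names are hypothetical, while Convert.FromBase64String and Convert.ToBase64String are the framework calls.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class Base64Demo
{
    // Decodes Base64 text back into the bytes it represents.
    // Throws FormatException if the text is not valid Base64.
    public static byte[] Decode(string base64Text)
    {
        return Convert.FromBase64String(base64Text);
    }

    // Encodes bytes as Base64 text that is safe to keep in a string.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }
}
```

For the two example bytes, Base64Demo.Encode(new byte[] { 0x42, 0x98 }) yields "Qpg=", and Base64Demo.Decode("Qpg=") returns the same two bytes.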

If you really do have situation 2 and want to get the bytes out of it, you should use UnicodeEncoding.GetBytes, because that will return 0x42 and 0x98.
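A short sketch of that call, with the big-endian variant next to it for comparison. Utf16BytesDemo and its method names are invented for this example; Encoding.Unicode (little-endian UTF-16) and Encoding.BigEndianUnicode are the real properties.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Utf16BytesDemo
{
    // Little-endian UTF-16: the low byte of each char comes first.
    public static byte[] ToUtf16LittleEndian(string text)
    {
        return Encoding.Unicode.GetBytes(text);
    }

    // Big-endian UTF-16: the high byte of each char comes first.
    public static byte[] ToUtf16BigEndian(string text)
    {
        return Encoding.BigEndianUnicode.GetBytes(text);
    }
}
```

For the single char 0x9842, the little-endian call returns 0x42 0x98 and the big-endian call returns 0x98 0x42, so which one is "right" depends on how the bytes were packed into the string.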

If you'd like to go from a char array to a byte array, the fastest way would be marshaling. But that's not really nice, and it uses double the memory.

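// Requires: using System; using System.Runtime.InteropServices;
// Copies the raw UTF-16 bytes of a char array via an unmanaged scratch buffer.
public Byte[] ConvertToBytes(Char[] source)
{
    Byte[] result = new Byte[source.Length * sizeof(Char)];
    IntPtr tempBuffer = Marshal.AllocHGlobal(result.Length);
    try
    {
        // char[] -> unmanaged memory, then unmanaged memory -> byte[]
        Marshal.Copy(source, 0, tempBuffer, source.Length);
        Marshal.Copy(tempBuffer, result, 0, result.Length);
    }
    finally
    {
        // Always release the unmanaged buffer, even if a copy throws.
        Marshal.FreeHGlobal(tempBuffer);
    }
    return result;
}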
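As an alternative to the marshaling approach (this is not from the answer above, just a sketch of another option), the raw bytes can be copied without the unmanaged scratch buffer. Buffer.BlockCopy works on a char[]; MemoryMarshal.AsBytes works directly on a string's span but assumes a runtime where spans are available (.NET Core 2.1 or later, or the System.Memory package). The class name RawCharCopy and its method names are hypothetical. I have not benchmarked either against the marshaling version.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class RawCharCopy
{
    // Copies the in-memory bytes of a char array in one block copy.
    public static byte[] FromCharArray(char[] source)
    {
        byte[] result = new byte[source.Length * sizeof(char)];
        // The count argument is in bytes, not chars.
        Buffer.BlockCopy(source, 0, result, 0, result.Length);
        return result;
    }

    // Reinterprets the string's chars as bytes and copies them once.
    public static byte[] FromString(string source)
    {
        return MemoryMarshal.AsBytes(source.AsSpan()).ToArray();
    }
}
```

Both methods return the bytes in the machine's native byte order, so on a little-endian machine the char 0x9842 comes out as 0x42 0x98. Unlike an encoder, a raw copy does not inspect the chars at all, which matters if the "binary" chars happen to include unpaired surrogates (0xD800 to 0xDFFF) that an encoder may replace.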

#3


0  

There is no such thing as an ASCII string in C#! Strings always contain UTF-16. Not realizing this leads to a lot of problems. That said, the methods mentioned earlier work because they treat the string as UTF-16 encoded and transform the characters to ASCII symbols.

/EDIT in response to the clarification: how did the binary data get in the string? Strings aren't supposed to contain binary data (use byte[] for that).


#4


0  

If you want to go from a string to binary data, you must know what encoding was used to convert the binary data to a string in the first place. Otherwise, you might not end up with the correct binary data. So, the most efficient way is likely GetBytes() on an Encoding subclass (such as UTF8Encoding), but you must know for sure which encoding.

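Here is a sketch of why the encoding has to match. It assumes, purely for illustration, that the bytes were originally turned into a string with ISO-8859-1 (Latin-1), which maps each byte 0x00 to 0xFF onto the char with the same value. KnownEncodingDemo and its member names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class KnownEncodingDemo
{
    // Assumption: the data was put into the string using Latin-1.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Decodes bytes into a string, one char per byte.
    public static string BytesToString(byte[] data)
    {
        return Latin1.GetString(data);
    }

    // Encodes with the same encoding, recovering the original bytes.
    public static byte[] StringToBytes(string text)
    {
        return Latin1.GetBytes(text);
    }

    // Encodes with a different encoding, which yields different bytes.
    public static byte[] StringToBytesWrongEncoding(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }
}
```

Round-tripping 0x42 0x98 through BytesToString and StringToBytes returns the same two bytes, while StringToBytesWrongEncoding produces three bytes (0x42 0xC2 0x98) for the same string.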

The comment by Kent Boogaart on the original question sums it up pretty well. ;]

