Do some characters take more bytes than others?

Time: 2022-09-23 00:09:50

I'm not very experienced with lower-level things such as how many bytes a character takes. I tried to find out whether one character equals one byte, but without success.

I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.

The current delimiter is "#". Would switching to another delimiter reduce my bandwidth usage?

4 answers

#1


It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):

  • In ASCII or ISO 8859, each character is represented by one byte

  • In UTF-32, each character is represented by 4 bytes

  • In UTF-8, each character uses between 1 and 4 bytes

  • In ISO 2022, it's much more complicated

US-ASCII characters (of which # is one) take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.
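
To check this for a particular delimiter, a quick sketch like the following may help. It encodes one character with several encodings and counts the resulting bytes. The helper name `encoded_sizes` and the choice of encodings are just for illustration, and the little-endian UTF-16/UTF-32 variants are used so that no byte order mark is added to the count.

```python
# Encode the same character with several encodings and count the bytes.
def encoded_sizes(ch, encodings=("ascii", "iso-8859-1", "utf-8",
                                 "utf-16-le", "utf-32-le")):
    """Return {encoding name: number of bytes needed to encode ch}."""
    return {name: len(ch.encode(name)) for name in encodings}


sizes = encoded_sizes("#")
for name, size in sizes.items():
    print(f"{name}: {size} byte(s)")
# ascii: 1 byte(s)
# iso-8859-1: 1 byte(s)
# utf-8: 1 byte(s)
# utf-16-le: 2 byte(s)
# utf-32-le: 4 byte(s)
```

So "#" already costs a single byte in ASCII, ISO 8859-1 and UTF-8; only the fixed-width multi-byte encodings make it larger.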

#2


It depends on the encoding. In single-byte character sets, such as the single-byte "ANSI" code pages and the various ISO 8859 character sets, it is one byte per character. Some encodings, such as UTF-8, are variable width: the number of bytes needed to encode a character depends on the character (code point) being encoded.
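
As a rough illustration of the variable-width point, the sketch below measures a few sample characters in UTF-8. The sample characters are my own picks, one from each UTF-8 length class.

```python
# UTF-8 length of a few sample characters (one per length class).
samples = [
    "#",           # U+0023 NUMBER SIGN (ASCII range)
    "\u00e9",      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # U+20AC EURO SIGN
    "\U0001f600",  # U+1F600 GRINNING FACE (outside the Basic Multilingual Plane)
]

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s) in UTF-8")
# U+0023 -> 1 byte(s) in UTF-8
# U+00E9 -> 2 byte(s) in UTF-8
# U+20AC -> 3 byte(s) in UTF-8
# U+1F600 -> 4 byte(s) in UTF-8
```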

#3


The answer, of course, is that it depends. If you are in a pure ASCII environment, then yes, every char takes 1 byte. If you are in a Unicode environment (Windows, for example, uses UTF-16 internally), a char can take from 1 to 4 bytes depending on the encoding: 1 to 4 in UTF-8, 2 or 4 in UTF-16.

If you choose a char from the ASCII set and send it as ASCII or UTF-8, then yes, your delimiter is as small as possible.
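
For what it's worth, here is a minimal sketch of how a one-byte delimiter might be used to frame messages on a byte stream. The names `DELIMITER` and `split_messages` are made up for this example, and it assumes the payload itself never contains the delimiter byte.

```python
DELIMITER = b"#"  # a single byte in ASCII and UTF-8


def split_messages(buffer: bytes):
    """Split buffer on DELIMITER.

    Returns (complete_messages, remainder), where remainder holds the bytes
    after the last delimiter, i.e. a message that has not fully arrived yet.
    """
    *messages, remainder = buffer.split(DELIMITER)
    return messages, remainder


# Simulated chunk as it might arrive from a socket's recv() call.
messages, remainder = split_messages(b"hello#world#par")
print(messages)   # [b'hello', b'world']
print(remainder)  # b'par'
```

With this scheme the delimiter adds exactly one byte per message, so swapping "#" for another ASCII character would not change the bandwidth used.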

#4


No, in a single-byte encoding all characters are 1 byte. A character can take more than 1 byte only if you're using a Unicode encoding or wide characters (used for accented letters and other symbols, for example).

A byte is 8 bits long, which gives 256 possible values to form characters with. ASCII characters each fit in 1 byte, but ASCII itself uses only 7 bits (128 values) to encode the standard alphabet and the various symbols that were in use when teletypes and typewriters were still common. The 8th bit is not part of ASCII; single-byte extensions such as the ISO 8859 character sets use it for additional characters.
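
A small sketch to check whether a candidate delimiter falls inside the 7-bit ASCII range (code points 0 to 127). The helper names `is_ascii` and `ascii_byte` are hypothetical, chosen only for this example.

```python
def is_ascii(ch: str) -> bool:
    """True if ch is one of the 128 characters defined by ASCII."""
    return ord(ch) < 128


def ascii_byte(ch: str):
    """Return the single ASCII byte for ch, or None if ch is not in ASCII."""
    try:
        return ch.encode("ascii")
    except UnicodeEncodeError:
        return None


print(ord("#"), is_ascii("#"), ascii_byte("#"))   # 35 True b'#'
print(is_ascii("\u00e9"), ascii_byte("\u00e9"))   # False None
```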

You can find an ASCII chart and what numbers correspond to what characters here.
