What is the best way to convert a string to bytes in Python 3?

Time: 2022-08-16 20:10:43

There appear to be two different ways to convert a string to bytes, as seen in the answers to TypeError: 'str' does not support the buffer interface


Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?


b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

5 solutions



If you look at the docs for bytes, it points you to bytearray:


bytearray([source[, encoding[, errors]]])


Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.


The optional source parameter can be used to initialize the array in a few different ways:


If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().


If it is an integer, the array will have that size and will be initialized with null bytes.


If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.


If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.


Without an argument, an array of size 0 is created.


So bytes can do much more than just encode a string. It's Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.
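The source forms listed in the docs can be tried out with a short sketch like the following. The variable names are only illustrative; the calls themselves are the plain built-in constructor.

```python
# From a string: an encoding must be supplied.
from_text = bytes("abc", "utf-8")

# From an integer: that many null bytes.
from_size = bytes(3)

# From an object that supports the buffer interface (here a bytearray).
from_buffer = bytes(bytearray(b"abc"))

# From an iterable of integers in the range 0 <= x < 256.
from_ints = bytes([97, 98, 99])

# With no argument: an empty bytes object.
empty = bytes()

print(from_text, from_size, from_buffer, from_ints, empty)
```

Only the first form involves an encoding; the others never touch text at all, which is part of why the constructor call does not read as "encode this string".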


For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), which has no explicit verb.


Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.


Also, see Serdalis' comment -- unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding) and symmetry is nice.
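As a rough illustration of that symmetry, here is a minimal round trip. The sample text is an arbitrary assumption; any string that UTF-8 can encode would behave the same way.

```python
text = "naïve café"  # hypothetical sample containing non-ASCII characters

# Method form: encode and decode mirror each other.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")
assert decoded == text

# Constructor form: same results, but without the verb pair.
assert bytes(text, "utf-8") == encoded
assert str(encoded, "utf-8") == text
```

Both spellings produce identical objects, so the choice really comes down to readability.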




It's easier than you might think:


my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation



The absolute best way is neither of the two, but a third. The first parameter to encode defaults to 'utf-8'. Thus the best way is


b = mystring.encode()

This will also be faster, because the default argument results not in the string "utf-8" in the C code, but NULL, which is much faster to check!


Here be some timings:


In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

Despite the warning, the timings were very stable across repeated runs; the deviation was only about 2 percent.
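Outside IPython, a similar comparison can be sketched with the standard timeit module. The repeat and number values below are arbitrary assumptions, and the absolute figures will vary by machine and Python version, so treat any difference as indicative only.

```python
import timeit

# Each call returns a list with one total time (in seconds) per repeat.
explicit = timeit.repeat("'abc'.encode('utf-8')", repeat=5, number=200_000)
default = timeit.repeat("'abc'.encode()", repeat=5, number=200_000)

# The minimum is usually the least noisy figure to compare.
print(f"explicit 'utf-8': {min(explicit):.4f} s")
print(f"default argument: {min(default):.4f} s")
```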




You can simply convert string to bytes using:

a_string.encode()

and you can simply convert bytes to string using:

a_bytes.decode()

bytes.decode and str.encode have encoding='utf-8' as default value.


The following functions (taken from Effective Python) might be useful to convert str to bytes and bytes to str:


def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode() # uses 'utf-8' for encoding
    else:
        value = bytes_or_str # already bytes, pass through unchanged
    return value # Instance of bytes

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode() # uses 'utf-8' for decoding
    else:
        value = bytes_or_str # already str, pass through unchanged
    return value # Instance of str
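If a codec other than UTF-8 is ever needed, the same idea can be written with an explicit encoding parameter. This is a sketch only: the names ensure_bytes and ensure_str and the encoding argument are assumptions for illustration, not something taken from Effective Python.

```python
def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str."""
    return value.encode(encoding) if isinstance(value, str) else value


def ensure_str(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    return value.decode(encoding) if isinstance(value, bytes) else value


# Inputs already of the target type pass through unchanged.
print(ensure_bytes("é"), ensure_bytes(b"raw"))
print(ensure_str(b"\xc3\xa9"), ensure_str("text"))
```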



so_string = '*'
so_bytes = so_string.encode()


