How do I convert a UTF-8 string to Big5 in Python?

Time: 2023-01-06 11:07:02

I am using Python 2.6.6, and my locale is ('en_US', 'UTF8').

I have tried many ways to convert a UTF-8 string to Big5, but none of them work. If you know how to do this, please give me some advice. Thanks a lot.


Take the Chinese word '單車', which means 'bicycle'.

Its Unicode code points are \u55ae\u8eca.

str_a = u'\u55ae\u8eca'
str_b = '\u55ae\u8eca'
print str_a    # output '單車'
print str_b    # output '\u55ae\u8eca'

I know str_a works, but I want to convert str_b to Big5 as well.

I have tried decode, encode, and unicode, but it still does not work.

Any good ideas? Thanks.

2 Answers

#1


5  

str_b is a sequence of bytes:

In [19]: list(str_b)
Out[19]: ['\\', 'u', '5', '5', 'a', 'e', '\\', 'u', '8', 'e', 'c', 'a']

The backslash, the u, and so on are all just separate characters. Compare that with the sequence of Unicode code points in the unicode object str_a:

In [24]: list(str_a)
Out[24]: [u'\u55ae', u'\u8eca']

To convert the malformed string str_b to unicode, decode it with unicode-escape:

In [20]: str_b.decode('unicode-escape')
Out[20]: u'\u55ae\u8eca'

In [21]: print(str_b.decode('unicode-escape'))
單車
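
Since the question asks for Big5 output, the decode step above can be followed by an encode step. The following is only a minimal sketch: the names raw_escaped, text, and big5_bytes are made up for illustration, and raw_escaped is written as a bytes literal so that it holds the same literal backslash-u text as str_b on both Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: escaped byte string -> unicode object -> Big5 bytes.
# raw_escaped stands in for str_b from the question (hypothetical name).
raw_escaped = b'\\u55ae\\u8eca'

# Step 1: interpret the \uXXXX escapes to get a real unicode object.
text = raw_escaped.decode('unicode_escape')

# Step 2: encode the unicode object as Big5 bytes.
big5_bytes = text.encode('big5')

# Round-trip check: decoding the Big5 bytes gives back the same text.
assert big5_bytes.decode('big5') == text

# Print the repr, since raw Big5 bytes would look garbled on a UTF-8 terminal.
print(repr(big5_bytes))
```

Printing big5_bytes directly on a UTF-8 terminal would likely show garbage, which is why the sketch prints its repr and checks the result by decoding it again.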

#2


3  

You should be able to do this:

str_a = u'\u55ae\u8eca'
str_b = str_a.encode('big5')
print str_a
print str_b.decode('big5')
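
One caveat with encode('big5') is that it only succeeds for characters Big5 actually contains. The sketch below is an illustration under that assumption, not part of the original answer: it uses the simplified form 单车, which I am assuming is absent from Python's big5 codec, and the names traditional, simplified, and fallback are hypothetical.

```python
# -*- coding: utf-8 -*-
# Sketch: what happens when a character has no Big5 mapping.
traditional = u'\u55ae\u8eca'   # traditional form, encodable in Big5
simplified = u'\u5355\u8f66'    # simplified form, assumed not to be in Big5

print(repr(traditional.encode('big5')))

# The default error handler ('strict') raises on unencodable characters.
try:
    simplified.encode('big5')
except UnicodeEncodeError as exc:
    print('cannot encode: %s' % exc)

# The 'replace' handler substitutes '?' for each unencodable character.
fallback = simplified.encode('big5', 'replace')
print(repr(fallback))
```

Whether to raise, replace, or ignore is a design choice; raising is usually safer when silent data loss would go unnoticed.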
