How do I convert Unicode escape sequences to Unicode characters in a Python string?

Time: 2020-12-27 22:28:04

When I try to get the content of a tag using "unicode(head.contents[3])", I get output similar to this: "Christensen Sk\xf6ld". I want the escape sequence to be returned as the actual character in the string. How can I do this in Python?

3 answers

#1


28  

Assuming Python sees the name as a normal string, you'll first have to decode it to unicode:

>>> name
'Christensen Sk\xf6ld'
>>> unicode(name, 'latin-1')
u'Christensen Sk\xf6ld'

Another way of achieving this:

>>> name.decode('latin-1')
u'Christensen Sk\xf6ld'

Note the "u" in front of the string, signalling that it is Unicode. If you print it, the accented letter is shown properly:

>>> print name.decode('latin-1')
Christensen Sköld

BTW: when necessary, you can use the "encode" method to turn the Unicode string into, e.g., a UTF-8 byte string:

>>> name.decode('latin-1').encode('utf-8')
'Christensen Sk\xc3\xb6ld'
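
For readers on Python 3, the same round trip can be sketched as below. This is only a sketch under assumptions: the snippets above are Python 2, the variable names are made up for illustration, and Latin-1 is assumed to be the encoding of the incoming bytes, as in the answer. In Python 3 the "normal string" of Python 2 corresponds to bytes, and str is already Unicode.

```python
# Python 3 sketch: bytes take the place of the Python 2 byte string.
raw = b'Christensen Sk\xf6ld'   # assumed to be Latin-1 encoded bytes

text = raw.decode('latin-1')    # bytes -> str (Unicode text)
utf8 = text.encode('utf-8')     # str -> UTF-8 encoded bytes

# ascii() gives an escaped, ASCII-only view, so this prints on any terminal.
print(ascii(text))
print(utf8)
```

The byte 0xF6 becomes the single character U+00F6 after decoding, and that character becomes the two bytes 0xC3 0xB6 once encoded as UTF-8.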

#2


8  

I suspect that it's actually working correctly. The interactive prompt echoes the repr() of a string, and in Python 2 that repr escapes non-ASCII characters, since not all terminals support Unicode. If you actually print the string, though, it should work. See the following example:

>>> u'\xcfa'
u'\xcfa'
>>> print u'\xcfa'
Ïa
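
To make the point concrete, here is a small Python 3 sketch (the example above is Python 2; the variable names are hypothetical). It uses ascii() to produce an escaped view similar to what the Python 2 prompt echoes, and shows that the string itself holds real characters, not a backslash sequence.

```python
s = '\xcfa'            # two characters: U+00CF followed by 'a'

escaped = ascii(s)     # escaped, ASCII-only display form of the string
print(escaped)         # shows the \xcf escape, like the Python 2 prompt
print(len(s))          # the string has 2 characters, not 5

# The escape is only a way of writing the character in source or in a repr.
print(s == chr(0xCF) + 'a')
```

So the "\xcf" seen at the prompt is only how the value is displayed; nothing needs converting before printing.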

#3


6  

Given a byte string containing a Unicode escape, b"\N{SNOWMAN}", calling b"\N{SNOWMAN}".decode('unicode-escape') will produce the expected Unicode string u'\u2603'.
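
A minimal Python 3 sketch of this approach follows. It assumes the data really contains a literal backslash followed by the escape text, which is the case this codec is meant for; the variable names are illustrative. Note that the codec treats any non-escape bytes as Latin-1, so it is not suitable for input that is already UTF-8 encoded.

```python
import unicodedata

# Doubled backslashes: the bytes hold a literal '\' plus the escape text.
raw_snowman = b'\\N{SNOWMAN}'
raw_name = b'Christensen Sk\\xf6ld'

snowman = raw_snowman.decode('unicode_escape')   # interprets \N{...}
name = raw_name.decode('unicode_escape')         # interprets \xf6

print(unicodedata.name(snowman))   # name of the decoded character
print(ascii(name))                 # ASCII-safe view of the result
```

If the string does not contain literal backslashes and only looks escaped when echoed at the prompt, this step is unnecessary.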
