How should I convert a string containing unicode characters to unicode?

I thought I had mastered all the Unicode stuff in Python 2, but it seems there's something I don't understand. I have this user input from an HTML form that goes to my Python script:

a = "m\xe9dico"

I want this to be médico (which means "doctor"). To convert it to unicode I'm doing:

a.decode("utf-8")

Or:

unicode(a, "utf-8")

But this is throwing:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

How can I achieve this?

2 Answers

#1

Your string is not UTF-8. The byte \xe9 is é in ISO-8859-1 (Latin-1), so decode it with that codec instead:

txt = "m\xe9dico"
print txt.decode('iso8859-1')
Out[14]: médico

If you want a UTF-8 encoded byte string, use:

txt.decode('iso8859-1').encode('utf-8')
Out[15]: 'm\xc3\xa9dico'
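
To see why the UTF-8 decode cannot work on these bytes, here is a minimal sketch. It uses a b prefix on the literal (an assumption on my part, so the same lines run on Python 2.6+ and Python 3), and the variable names are only illustrative:

```python
# The b prefix makes this a byte string on both Python 2.6+ and Python 3.
raw = b"m\xe9dico"

# In UTF-8, 0xE9 announces a multi-byte sequence, but the next byte ("d")
# is not a continuation byte, so strict UTF-8 decoding is rejected.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ISO-8859-1 maps every byte to a code point, so 0xE9 becomes U+00E9 (é).
text = raw.decode("iso-8859-1")

# Re-encoding the text as UTF-8 turns é into the two bytes 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")
```

Here utf8_ok ends up False, which matches the point above: the input was never UTF-8 in the first place.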

#2

You can prefix your string with a u to mark it as a unicode literal:

>>> a = u'm\xe9dico'
>>> print a
médico
>>> type(a)
<type 'unicode'>

or, to convert an existing string:

>>> a = 'm\xe9dico'
>>> type(a)
<type 'str'>
>>> new_a = unicode(a,'iso-8859-1')
>>> print new_a
médico
>>> type(new_a)
<type 'unicode'>
>>> new_a == u'm\xe9dico'
True
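
If you don't know in advance whether the input is UTF-8 or Latin-1, one possible approach is to try UTF-8 first and fall back. The sketch below is only an illustration: to_text is a hypothetical helper name, and the Latin-1 fallback is an assumption about where your data comes from. Note that Latin-1 decoding never fails, so a wrong guess yields garbled text rather than an error.

```python
def to_text(raw, fallback="iso-8859-1"):
    """Hypothetical helper: return text for `raw`, guessing the encoding."""
    # Already text (unicode on Python 2, str on Python 3): nothing to decode.
    if not isinstance(raw, bytes):
        return raw
    try:
        # Strict UTF-8 rejects most non-UTF-8 input, so try it first.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Latin-1.
        return raw.decode(fallback)


latin1_result = to_text(b"m\xe9dico")     # Latin-1 bytes, as in the question
utf8_result = to_text(b"m\xc3\xa9dico")   # the same word encoded as UTF-8
already_text = to_text(u"m\xe9dico")      # text passes through unchanged
```

The early return for text matters in Python 2: calling decode on a value that is already unicode makes Python implicitly encode it with the ascii codec first, which raises UnicodeEncodeError for non-ASCII characters.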

Further reading: Python docs - Unicode HOWTO.
