Why does Python 2's raw_input output Unicode strings?

Date: 2023-02-07 20:15:58

I tried the following in Codecademy's Python lesson:

hobbies = []

# Add your code below!
for i in range(3):
    Hobby = str(raw_input("Enter a hobby:"))
    hobbies.append(Hobby)

print hobbies

With this, it works fine, but if I instead try

Hobby = raw_input("Enter a hobby:")

I get [u'Hobby1', u'Hobby2', u'Hobby3']. Where are the extra u's coming from?

4 answers

#1

The question's subject line might be a bit misleading: Python 2's raw_input() normally returns a byte string, NOT a Unicode string.

However, it could return a Unicode string if it or sys.stdin has been altered or replaced (by an application, or as part of an alternative implementation of Python).

Therefore, I believe @ByteCommander is on the right track with his comment:

Maybe this has something to do with the console it's running in?

The Python used by Codecademy is ostensibly 2.7, but (a) it was implemented by compiling the Python interpreter to JavaScript using Emscripten and (b) it's running in the browser; so between those factors, there could very well be some string encoding and decoding injected by Codecademy that isn't present in plain-vanilla CPython.

Note: I have not used Codecademy myself nor do I have any inside knowledge of its inner workings.
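
A minimal sketch of the point above, using only the standard library: if sys.stdin is swapped for an in-memory stream, the type raw_input() hands back follows whatever that stream's readline() returns. The helper read_from is a hypothetical name, and on Python 3 the sketch falls back to input(), which plays the role of raw_input() here.

```python
from __future__ import print_function
import io
import sys

try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input      # Python 3 counterpart of raw_input()


def read_from(stream):
    """Call read_line() with sys.stdin temporarily replaced by `stream`."""
    saved = sys.stdin
    sys.stdin = stream
    try:
        return read_line()
    finally:
        sys.stdin = saved  # always restore the real stdin


# A text stream yields a unicode object; a byte stream yields a byte string.
print(type(read_from(io.StringIO(u"Hobby1\n"))))
print(type(read_from(io.BytesIO(b"Hobby1\n"))))
```

On CPython 2.7 the two lines should read <type 'unicode'> and <type 'str'>, which is the kind of difference a sandboxed environment could introduce.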

#2

'u' means it's a Unicode string. You can call raw_input().encode('utf8') to convert it to a byte string.

Edited: I checked in Python 2.7, and it returns a byte string, not a Unicode string, so the problem is something else here.

Edited: raw_input() returns unicode if sys.stdin.encoding is unicode.

In the Codecademy Python environment, sys.stdin.encoding and sys.stdout.encoding are both None, and the default encoding is ascii.

Python uses this default encoding only if it cannot determine a proper encoding from the environment.
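
A quick way to see what your own environment reports is to print those three values. This is a sketch that runs on Python 2.7 and 3; encoding_report is a made-up name, and getattr() is used because a replaced sys.stdin may not have an encoding attribute at all.

```python
from __future__ import print_function
import sys


def encoding_report():
    """Collect the encodings that influence how console text is handled."""
    return {
        # None when the stream is not a terminal or has been replaced
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        # normally 'ascii' on CPython 2 and 'utf-8' on CPython 3
        "default": sys.getdefaultencoding(),
    }


for name, value in sorted(encoding_report().items()):
    print("%-8s %r" % (name, value))
```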

#3

Where are the extra u's coming from?

  • raw_input() returns Unicode strings in your environment
  • when you print a list (i.e., convert it to a string), repr() is called on each item
  • the text representation (repr()) of a Unicode string looks like a Unicode literal in Python: u'abc'.

That is why print [raw_input()] may produce [u'abc'].
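
If the goal is only to display the hobbies without the u'' markers, one option (a sketch, not part of the original answer) is to format the items yourself instead of printing the list, so repr() is never applied to them. show_hobbies is a hypothetical name.

```python
from __future__ import print_function


def show_hobbies(hobbies):
    """Join the items with ', ' so each is shown as its text, not its repr()."""
    return u", ".join(hobbies)


hobbies = [u"Hobby1", u"Hobby2", u"Hobby3"]
print(hobbies)                # printing the list shows each item via repr()
print(show_hobbies(hobbies))  # Hobby1, Hobby2, Hobby3
```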

You don't see u'' in the first code example because str(unicode_string) calls the equivalent of unicode_string.encode(sys.getdefaultencoding()), i.e., it converts Unicode strings to bytestrings; don't do that unless you mean it.
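
To make that conversion visible, the sketch below spells out what str() does to a unicode object in Python 2. It hard-codes 'ascii', assuming CPython 2's usual default encoding, so it behaves the same on Python 3; py2_str and converts_cleanly are hypothetical names. ASCII-only text survives and anything else raises UnicodeEncodeError, which is why, in an environment where raw_input() returns unicode, str(raw_input(...)) only works for ASCII-only input.

```python
from __future__ import print_function


def py2_str(text, encoding="ascii"):
    """Spell out Python 2's str(unicode_obj): encode with the default encoding.

    'ascii' is assumed because it is CPython 2's usual default; pass
    sys.getdefaultencoding() instead to use the running interpreter's value.
    """
    return text.encode(encoding)


def converts_cleanly(text):
    """True if py2_str() can convert `text` without raising."""
    try:
        py2_str(text)
    except UnicodeEncodeError:
        return False
    return True


print(repr(py2_str(u"Hobby1")))        # ASCII-only text encodes fine
print(converts_cleanly(u"caf\u00e9"))  # False: U+00E9 is not ASCII
```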

Can raw_input() return unicode?

Yes:

#!/usr/bin/env python2
"""Demonstrate that raw_input() can return Unicode."""
import sys

class UnicodeFile:
    def readline(self, n=-1):
        return u'\N{SNOWMAN}'

sys.stdin = UnicodeFile()
s = raw_input()
print type(s)
print s

Output:

<type 'unicode'>
☃

A practical example is the win-unicode-console package, which can replace raw_input() to support entering Unicode characters outside the range of the console code page on Windows. Related: here's why sys.stdout should be replaced.

May raw_input() return unicode?

Yes.

raw_input() is documented to return a string:

The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that.

A string in Python 2 is either a bytestring or a Unicode string: isinstance(s, basestring).
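
Code meant to run on both Python 2 and 3 often expresses this check as below. string_types and is_string are local names chosen for this sketch; on Python 3 it counts bytes as a string too, to mirror the Python 2 notion described here.

```python
try:
    string_types = basestring    # Python 2: common base of str and unicode
except NameError:
    string_types = (str, bytes)  # Python 3: text and byte strings


def is_string(value):
    """True for byte strings and Unicode strings, False for anything else."""
    return isinstance(value, string_types)
```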

The CPython implementation of raw_input() supports Unicode strings explicitly: builtin_raw_input() can call PyFile_GetLine(), and PyFile_GetLine() treats both bytestrings and Unicode strings as strings; for anything else it raises TypeError("object.readline() returned non-string").
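
The TypeError branch can be observed with a stand-in for sys.stdin whose readline() returns something other than a string. FakeStdin and read_with_fake are hypothetical names for this sketch. On Python 3, input() goes through the same PyFile_GetLine() check when sys.stdin is not a real terminal, so the sketch runs there too.

```python
from __future__ import print_function
import sys

try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input      # Python 3


class FakeStdin(object):
    """Minimal stand-in for sys.stdin whose readline() returns a fixed value."""

    def __init__(self, value):
        self.value = value

    def readline(self, *args):
        return self.value


def read_with_fake(value):
    """Return what read_line() gives back, or the TypeError it raises."""
    saved = sys.stdin
    sys.stdin = FakeStdin(value)
    try:
        return read_line()
    except TypeError as exc:
        return exc
    finally:
        sys.stdin = saved  # restore the real stdin


print(repr(read_with_fake(u"Hobby1\n")))  # a unicode object, newline stripped
print(repr(read_with_fake(42)))           # a TypeError instance
```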

#4

You could encode the strings before appending them to your list:

hobbies = []

# Add your code below!
for i in range(3):
    Hobby = raw_input("Enter a hobby:")
    hobbies.append(Hobby.encode('utf-8'))

print hobbies
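
One caveat: on a plain CPython 2 console, raw_input() already returns a byte string, and calling .encode('utf-8') on a byte string containing non-ASCII bytes first decodes it as ASCII and raises UnicodeDecodeError. A hedged variant encodes only when the value really is unicode. to_bytes is a helper name made up for this sketch, UTF-8 is an assumed target encoding, and a fixed list of sample inputs stands in for raw_input() so the sketch runs non-interactively.

```python
def to_bytes(value, encoding="utf-8"):
    """Encode unicode to a byte string; pass byte strings through untouched."""
    if isinstance(value, bytes):
        return value                # already bytes (plain CPython 2 console)
    return value.encode(encoding)   # unicode (e.g. a replaced stdin)


hobbies = []
# Sample inputs instead of raw_input(): mixed unicode and byte strings.
for entry in [u"Hobby1", b"Hobby2", u"caf\u00e9"]:
    hobbies.append(to_bytes(entry))
```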
