Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence

sample code:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it's not human-readable. My (smart) users want to verify or even edit text files containing JSON dumps (and I'd rather not use XML).

Is there a way to serialize objects into a UTF-8 JSON string (instead of \uXXXX escapes)?

This doesn't help:

>>> output = json_string.decode('string-escape')
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

This works, but if any sub-object is a Python unicode object rather than a UTF-8 byte string, it dumps garbage:

>>> #### ok:
>>> s= json.dumps( "ברי צקלה", ensure_ascii=False)
>>> print json.loads(s)
ברי צקלה
>>> #### NOT ok:
>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> print d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94',  2: u'\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94'}
>>> s = json.dumps( d, ensure_ascii=False, encoding='utf8')
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
××¨× ×¦×§××

8 Answers

#1

377

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

>>> json_string = json.dumps(u"ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print json_string
"ברי צקלה"
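
For readers on Python 3, where every str is already Unicode, the same idea might look like the sketch below. The variable names are illustrative, and the sample text is the one from the question; this is an adaptation under the assumption of Python 3, not part of the original answer.

```python
import json

text = "ברי צקלה"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(text)

# ensure_ascii=False keeps the characters as they are; the result is a str.
readable = json.dumps(text, ensure_ascii=False)

# Encode explicitly only when bytes are needed (binary file, socket, ...).
utf8_bytes = readable.encode("utf8")

print(escaped)
print(type(readable).__name__, type(utf8_bytes).__name__)
```

Both strings load back to the same value with json.loads(); only the on-disk representation differs.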

If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:

with io.open('filename', 'w', encoding='utf8') as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
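
A minimal Python 3 sketch of the same file-writing approach, with a round trip to check the result. The temporary path and the variable names are assumptions made for illustration; any writable path would do.

```python
import json
import os
import tempfile

data = {"message": "ברי צקלה"}

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# encoding="utf8" makes the file object encode text as it is written.
with open(path, "w", encoding="utf8") as json_file:
    json.dump(data, json_file, ensure_ascii=False)

# Read the raw bytes back to confirm no \u escapes were written.
with open(path, "rb") as raw_file:
    raw = raw_file.read()

# Load the file again to confirm the content survived the round trip.
with open(path, encoding="utf8") as json_file:
    loaded = json.load(json_file)

print(loaded == data)
```

Passing the encoding explicitly avoids depending on the platform's default, which is not UTF-8 everywhere.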

In Python 3, the built-in open() is an alias for io.open(). Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # unicode(data) auto-decodes data to unicode if str
    json_file.write(unicode(data))

If you are passing in byte strings (type str in Python 2, bytes in Python 3) encoded to UTF-8, make sure to also set the encoding keyword:

>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}
>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה
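
The encoding keyword above is a Python 2 feature. On Python 3, json.dumps() raises TypeError when it meets a bytes value, so one option is to decode the bytes before serializing. The sketch below assumes Python 3 and UTF-8 input; decode_bytes is a hypothetical helper written for this illustration, not part of the json module.

```python
import json


def decode_bytes(obj, encoding="utf8"):
    """Recursively turn bytes into str so json.dumps can serialize them.

    Hypothetical helper for illustration only.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_bytes(key, encoding): decode_bytes(value, encoding)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decode_bytes(item, encoding) for item in obj]
    return obj


text = "ברי צקלה"
d = {1: text.encode("utf8"), 2: text}  # one bytes value, one str value

# Passing d directly fails because of the bytes value.
try:
    json.dumps(d, ensure_ascii=False)
    direct_dump_failed = False
except TypeError:
    direct_dump_failed = True

s = json.dumps(decode_bytes(d), ensure_ascii=False)
print(direct_dump_failed)
```

Decoding up front keeps the decision about the input encoding in one place instead of leaving it to the serializer.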

Note that your second sample is not valid Unicode; you gave it UTF-8 bytes as a unicode literal, that would never work:

>>> s = u'\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94'
>>> print s
××¨× ×¦×§××
>>> print s.encode('latin1').decode('utf8')
ברי צקלה

Only when I encode that string to Latin-1 (whose Unicode code points map one-to-one to bytes) and then decode it as UTF-8 do you see the expected output. That has nothing to do with JSON and everything to do with using the wrong input. The result is called mojibake.

If you got that Unicode value from a string literal, it was decoded using the wrong codec. It could be your terminal is mis-configured, or that your text editor saved your source code using a different codec than what you told Python to read the file with. Or you sourced it from a library that applied the wrong codec. This all has nothing to do with the JSON library.
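
To make the wrong-codec scenario concrete, here is a small Python 3 sketch that first produces the mojibake on purpose and then repairs it with the Latin-1 round trip described above. The variable names are illustrative, and the repair only works when Latin-1 really was the codec that was wrongly applied.

```python
text = "ברי צקלה"

# Simulate the wrong input: UTF-8 bytes decoded with the wrong codec (Latin-1).
mojibake = text.encode("utf8").decode("latin1")

# Latin-1 maps code points 0-255 one-to-one to bytes, so encoding back
# recovers the original UTF-8 bytes, which can then be decoded properly.
repaired = mojibake.encode("latin1").decode("utf8")

print(len(text), len(mojibake), repaired == text)
```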

#2

Piece of cake.

To write to a file:

import codecs
import json

with codecs.open('your_file.txt', 'w', encoding='utf-8') as f:
    json.dump({"message":"xin chào việt nam"}, f, ensure_ascii=False)

To print to stdout:

import json

print(json.dumps({"message":"xin chào việt nam"}, ensure_ascii=False))

#3

UPDATE: This is a wrong answer, but it's still useful to understand why it's wrong. See the comments.

How about unicode-escape?

>>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
>>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')
>>> print json_str
{"1": "ברי צקלה", "2": "ברי צקלה"}
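
The comments are not reproduced here, but one failure mode is easy to demonstrate: unicode-escape undoes every backslash escape, including the ones JSON itself needs. The sketch below adapts the idea to Python 3 (an assumption; the snippet above is Python 2) and uses made-up sample data containing double quotes.

```python
import json

d = {"quote": 'say "hi"'}

json_str = json.dumps(d)  # the inner quotes are escaped as \"

# Same idea as the snippet above, adapted to Python 3:
# str -> ASCII bytes -> unicode-escape decode.
unescaped = json_str.encode("ascii").decode("unicode-escape")

try:
    json.loads(unescaped)
    still_valid = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    still_valid = False

print(json_str)
print(unescaped)
print(still_valid)
```

After the decode step the inner quotes are no longer escaped, so the string stops being valid JSON. ensure_ascii=False avoids the problem because it never produces the \uXXXX escapes in the first place.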

#4

Peters' python 2 workaround fails on an edge case:

d = {u'keyword': u'bad credit  \xe7redit cards'}
with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(d, ensure_ascii=False).decode('utf8')
    try:
        json_file.write(data)
    except TypeError:
        # Decode data to Unicode first
        json_file.write(data.decode('utf8'))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 25: ordinal not in range(128)

It was crashing on the .decode('utf8') part of line 3. I fixed the problem by making the program much simpler by avoiding that step as well as the special casing of ascii:

with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(d, ensure_ascii=False, encoding='utf8')
    json_file.write(unicode(data))

cat filename
{"keyword": "bad credit  çredit cards"}

#5

The following is my understanding after reading the answers above and searching Google.

# coding:utf-8
r"""
@update: 2017-01-09 14:44:39
@explain: str, unicode, bytes in python2to3
    #python2 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
    #1.reload
    #importlib,sys
    #importlib.reload(sys)
    #sys.setdefaultencoding('utf-8') #python3 don't have this attribute.
    #not suggest even in python2 #see:http://*.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
    #2.overwrite /usr/lib/python2.7/sitecustomize.py or (sitecustomize.py and PYTHONPATH=".:$PYTHONPATH" python)
    #too complex
    #3.control by your own (best)
    #==> all string must be unicode like python3 (u'xx'|b'xx'.encode('utf-8')) (unicode 's disappeared in python3)
    #see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes
    #how to Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence
    #http://*.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
"""
from __future__ import print_function
import json

a = {"b": u"中文"}  # add u for python2 compatibility
print('%r' % a)
print('%r' % json.dumps(a))
print('%r' % (json.dumps(a).encode('utf8')))
a = {"b": u"中文"}
print('%r' % json.dumps(a, ensure_ascii=False))
print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
# print(a.encode('utf8')) #AttributeError: 'dict' object has no attribute 'encode'
print('')

# python2:bytes=str; python3:bytes
b = a['b'].encode('utf-8')
print('%r' % b)
print('%r' % b.decode("utf-8"))
print('')

# python2:unicode; python3:str=unicode
c = b.decode('utf-8')
print('%r' % c)
print('%r' % c.encode('utf-8'))

"""
#python2
{'b': u'\u4e2d\u6587'}
'{"b": "\\u4e2d\\u6587"}'
'{"b": "\\u4e2d\\u6587"}'
u'{"b": "\u4e2d\u6587"}'
'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
'\xe4\xb8\xad\xe6\x96\x87'
u'\u4e2d\u6587'
u'\u4e2d\u6587'
'\xe4\xb8\xad\xe6\x96\x87'
#python3
{'b': '中文'}
'{"b": "\\u4e2d\\u6587"}'
b'{"b": "\\u4e2d\\u6587"}'
'{"b": "中文"}'
b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
b'\xe4\xb8\xad\xe6\x96\x87'
'中文'
'中文'
b'\xe4\xb8\xad\xe6\x96\x87'
"""

#6

Here's my solution using json.dump():

def jsonWrite(p, pyobj, ensure_ascii=False, encoding=SYSTEM_ENCODING, **kwargs):
    with codecs.open(p, 'wb', 'utf_8') as fileobj:
        json.dump(pyobj, fileobj, ensure_ascii=ensure_ascii, encoding=encoding, **kwargs)

where SYSTEM_ENCODING is set to:

locale.setlocale(locale.LC_ALL, '')
SYSTEM_ENCODING = locale.getlocale()[1]

#7

Use codecs if possible:

with codecs.open('file_path', 'a+', 'utf-8') as fp:
    fp.write(json.dumps(res, ensure_ascii=False))

#8

-3

Using ensure_ascii=False in json.dumps is the right direction to solve this problem, as pointed out by Martijn. However, this may raise an exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)

You need extra settings in either site.py or sitecustomize.py so that sys.getdefaultencoding() returns the correct value. site.py is under lib/python2.7/ and sitecustomize.py is under lib/python2.7/site-packages.

If you want to use site.py, find def setencoding(): and change the first if 0: to if 1: so that Python will use your operating system's locale.

If you prefer to use sitecustomize.py, which may not exist if you haven't created it, simply add these lines:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Then you can produce Chinese JSON output in UTF-8 format, such as:

name = {"last_name": u"王"}
json.dumps(name, ensure_ascii=False)

You will get a UTF-8 encoded string, rather than a \u-escaped JSON string.

To verify your default encoding:

print sys.getdefaultencoding()

You should see "utf-8" or "UTF-8", which confirms that your site.py or sitecustomize.py settings took effect.

Please note that you cannot call sys.setdefaultencoding("utf-8") from the interactive Python console.
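
For comparison, none of these settings are needed on Python 3, where the default string encoding is already UTF-8 and sys.setdefaultencoding() does not exist. The sketch below assumes Python 3 and reuses the sample data from this answer; the variable names are illustrative.

```python
import json
import sys

# On Python 3 the default string encoding is UTF-8.
default_encoding = sys.getdefaultencoding()

# sys.setdefaultencoding() is not available on Python 3.
has_setter = hasattr(sys, "setdefaultencoding")

name = {"last_name": "王"}
result = json.dumps(name, ensure_ascii=False)  # a str, not bytes

print(default_encoding, has_setter)
```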
