json.dump - UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

I have a dictionary data where I have stored:

key - the ID of an event
value - the name of that event, stored as a UTF-8 string

Now I want to write this map to a JSON file. I tried this:

with open('events_map.json', 'w') as out_file:
    json.dump(data, out_file, indent=4)

but this gives me the error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

Now, I also tried with:

with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
    out_file.write(unicode(json.dumps(data, encoding="utf-8")))

but this raises the same error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

I also tried with:

with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
    out_file.write(unicode(json.dumps(data, encoding="utf-8", ensure_ascii=False)))

but this raises the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xbf in position 3114: ordinal not in range(128)

Any suggestions about how I can solve this problem?

EDIT: I believe this is the entry that is causing the problem:

> data['142']
'\xbf/ANCT25'

EDIT 2: The data variable is read from a file. So, after reading it from a file:

data_file_lines = io.open(file_name, 'r', encoding='utf8').readlines()

I then do:

with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
    json.dump(data, json_file, ensure_ascii=False)

Which gives me the error:

TypeError: must be unicode, not str

Then, I try to do this with the data dictionary:

# the `data` variable is initialized from the tuples in sorted_tuples
for tuple in sorted_tuples:
    data[str(tuple[1])] = json.dumps(tuple[0], ensure_ascii=False, encoding='utf8')

which is, again, followed by:

with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
    json.dump(data, json_file, ensure_ascii=False)

but again, the same error:

TypeError: must be unicode, not str

I get the same error when I use the plain built-in open function to read the file:

data_file_lines = open(file_name, "r").readlines()

1 solution

#1

The exception is caused by the contents of your data dictionary: at least one of the keys or values is not UTF-8 encoded.
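
One way to locate such entries is to try decoding every byte-string key and value as UTF-8 and collect the ones that fail. The following is only a sketch: the helper name find_non_utf8 and the sample dictionary are made up for illustration, and it is written for Python 3, where byte strings are bytes objects (in Python 2 the same check applies to plain str values).

```python
def find_non_utf8(mapping):
    """Return the (key, value) pairs whose key or value is not valid UTF-8."""
    bad = []
    for key, value in mapping.items():
        for item in (key, value):
            # Only byte strings need checking; text objects are already decoded.
            if isinstance(item, bytes):
                try:
                    item.decode("utf-8")
                except UnicodeDecodeError:
                    bad.append((key, value))
                    break  # report each pair at most once
    return bad


# Hypothetical sample data; b"\xbf/ANCT25" mirrors the value from the question.
sample = {b"141": b"Opening", b"142": b"\xbf/ANCT25"}
offenders = find_non_utf8(sample)
print(offenders)
```

The byte 0xbf is rejected in position 0 because in UTF-8 it can only appear as a continuation byte, never as the first byte of a character.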

You'll have to replace this value, either by substituting a value that is UTF-8 encoded, or by decoding just that value to a unicode object with whatever encoding is correct for it:

data['142'] = data['142'].decode('latin-1')

This decodes that string as a Latin-1-encoded value instead. In Latin-1, the byte 0xbf is the inverted question mark (¿).
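
If more than one entry might be affected, the same idea can be applied to the whole dictionary before dumping it. This is a rough sketch under stated assumptions, not the only way to do it: the helper names to_text, clean_mapping and dump_events are hypothetical, the code targets Python 3, and the Latin-1 fallback is a guess. Latin-1 accepts any byte sequence, so it never raises, but it only produces the right text if the data really was Latin-1.

```python
import io
import json


def to_text(value, fallback="latin-1"):
    """Decode a byte string as UTF-8, falling back to an assumed encoding."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: anything that is not UTF-8 is Latin-1.
            return value.decode(fallback)
    return value  # already text (or not a string at all)


def clean_mapping(mapping):
    """Return a copy of the mapping with all byte-string keys and values decoded."""
    return {to_text(key): to_text(value) for key, value in mapping.items()}


def dump_events(mapping, path):
    """Write the cleaned mapping to path as UTF-8 encoded JSON."""
    text = json.dumps(clean_mapping(mapping), ensure_ascii=False, indent=4)
    with io.open(path, "w", encoding="utf-8") as out_file:
        out_file.write(text)
```

Calling dump_events(data, 'events_map.json') would then write the file with every value as decoded text, so json never has to guess an encoding.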
