Python error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'

Time: 2023-01-06 13:26:13

I am trying to extract some data from a JSON file that contains tweets and write it to a CSV file. The file contains all kinds of characters, which I'm guessing is why I get this error message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'

I guess I have to convert the output to UTF-8 before writing the CSV file, but I have not been able to do that. I have found similar questions here, but I have not been able to adapt the solutions to my problem. (I should add that I am not really familiar with Python; I'm a social scientist, not a programmer.)

import csv
import json

fieldnames = ['id', 'text']

with open('MY_SOURCE_FILE', 'r') as f, open('MY_OUTPUT', 'a') as out:

    writer = csv.DictWriter(
                    out, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_ALL)

    for line in f:
        tweet = json.loads(line)
        user = tweet['user']
        output = {
            'text': tweet['text'],
            'id': tweet['id'],
        }
        writer.writerow(output)

1 solution

#1


You just need to encode the text to utf-8:

for line in f:
    tweet = json.loads(line)
    user = tweet['user']
    output = {
        'text': tweet['text'].encode("utf-8"),
        'id': tweet['id'],
    }
    writer.writerow(output)
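
To see why the explicit encode helps, here is a minimal sketch. The sample string is made up for illustration; only the character u'\u2026' (the horizontal ellipsis) comes from the error message. In Python 2, a unicode object that reaches a byte-oriented writer is implicitly encoded with the 'ascii' codec, which has no representation for that character, whereas UTF-8 does:

```python
# -*- coding: utf-8 -*-
# Hypothetical tweet text containing U+2026 (horizontal ellipsis).
text = u'Read more\u2026'

# The 'ascii' codec only covers code points 0-127, so this raises
# the same UnicodeEncodeError as in the question.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode code point; U+2026 becomes three bytes.
utf8_bytes = text.encode('utf-8')

print(ascii_ok)
print(repr(utf8_bytes))
```

Decoding the bytes with `utf8_bytes.decode('utf-8')` gives back the original text, so nothing is lost by encoding before writing.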

The csv module in Python 2 does not support writing unicode objects, as its documentation notes:

Note This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
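
The limitation quoted above applies to Python 2. For comparison, here is a hedged sketch of the same task on Python 3, where the csv module works with text strings and the encoding is chosen when the file is opened. The function name `tweets_to_csv` and its parameters are hypothetical, and the sketch assumes one JSON object per line with 'id' and 'text' keys, as in the question:

```python
import csv
import json

FIELDNAMES = ['id', 'text']


def tweets_to_csv(source_path, output_path):
    """Append the id and text of each tweet in source_path to a CSV file."""
    # encoding='utf-8' makes Python encode the text on write, so no manual
    # .encode() call is needed; newline='' is what the csv docs recommend.
    with open(source_path, 'r', encoding='utf-8') as f, \
            open(output_path, 'a', encoding='utf-8', newline='') as out:
        writer = csv.DictWriter(
            out, fieldnames=FIELDNAMES, delimiter=',', quoting=csv.QUOTE_ALL)
        for line in f:
            if not line.strip():
                continue  # skip blank lines between JSON objects
            tweet = json.loads(line)
            writer.writerow({'id': tweet['id'], 'text': tweet['text']})
```

Calling `tweets_to_csv('MY_SOURCE_FILE', 'MY_OUTPUT')` would mirror the file names used in the question. On Python 2, the explicit `.encode("utf-8")` shown in the answer remains the way to go.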
