Python: Writing Unicode to CSV using UnicodeWriter

Time: 2022-11-11 20:21:36

The Python documentation has the following code example for writing Unicode to a CSV file. I believe it says this is the way to do it, since the csv module can't handle Unicode strings.

import codecs
import csv
import cStringIO


class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

I am writing more than one file, and to keep it simple I have included only the section of my code that shows how I use the above class:

def write(self):
    """
    Outputs the dataset to a csv.
    """
    f = codecs.open(self.filename, 'a')
    writer = UnicodeWriter(f)
    #with open(self.filename, 'a', encoding='utf-8') as f:
    if self.headers and not self.written:
        writer.writerow(self.headers)
        self.written = True
    for record in self.records[self.last_written:]:
        print record
        writer.writerow(record)
    self.last_written = len(self.records)
    f.close()

This is a method inside a class called dataset, which prepares the dataset prior to writing to CSV. Previously I was using writer = csv.writer(f), but due to codec errors I changed my code to use the UnicodeWriter class.

But my problem is that when I open the csv file, I get the following:

some_header
B,r,ë,k,ò,w,n,i,k,_,b,s
B,r,ë,k,ò,w,n,i,k,_,c,s
B,r,ë,k,ò,w,n,i,k,_,c,s,b
B,r,ë,k,ò,w,n,i,k,_,d,e
B,r,ë,k,ò,w,n,i,k,_,d,e,-,1
B,r,ë,k,ò,w,n,i,k,_,d,e,-,2
B,r,ë,k,ò,w,n,i,k,_,d,e,-,3
B,r,ë,k,ò,w,n,i,k,_,d,e,-,4
B,r,ë,k,ò,w,n,i,k,_,d,e,-,5
B,r,ë,k,ò,w,n,i,k,_,d,e,-,M
B,r,ë,k,ò,w,n,i,k,_,e,n
B,r,ë,k,ò,w,n,i,k,_,e,n,-,1
B,r,ë,k,ò,w,n,i,k,_,e,n,-,2

Whereas these rows should actually be something like Brëkòwnik_de-1. I am not really sure what's happening.

To give a basic idea of how the data is generated, here is the relevant line: title = unicode(row_page_title['page_title'], 'utf-8')

1 solution

#1


4  

This symptom points to something like feeding a string into a function/method that is expecting a list or tuple.

The writerows method is expecting a list of lists, and writerow expects a list (or tuple) containing the field values. Since you are feeding it a string, and a string can mimic a list of characters when you iterate over it, you get a CSV with one character in each column.
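
The effect is easy to reproduce in isolation. This is a minimal sketch, assuming Python 3 (where the csv module accepts text directly) and using io.StringIO as a stand-in for the output file; the sample title comes from the question, and the names buf and lines are only for illustration:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

title = "Brëkòwnik_de-1"

# Passing a bare string: csv iterates over it, so every character
# becomes its own field.
writer.writerow(title)

# Passing a one-element list: the whole string is a single field.
writer.writerow([title])

lines = buf.getvalue().splitlines()
print(lines[0])  # B,r,ë,k,ò,w,n,i,k,_,d,e,-,1
print(lines[1])  # Brëkòwnik_de-1
```

The first line matches the broken output shown in the question, which suggests that each record reaching writerow is a plain string rather than a list.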

If your CSV has just one column, you should use writer.writerow([data]) instead of writer.writerow(data). Some may question if you really need the csv module if you have only one column, but the csv module will handle things like a record containing funny stuff (CR/LF and others), so yes, it is a good idea.
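
As a rough illustration of why the csv module still earns its keep with a single column, here is a sketch, again assuming Python 3. The helper write_single_column is a hypothetical name, not part of the csv module or the asker's code:

```python
import csv
import io


def write_single_column(values):
    """Write each value as its own one-field row and return the CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for value in values:
        # Wrap the value in a list so it is treated as one field.
        writer.writerow([value])
    return buf.getvalue()


text = write_single_column(["plain", "has,comma", "two\r\nlines"])

# Reading the text back recovers the original values, including the
# embedded comma and line break, because the writer quoted those fields.
rows = list(csv.reader(io.StringIO(text)))
print(rows)
```

Joining the values with newlines by hand would split the third value across two records; the writer quotes it instead.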
