Python: reading from a file and saving as UTF-8

Date: 2023-01-05 15:20:24

I'm having problems reading from a file, processing its contents as a string, and saving the result to a UTF-8 file.

Here is the code:


try:
    filehandle = open(filename,"r")
except:
    print("Could not open file " + filename)
    quit() 

text = filehandle.read()
filehandle.close()

I then do some processing on the variable text.


And then I write the result:

try:
    writer = open(output,"w")
except:
    print("Could not open file " + output)
    quit() 

#data = text.decode("iso 8859-15")    
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()

This writes the file perfectly, but according to my editor the output is in ISO 8859-15. Since the same editor recognizes the input file (the one named by the variable filename) as UTF-8, I don't know why this happens. As far as my research has shown, the commented lines should solve the problem. However, when I use those lines the resulting file contains gibberish, mainly in the special characters (accented letters and ñ), as the text is in Spanish. I would really appreciate any help, as I am stumped.
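
One likely reason the commented-out lines produce gibberish can be sketched in Python 3. If the bytes being read are already UTF-8, decoding them as ISO 8859-15 misreads each two-byte character as two separate characters, and encoding that back to UTF-8 writes the wrong characters out. The sample word below is an assumption for illustration, not text from the original file.

```python
# Illustrative sample only: "mañana" is a made-up word, not the asker's data.
original = "ma\u00f1ana"

# What is stored on disk when the file really is UTF-8.
utf8_bytes = original.encode("utf-8")

# The commented-out lines treat those bytes as ISO 8859-15 ...
misread = utf8_bytes.decode("iso-8859-15")

# ... and then encode the misread text as UTF-8 again.
double_encoded = misread.encode("utf-8")

# ascii() keeps the output printable on any console.
print(ascii(original))   # 'ma\xf1ana'
print(ascii(misread))    # 'ma\xc3\xb1ana', which displays as "maÃ±ana"

# Decoding with the encoding the bytes really use recovers the text.
assert utf8_bytes.decode("utf-8") == original
```

The ñ has become two characters, which is the typical look of UTF-8 text that was decoded with a single-byte encoding.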

3 answers

#1


117  

Process text to and from Unicode at the I/O boundaries of your program using the codecs module:


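import codecs
# Decode UTF-8 bytes to Unicode on the way in
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# Encode back to UTF-8 on the way out (output is the destination path from the question)
with codecs.open(output, 'w', encoding='utf8') as f:
    f.write(text)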

Edit: The io module is now recommended instead of codecs and is compatible with Python 3's open syntax:


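import io
# Decode UTF-8 bytes to Unicode on the way in
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# Encode back to UTF-8 on the way out (output is the destination path from the question)
with io.open(output, 'w', encoding='utf8') as f:
    f.write(text)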
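
The same read, process, write pattern can be sketched as a small Python 3 function, where the built-in open accepts encoding directly. The function name convert_text, the upper-casing step, and the demo file names are assumptions for illustration; the input and output paths are kept separate so the source file is not overwritten.

```python
import os
import tempfile


def convert_text(src_path, dst_path, src_encoding="utf-8"):
    """Read src_path as text, process it, and write dst_path as UTF-8."""
    # Decode at the input boundary: bytes on disk become a str.
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()

    # Stand-in for the real processing step (hypothetical).
    processed = text.upper()

    # Encode at the output boundary: the str becomes UTF-8 bytes on disk.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(processed)
    return processed


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "input.txt")
    dst = os.path.join(workdir, "output.txt")

    # Create a small ISO 8859-15 input file for the demo.
    with open(src, "wb") as f:
        f.write("ma\u00f1ana".encode("iso-8859-15"))

    convert_text(src, dst, src_encoding="iso-8859-15")
    with open(dst, "rb") as f:
        print(f.read())  # the UTF-8 bytes that were written
```

Passing the source encoding explicitly means the result does not depend on the platform's default encoding.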

#2


4  

You can't do that with the built-in open in Python 2. Use codecs.

When you open a file in Python 2 with the built-in open function, it reads and writes byte strings and does no encoding conversion for you; a unicode string written to it is implicitly encoded as ASCII, which fails on accented characters. (In Python 3, open accepts an encoding argument.) To write in UTF-8, try this:

import codecs
# Unicode text written through this handle is encoded as UTF-8
with codecs.open('data.txt', 'w', 'utf-8') as f:
    f.write(text)
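
After writing the file, one way to check its encoding without relying on an editor's guess is to read it back in binary mode and decode it strictly. This is a minimal Python 3 sketch; the helper name is_valid_utf8 is an assumption for illustration.

```python
def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    # Binary mode: the bytes are returned exactly as stored, with no decoding.
    with open(path, "rb") as f:
        raw = f.read()
    try:
        # Decoding is strict by default and raises on any invalid sequence.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file containing only ASCII also passes, since ASCII is a subset of UTF-8, so the check is most informative when the text has accented characters.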

#3


3  

You can also handle it with the code below:

# Python 3: errors="ignore" silently drops bytes that are not valid UTF-8
with open(completefilepath, 'r', encoding='utf8', errors="ignore") as f:
    text = f.read()
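
Note that errors="ignore" discards any bytes that cannot be decoded, so characters can go missing without warning. A minimal Python 3 sketch of how the error handlers differ, using made-up sample bytes (an assumption for illustration):

```python
# Sample bytes: "año" encoded as ISO 8859-15. This is not valid UTF-8,
# because the byte 0xF1 is not followed by continuation bytes.
raw = "a\u00f1o".encode("iso-8859-15")

# "ignore" drops the undecodable byte entirely.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD, so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

# The default, "strict", raises instead of guessing.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(ascii(ignored))    # 'ao'
print(ascii(replaced))   # 'a\ufffdo'
print(strict_failed)     # True
```

If the input might not be UTF-8, the strict default makes the problem show up early rather than producing text with letters missing.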
