How to convert a string with unicode to unicode using Python

Time: 2022-03-08 22:33:40

I am importing a bunch of data from Excel using xlrd in Python.

All of my data comes back as strings like this: text:u'L\xc9GENDE'

I manipulate the data and try to write it back to Excel (using xlsxwriter), but when I do, I get the same block of text, text:u'L\xc9GENDE', instead of LÉGENDE.

What works for me:

#!/usr/bin/env python
# -*- coding: latin-1 -*-
import xlsxwriter
import sys

workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()
data = u'L\xc9GENDE'
worksheet.write('A1',data)
workbook.close()

This works: I get LÉGENDE in cell A1.

But if I try to manipulate a string I already have so that it gives me u'L\xc9GENDE', cell A1 only shows L\xc9GENDE.

---- EDIT ---- The code I use to retrieve the data from Excel:

from xlrd import open_workbook

def grabexcelfile():
    wb = open_workbook('leg.xls',encoding_override='latin-1')    
    log = []
    txt = ''
    for s in wb.sheets():         
        for row in range(s.nrows):              
            values = []
            for col in range(s.ncols):
                 txt = str(s.cell(row,col))
                 txt.replace('-',' ',10) 
                 log.append(txt) 
    return log            

x = grabexcelfile()
print type(x[0]),x[0]

The print gives me: text:u'L\xc9GENDE'

2 answers

#1

Try this.

import unicodedata
data = u'L\xc9GENDE'
unicodedata.normalize('NFKD',data).encode('ascii','ignore')

For more details, see: Convert a Unicode string to a string in Python (containing extra symbols)
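A small sketch of what the snippet above actually produces may help, since it is easy to miss: NFKD splits É into a plain E plus a combining accent, and encoding to ASCII with 'ignore' then drops the accent. The variable names `decomposed` and `ascii_bytes` are only illustrative. So this gives LEGENDE rather than LÉGENDE, which is fine if unaccented output is acceptable but probably not what the question asks for.

```python
import unicodedata

data = u'L\xc9GENDE'

# NFKD decomposition: U+00C9 becomes 'E' followed by U+0301 (combining acute).
decomposed = unicodedata.normalize('NFKD', data)

# Encoding to ASCII with 'ignore' silently drops the combining accent.
ascii_bytes = decomposed.encode('ascii', 'ignore')

print(len(data), len(decomposed))
print(ascii_bytes)
```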

#2

Instead of trying to manipulate the text:u'L\xc9GENDE' string, I changed how I read the value that Excel gave me:

from xlrd import open_workbook

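def grabexcelfile():
    # Read every cell of every sheet and collect the values in a flat list.
    wb = open_workbook('leg.xls', encoding_override='latin-1')
    log = []
    for s in wb.sheets():
        for row in range(s.nrows):
            for col in range(s.ncols):
                # Changed line: take the cell's value instead of str(cell),
                # which produced the text:u'...' representation.
                txt = s.cell(row, col).value
                # replace() returns a new string, so keep the result.
                txt = txt.replace('-', ' ', 10)
                log.append(txt)
    return log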

x = grabexcelfile()
print type(x[0]),x[0]
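
To illustrate why reading .value helps, here is a rough sketch that does not need an Excel file. `FakeCell` is a hypothetical stand-in, not xlrd's real class; it only assumes that converting a cell to a string gives a debug-style representation (a type tag plus the repr of the value), as the print output in the question suggests. The last part shows one possible way to recover the real string when only the stringified form is left, using `ast.literal_eval` on the part after the colon; the `saved` string is made up for the example.

```python
import ast


class FakeCell(object):
    """Hypothetical stand-in for a spreadsheet cell (not xlrd's real class)."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Debug-style representation: a type tag followed by repr(value).
        return "text:%r" % (self.value,)


cell = FakeCell(u'L\xc9GENDE')

stringified = str(cell)  # the representation: "text:" prefix plus a quoted literal
raw = cell.value         # the actual unicode string, safe to write to a sheet

# If only the stringified form survives, parse the literal after "text:".
saved = "text:u'L\\xc9GENDE'"
_, _, literal = saved.partition(':')
recovered = ast.literal_eval(literal)

print(raw == recovered)
```

Reading the value directly is the simpler route; parsing the literal back is only worth it when the data has already been stored in its stringified form.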
