Using xlrd to read Excel .xls files containing Chinese and/or Hindi characters

Time: 2022-11-14 16:03:34

http://scienceoss.com/read-excel-files-from-python/comment-page-1/#comment-1051

From the above link, I used this utility to read an XLS file. If the XLS file contains different language characters like Chinese or Hindi, it does not output them correctly. Is there a workaround for this?

After Googling, I found this:

import xlrd

def upload_xls(dir, file, request):
    global msg
    global row_num
    global file_path
    try:
        row_num = []
        header_arr = []
        file_path = dir
        #reader = csv.reader(open(file), delimiter='#', quotechar='"')
        # This is the line that fails: open_workbook() does not accept an
        # `encoding` keyword (note also that cp1252 is not UTF-8).
        book = xlrd.open_workbook('dodgy.xls', encoding='cp1252')
        book.sheet_names()
        sh = book.sheet_by_index(0)
        valid_xl_format = 0
        invalid_xl_format = 0
    except Exception as e:
        print("Error: %s" % e)

But there is an error in the line book = open_workbook('dodgy.xls',encoding='cp1252'):

TypeError: open_workbook() got an unexpected keyword argument 'encoding'

4 solutions

#1


4  

According to the xlrd module documentation, the correct parameter is encoding_override="cp1252", not encoding="cp1252".

From the way you are importing the xlrd module, you should be calling the function as xlrd.open_workbook, but in the example code you use the function directly, as if you had used "from xlrd import *".

#2


8  

[dis]claimer: I'm the author of xlrd.

"""If the xls contains different language characters like chine or hindi.It does not output the exact wordings.Is there a work around for this.."""

As explained in the documentation, the encoding_override argument is used ONLY for OLD files (produced by versions of Excel earlier than Excel 97, i.e. before 1997), and even then only when the internally recorded "codepage" is missing or incorrect.

Note: for an old file with Chinese characters, overriding with 'cp1252' is guaranteed to raise an exception.
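
A stdlib-only sketch of one way a wrong override fails. xlrd itself is not involved. It assumes the old file stored its Chinese text in GBK (Windows codepage 936), and the helper name try_decode is hypothetical. Python's strict cp1252 codec leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined. GBK uses 0x81 as a lead byte, so text containing it cannot be decoded as cp1252.

```python
def try_decode(raw: bytes, codepage: str):
    """Hypothetical helper: decoded text, or None if strict decoding fails."""
    try:
        return raw.decode(codepage)
    except UnicodeDecodeError:
        return None


# Assumption: the legacy file stored its text in GBK (Windows codepage 936).
# U+4E02 ("丂") is encoded in GBK as the two bytes 0x81 0x40.
raw = "丂中文".encode("gbk")

print(try_decode(raw, "gbk"))     # the codepage the file was actually written in
print(try_decode(raw, "cp1252"))  # None: byte 0x81 has no mapping in cp1252
```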

Note: an old file with "Hindi" (Devanagari?) characters is very unlikely. As far as I know, there was never an officially supported codepage for any of the ISCII scripts, and I haven't heard of an unofficial one either. Any information on this topic and/or sample files would be very welcome.

Excel 97 and later versions record all text data in (effectively) UTF-16LE. The encoding_override is ignored if the file is a valid Excel-97-or-later file.
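
A small Python 3 illustration of why no codepage choice is involved for Excel 97+ text: UTF-16LE can hold Chinese and Devanagari in the same string. This sketch shows only the encoding itself. It does not show how xlrd parses the file.

```python
# Mixed Chinese and Devanagari (Hindi) sample text.
text = "中文 हिन्दी"

# Encode and decode as UTF-16LE, the form Excel 97+ effectively uses for text.
raw = text.encode("utf-16-le")
decoded = raw.decode("utf-16-le")

print(decoded == text)      # True: nothing is lost in the round trip
print(len(raw), len(text))  # 2 bytes per character, since all of these are in the BMP
```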

Whatever version of Excel produced the file, xlrd (as documented) returns unicode strings. Your problems are much more likely to be related to how you are displaying or converting those unicode strings.
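
A Python 3 sketch of the display/conversion side. The sample string stands in for an xlrd cell value, and the names can_encode and cells.txt are hypothetical. The same Unicode text cannot be represented in cp1252, a common legacy Windows encoding, but it can be in UTF-8. So the usual fix is to print or write it with an encoding that covers these scripts.

```python
import os
import tempfile

# Stand-in for a cell value: xlrd returns Unicode text regardless of Excel version.
cell_value = "中文 हिन्दी"


def can_encode(text, encoding):
    """True if `text` can be represented in `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(cell_value, "cp1252"))  # False
print(can_encode(cell_value, "utf-8"))   # True

# When saving the values, name the output encoding explicitly.
path = os.path.join(tempfile.mkdtemp(), "cells.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(cell_value + "\n")

with open(path, encoding="utf-8") as f:
    read_back = f.read().rstrip("\n")
```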

For further assistance, edit your question to show examples of the actual output together with the "exact wording".

#3


2  

There is a csv module in the standard library, which handles unicode in Python 3.1.

Warning: in Python 2.x the csv library does not handle unicode.
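
A minimal Python 3 sketch of the csv route, using hypothetical rows that stand in for values read from a sheet. It opens the file in text mode with an explicit encoding and newline="", as the csv documentation recommends. With that setup, Chinese and Hindi text round-trips unchanged.

```python
import csv
import os
import tempfile

# Hypothetical rows, standing in for values read from a sheet with xlrd.
rows = [["language", "sample"], ["Chinese", "中文"], ["Hindi", "हिन्दी"]]

path = os.path.join(tempfile.mkdtemp(), "sheet.csv")

# Python 3: text mode with an explicit encoding, and newline="" for the csv module.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```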

#4


0  

There is a similar question. There, the answer was that the output, not xlrd, was causing the issue.

Answer on how to set your script to UTF-8: https://*.com/a/17628350/713
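
The linked answer is not reproduced here. One possible approach for Python 3.7+ is sketched below; this is an assumption, not necessarily what that answer recommends. An existing text stream such as sys.stdout can be switched to UTF-8 with TextIOWrapper.reconfigure(). Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python is an alternative. The helper name use_utf8 is hypothetical.

```python
import io
import sys


def use_utf8(stream):
    """Hypothetical helper: switch a text stream to UTF-8 if it supports reconfigure()."""
    if hasattr(stream, "reconfigure"):  # TextIOWrapper.reconfigure exists in Python 3.7+
        stream.reconfigure(encoding="utf-8")
    return stream


# Console output: where supported, Python now encodes printed text as UTF-8
# instead of the console's default encoding.
use_utf8(sys.stdout)
print("中文 हिन्दी")

# The same call on an in-memory stream, to inspect the bytes actually written.
buf = io.BytesIO()
out = use_utf8(io.TextIOWrapper(buf, encoding="cp1252"))
out.write("中文")
out.flush()
```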
