Reading/parsing Excel (xls) files with Python

Date: 2023-01-15 11:22:37

What is the best way to read Excel (XLS) files (not CSV files) with Python?

Is there a built-in package which is supported by default in Python to do this task?

8 solutions

#1


72  

I highly recommend xlrd for reading .xls files.

voyager mentioned the use of COM automation. Having done this myself a few years ago, be warned that doing this is a real PITA. The number of caveats is huge and the documentation is lacking and annoying. I ran into many weird bugs and gotchas, some of which took many hours to figure out.

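UPDATE: For newer .xlsx files, the recommended library for reading and writing appears to be openpyxl. Note that xlrd 2.0 and later no longer read .xlsx files at all; they handle only legacy .xls files.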
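
Since the answer suggests xlrd for .xls and openpyxl for .xlsx, it can help to check which format a file really is before picking a library, because workbooks are sometimes saved with the wrong extension. A minimal sketch using only the standard library, assuming the usual file signatures: legacy .xls workbooks are OLE2 compound files starting with the bytes D0 CF 11 E0 A1 B1 1A E1, and .xlsx workbooks are ZIP archives starting with PK\x03\x04. The helper names pick_reader and pick_reader_for_file are hypothetical.

```python
from pathlib import Path

# File signatures ("magic numbers") of the two Excel container formats
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls: BIFF records inside an OLE2 compound file
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx / .xlsm: Office Open XML inside a ZIP archive


def pick_reader(header: bytes) -> str:
    """Suggest a reading library from the first bytes of a workbook (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "xlrd"
    if header.startswith(ZIP_MAGIC):
        return "openpyxl"
    # CSV, HTML or other text that was merely saved with an .xls extension ends up here
    raise ValueError("not a recognised Excel container")


def pick_reader_for_file(path: str) -> str:
    """Read only the first 8 bytes, which is enough to tell the two formats apart."""
    with Path(path).open("rb") as f:
        return pick_reader(f.read(8))


# Usage (hypothetical file name):
# print(pick_reader_for_file("yourfilename.xls"))
```

The OLE2 signature is shared with other legacy Office files such as .doc, so this only identifies the container, not that the file is actually a workbook.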

#2


21  

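Using pandas:

import pandas as pd

xls = pd.ExcelFile("yourfilename.xls")  # reading .xls files requires the xlrd package
sheetX = xls.parse(2)  # 2 is the zero-based sheet index, i.e. the third sheet
var1 = sheetX['ColumnName']
print(var1[1])  # 1 is the row label; with the default index this is the second data row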
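
xls.parse(2) selects a sheet by its zero-based position, so it helps to know which name sits at which position; in pandas, xls.sheet_names lists them. For an .xlsx file (not the .xls used above) the same order can be read without pandas, because .xlsx is a ZIP archive whose xl/workbook.xml lists the sheets in tab order. A stdlib-only sketch under that assumption; xlsx_sheet_names is a hypothetical helper that ignores everything except the sheet list.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML elements such as <sheet> in xl/workbook.xml
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def xlsx_sheet_names(path_or_file):
    """Return the sheet names of an .xlsx workbook in workbook (tab) order."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    # Position i in this list should be what a zero-based index such as parse(i) refers to
    return [sheet.get("name") for sheet in root.iter("{%s}sheet" % MAIN_NS)]


# Usage (hypothetical file name):
# names = xlsx_sheet_names("yourfilename.xlsx")
# print(names[2])  # the sheet that parse(2) would be expected to return
```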

#3


12  

The Python xlrd library can be a better solution for this problem:

import xlrd

To open a workbook (xlrd 2.0 and later read only legacy .xls files):

workbook = xlrd.open_workbook('your_file_name.xls')

To open a sheet by name:

worksheet = workbook.sheet_by_name('Name of the Sheet')

To open a sheet by index (zero-based, so 0 is the first sheet):

worksheet = workbook.sheet_by_index(0)

To read a cell value (row and column indices are zero-based, so (0, 0) is cell A1):

value = worksheet.cell(0, 0).value
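
Because cell(rowx, colx) takes zero-based row and column indices, looking up a cell by the reference shown in Excel (such as B3) needs a small conversion. A stdlib-only sketch; a1_to_rowcol is a hypothetical helper, and it treats column letters as bijective base 26 (A=1 ... Z=26, AA=27).

```python
import re

# Optional "$" markers, 1-3 column letters, then the 1-based row number
_A1_RE = re.compile(r"^\$?([A-Za-z]{1,3})\$?(\d+)$")


def a1_to_rowcol(ref):
    """Convert an A1-style reference such as "B3" to zero-based (rowx, colx)."""
    m = _A1_RE.match(ref.strip())
    if not m:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters.upper():
        # Bijective base 26: A=1 ... Z=26, AA=27, AB=28, ...
        col = col * 26 + (ord(ch) - ord("A") + 1)
    row = int(digits)
    if row < 1:
        raise ValueError(f"row numbers start at 1: {ref!r}")
    return row - 1, col - 1


# Usage with the worksheet opened above:
# value = worksheet.cell(*a1_to_rowcol("B3")).value
```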

#4


1  

You can use any of the libraries listed here (such as Pyxlreader, which is based on JExcelApi; note that xlwt only writes .xls files and cannot read them), or COM automation to have Excel itself read the files. With COM, however, you are introducing Office as a dependency of your software, which might not always be an option.

#5


1  

You might also consider running the (non-python) program xls2csv. Feed it an xls file, and you should get back a csv.
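
If installing a separate program is not an option, the same conversion can be done in Python once the rows have been read, for example with xlrd's sheet.row_values(i) for i in range(sheet.nrows). A minimal sketch using the standard csv module; rows_to_csv is a hypothetical helper and the sample rows stand in for real sheet data.

```python
import csv
import io


def rows_to_csv(rows, out=None):
    """Write an iterable of row lists as CSV.

    Writes to the text stream `out` if one is given (and returns None);
    otherwise returns the CSV text as a string.
    """
    target = out if out is not None else io.StringIO()
    writer = csv.writer(target, lineterminator="\n")
    writer.writerows(rows)
    return None if out is not None else target.getvalue()


# Stand-in for rows read from a sheet; with xlrd this could be
#   rows = [sheet.row_values(i) for i in range(sheet.nrows)]
rows = [["name", "qty"], ["widget", 3.0], ["gadget, large", 1.0]]
print(rows_to_csv(rows), end="")
# name,qty
# widget,3.0
# "gadget, large",1.0
```

xlrd returns numeric cells as floats, which is why 3 comes out as 3.0. When writing to a real file, open it with newline="" as the csv module documentation recommends.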

#6


0  

For older Excel (.xls) files there is the OleFileIO_PL module (now maintained under the name olefile), which can read the OLE structured storage format those files use.
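
To show what "OLE structured storage" means in practice: the first 512 bytes of a legacy .xls file are a compound-file header. A stdlib-only sketch that checks the signature and decodes a few header fields, using the offsets as I understand them from Microsoft's [MS-CFB] Compound File Binary format specification. It is not a replacement for olefile, which also walks the sector tables to extract streams such as the Workbook stream; parse_cfb_header is a hypothetical helper.

```python
import struct

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(header):
    """Decode a few fields of an OLE2 compound file header (the first 512 bytes)."""
    if len(header) < 512 or not header.startswith(OLE2_MAGIC):
        raise ValueError("not an OLE2 compound file")
    # Offsets 0x18-0x21: minor version, major version, byte order, sector shift, mini sector shift
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", header, 0x18)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark")
    # Offsets 0x2C and 0x30: number of FAT sectors, first directory sector
    num_fat_sectors, first_dir_sector = struct.unpack_from("<2I", header, 0x2C)
    return {
        "major_version": major,            # 3 -> 512-byte sectors, 4 -> 4096-byte sectors
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
    }


# Usage (hypothetical file name):
# with open("yourfilename.xls", "rb") as f:
#     print(parse_cfb_header(f.read(512)))
```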

#7


0  

Python Excelerator handles this task as well. http://ghantoos.org/2007/10/25/python-pyexcelerator-small-howto/

It's also available in Debian and Ubuntu:

 sudo apt-get install python-excelerator

#8


0  

I think pandas is the best way to go. There is already an answer here that uses pandas' ExcelFile class, but it did not work properly for me. From here I found the read_excel function, which works just fine:

import pandas as pd
dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name")
print(dfs.head(10))

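P.S. read_excel needs a reader package installed for the file type: xlrd for .xls files and openpyxl for .xlsx files (recent pandas versions use openpyxl for .xlsx, since xlrd 2.0 dropped .xlsx support).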
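
pandas normally picks the engine itself, but passing it explicitly makes the dependency visible and a missing package fails with a clear ImportError. A small sketch mapping file extensions to engine names that read_excel accepts (xlrd, openpyxl, pyxlsb, odf; pandas 2.2 and later also accept calamine); excel_engine_for is a hypothetical helper.

```python
from pathlib import Path

# File extension -> read_excel engine name (the package with that name must be installed)
ENGINES = {
    ".xls": "xlrd",        # legacy BIFF workbooks; xlrd >= 2.0 reads only these
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xlsb": "pyxlsb",
    ".ods": "odf",         # provided by the odfpy package
}


def excel_engine_for(path):
    """Return the read_excel engine name for a workbook path, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError(f"no read_excel engine known for {suffix!r}") from None


# Usage:
# import pandas as pd
# path = "your_file_name.xlsx"
# dfs = pd.read_excel(path, sheet_name="your_sheet_name", engine=excel_engine_for(path))
```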
