Convert xlsx to csv on Linux from the command line

Time: 2022-04-10 02:04:35

I'm looking for a way to convert xlsx files to csv files on Linux.

I do not want to use PHP/Perl or anything like that, since I'm looking at processing several million lines, so I need something quick. I found a program in the Ubuntu repos called xls2csv, but it only converts xls (Office 2003) files, which is what I'm currently using; I need support for the newer Excel files.

Any ideas?

11 solutions

#1


167  

The Gnumeric spreadsheet application comes with a command line utility called ssconvert that can convert between a variety of spreadsheet formats:

$ ssconvert Book1.xlsx newfile.csv
Using exporter Gnumeric_stf:stf_csv

$ cat newfile.csv 
Foo,Bar,Baz
1,2,3
123.6,7.89,
2012/05/14,,
The,last,Line

To install on Ubuntu:

apt-get install gnumeric

To install on Mac:

brew install gnumeric
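
A single CSV file can only hold one sheet, so for multi-sheet workbooks ssconvert's -S (--export-file-per-sheet) option is useful: it treats the output file name as a template, substituting the sheet number for %n (or the sheet name for %s). A minimal sketch wrapping that in a shell function; the function name xlsx_sheets_to_csv and the <basename>_<n>.csv naming are illustrative assumptions, not part of Gnumeric.

```shell
# Hypothetical helper: export every sheet of one workbook to its own CSV file.
# Output names follow an assumed <basename>_<sheet number>.csv scheme.
xlsx_sheets_to_csv() {
    local infile=$1 base=${1%.xlsx}
    # -S (--export-file-per-sheet): ssconvert replaces %n in the output
    # name with the sheet number (%s would give the sheet name instead).
    ssconvert -S -T Gnumeric_stf:stf_csv "$infile" "${base}_%n.csv"
}
```

For example, xlsx_sheets_to_csv Book1.xlsx should leave one Book1_<n>.csv file per sheet next to the workbook.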

#2


105  

You can do this with LibreOffice:

libreoffice --headless --convert-to csv "$filename" --outdir "$outdir"

For reasons not clear to me, you might need to run this with sudo. You can let LibreOffice run under sudo without a password prompt by adding this line to your sudoers file (edit it with visudo):

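%users ALL=(ALL) NOPASSWD: /usr/bin/libreoffice

Here %users means members of the users group (put a single username instead to restrict it to one account). sudoers requires the command's full path; /usr/bin/libreoffice is typical but may differ on your system, so check it with command -v libreoffice.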
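
One commonly reported cause of --convert-to silently doing nothing, and of sudo appearing to fix it, is another LibreOffice process already running with the same user profile; running as root sidesteps that only because root has its own profile. A less invasive workaround, sketched below under the assumption that libreoffice is on PATH, gives each conversion a throwaway profile via LibreOffice's -env:UserInstallation option. The function name lo_to_csv is hypothetical.

```shell
# Hypothetical helper: lo_to_csv OUTDIR FILE...
# Converts the given workbooks to CSV in OUTDIR using a temporary LibreOffice
# profile, so an already-running LibreOffice instance does not swallow the job.
lo_to_csv() {
    local outdir=$1 profile rc
    shift
    profile=$(mktemp -d) || return 1
    # mktemp -d returns an absolute path, so this yields file:///tmp/...
    libreoffice "-env:UserInstallation=file://$profile" \
        --headless --convert-to csv --outdir "$outdir" "$@"
    rc=$?
    rm -rf "$profile"   # discard the temporary profile
    return "$rc"
}
```

Usage: lo_to_csv out/ *.xlsx converts all matching workbooks in one LibreOffice run, without sudo.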

#3


91  

If you already have a desktop environment then I'm sure Gnumeric / LibreOffice would work well, but on a headless server (such as an Amazon Web Services instance) they pull in dozens of dependencies that you also need to install.

I found this Python alternative:

https://github.com/dilshod/xlsx2csv

$ pip install xlsx2csv
$ xlsx2csv file.xlsx > newfile.csv

Took 2 seconds to install and works like a charm.

If you have multiple sheets, you can export them all at once or one at a time:

$ xlsx2csv file.xlsx --all > all.csv
$ xlsx2csv file.xlsx --all -p '' > all-no-delimiter.csv
$ xlsx2csv file.xlsx -s 1 > sheet1.csv

The author also links to several alternatives written in Bash, Python, Ruby, and Java.
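
Since the question is about throughput, and each xlsx2csv invocation converts one workbook in its own process, one way to handle a directory full of files is to run several conversions in parallel with xargs -P. A sketch, assuming find and xargs support -print0/-0 and -P (the GNU and BSD versions do); the function name xlsx2csv_tree, the default of 4 jobs, and writing name.csv beside name.xlsx are all assumptions.

```shell
# Hypothetical helper: xlsx2csv_tree [DIR] [JOBS]
# Converts every .xlsx under DIR to a .csv beside it,
# running up to JOBS xlsx2csv processes at a time.
xlsx2csv_tree() {
    local dir=${1:-.} jobs=${2:-4}
    # Null-delimited names keep paths with spaces or newlines intact.
    find "$dir" -type f -name '*.xlsx' -print0 |
        xargs -0 -n 1 -P "$jobs" sh -c 'xlsx2csv "$1" "${1%.xlsx}.csv"' _
}
```

For example, xlsx2csv_tree /data/exports 8 would run up to eight conversions at once.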

#4


22  

Use csvkit

in2csv data.xlsx > data.csv

For details, check their excellent docs.
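
By default in2csv converts a single sheet, but its -n (--names) flag lists the sheet names in a workbook and --sheet selects one by name, so the two can be combined to export every sheet. A sketch; the function name xlsx_each_sheet and the <basename>_<sheet name>.csv naming are hypothetical.

```shell
# Hypothetical helper: write each sheet of WORKBOOK to <basename>_<sheet name>.csv.
xlsx_each_sheet() {
    local book=$1 base=${1%.xlsx} sheet
    # in2csv -n prints one sheet name per line
    in2csv -n "$book" | while IFS= read -r sheet; do
        # </dev/null keeps the inner in2csv from reading the loop's input
        in2csv --sheet "$sheet" "$book" </dev/null > "${base}_${sheet}.csv"
    done
}
```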

#5


20  

In bash, I used this libreoffice command to convert all my xlsx files in the current directory:

for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done

It takes care of spaces in the filename.

I tried again some years later and it didn't work. This thread gives some tips, but the quickest solution was to run it as root (or with sudo libreoffice). Not elegant, but quick.
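
Two possible refinements to the loop above, shown as a sketch: if nothing matches, the shell leaves *.xlsx unexpanded, so the sketch checks that the first match actually exists; and because --convert-to accepts several input files, passing them all to one libreoffice call avoids starting LibreOffice once per file. The function name convert_all_xlsx and its OUTDIR argument are assumptions.

```shell
# Hypothetical helper: convert_all_xlsx [OUTDIR]
# Converts every .xlsx in the current directory with a single LibreOffice start-up.
convert_all_xlsx() {
    local outdir=${1:-.}
    set -- *.xlsx
    # With no matches the pattern stays literal, so check that it exists.
    if [ ! -e "$1" ]; then
        echo "no .xlsx files found" >&2
        return 1
    fi
    libreoffice --headless --convert-to csv --outdir "$outdir" "$@"
}
```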

On Windows, use the command scalc.exe instead.

#6


7  

Another option would be to use R via a small bash wrapper for convenience:

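# Print the first sheet of an .xlsx file as tab-separated text on stdout.
# Requires R and the "xlsx" R package; R's error output is discarded.
xlsx2txt() {
echo '
require(xlsx)
write.table(read.xlsx2(commandArgs(TRUE)[1], 1), stdout(), quote=FALSE, row.names=FALSE, col.names=TRUE, sep="\t")
' | Rscript --vanilla - "$1" 2>/dev/null
}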

xlsx2txt file.xlsx > file.txt

#7


7  

If the .xlsx file has many sheets, xlsx2csv's -s flag selects the one you want (sheets are numbered from 1). For example:

xlsx2csv "my_file.xlsx" -s 2 second_sheet.csv

second_sheet.csv will then contain the data from the second sheet of my_file.xlsx.

#8


3  

If you are OK with running Java from the command line, you can do it with Apache POI's Excel extractor. It has a main method that is described as the command-line extractor, though it seems to just dump everything out. The POI project also points to an example that converts to CSV. You would have to compile it before you can run it, but it too has a main method, so you should not need much coding of your own to make it work. Note that the HSSF classes only read the older .xls format; for .xlsx files you need POI's XSSF equivalents.

Another option that might fly, but will require some work on the other end, is to have your Excel files delivered as Excel XML Data or XML Spreadsheet, or whatever Microsoft calls that format these days. That opens up many more ways to slice and dice the data however you want.

#9


3  

Using ssconvert, the command-line utility that comes with the Gnumeric spreadsheet application, this is indeed super simple:

find . -name '*.xlsx' -exec ssconvert -T Gnumeric_stf:stf_csv {} \;

and you're done!

#10


0  

As others said, libreoffice can convert xls files to csv. The problem for me was the sheet selection.

This libreoffice Python script does a fine job at converting a single sheet to CSV.

Usage is:

./libreconverter.py File.xls:"Sheet Name" output.csv

The only downside (on my end) is that --headless doesn't seem to work: a LibreOffice window shows up for a second and then closes. That's OK with me, since it's the only tool that does the job quickly.

#11


-2  

You could try the OpenOffice/LibreOffice spreadsheet (Calc). It's not a command-line tool, but there's a good chance it supports xlsx: www.libreoffice.org/features/calc/ mentions xlsx support.
