How can my Perl script determine whether an Excel file is in XLS or XLSX format?

Date: 2023-01-15 13:23:03

I have a Perl script that reads data from an Excel (xls) binary file. But the client that sends us these files has started sending us XLSX format files at times. I've updated the script to be able to read those as well. However, the client sometimes likes to name the XLSX files with an .xls extension, which currently confuses the heck outta my script since it uses the file name to determine which file type it is.

An XLSX file is a zip file that contains XML stuff. Is there a simple way for my script to look at the file and tell whether it's a zip file or not? If so, I can make my script go by that instead of just the file name.

7 answers

#1


16  

.xlsx files have the first 2 bytes as 'PK', so a simple open and examination of the first 2 characters will do.
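A minimal sketch of that check, using only core Perl. The function name `excel_type_by_signature` is hypothetical. It assumes that a zip local file header ("PK" followed by bytes 0x03 0x04) means XLSX and that the OLE2 compound document signature means XLS. Both are container signatures, so an ordinary zip archive or a binary Word document would match as well.

```perl
use strict;
use warnings;

# Classify a file by its leading bytes (its "magic number").
# Returns 'xlsx' for a zip container, 'xls' for an OLE2 compound
# document, and 'unknown' for anything else.
sub excel_type_by_signature {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $header = '';
    my $read = read $fh, $header, 8;
    close $fh;
    die "Cannot read $path: $!" unless defined $read;

    # Zip local file header: "PK" followed by bytes 0x03 0x04.
    return 'xlsx' if substr( $header, 0, 4 ) eq "PK\x03\x04";

    # OLE2 compound document signature used by binary .xls files.
    return 'xls' if $header eq "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

    return 'unknown';
}
```

The script can then call `excel_type_by_signature($filename)` and pick the XLS or XLSX parser from the result, ignoring the file extension.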

#2


17  

Yes, it is possible by checking magic number.

There are quite a few modules in Perl for checking magic number in a file.

An example using File::LibMagic:

use strict;
use warnings;

use File::LibMagic;

my $lm = File::LibMagic->new();

# $filename holds the path of the file to inspect.
my $type = $lm->checktype_filename($filename);

if ( $type eq 'application/zip; charset=binary' ) {
    # XLSX format
}
elsif ( $type eq 'application/vnd.ms-office; charset=binary' ) {
    # XLS format
}

Another example, using File::Type:

use strict;
use warnings;

use File::Type;

my $ft = File::Type->new();

if ( $ft->mime_type($file) eq 'application/zip' ) {
    # XLSX format
}
else {
    # probably XLS format
}

#3


6  

Edit: Archive::Zip is a better solution:

use Archive::Zip qw( :ERROR_CODES );

# Read a Zip file
my $somezip = Archive::Zip->new();
unless ( $somezip->read( 'someZip.zip' ) == AZ_OK ) {
    die 'read error';
}

#4


2  

Use File::Type:

use File::Type;

my $file = "foo.zip";
my $filetype = File::Type->new( );

if( $filetype->mime_type( $file ) eq 'application/zip' ) {
  # File is a zip archive.
  ...
}

I just tested it with a .xlsx file, and the mime_type() returned application/zip. Similarly, for a .xls file the mime_type() is application/octet-stream.

#5


1  

You can detect the xls file by checking the first bytes of the file for Excel headers.

A list of valid older Excel headers is available here (unless you know the exact version of their Excel, check for all applicable possibilities):

http://toorcon.techpathways.com/uploads/headersig.txt

Zip headers are described here: http://en.wikipedia.org/wiki/ZIP_(file_format)#File_headers, but I'm not sure whether .xlsx files have the same headers.

File::Type appears to use "PK\003\004" as the file header to identify zip files, but I'm not certain that logic works for .xlsx, since I don't have a file to test with.

#6


-1  

The-Evil-MacBook:~ ivucica$ file --mime-type --brief file.zip 
application/zip

Hence, probably comparing

`file --mime-type --brief $filename`

with application/zip would do the trick of detecting zips. Of course, you need to have the file utility installed, which is quite usual on UNIX systems. I'm afraid I cannot provide a Perl example, since all knowledge of Perl has evaporated from my memory and I have no examples at hand.
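One way this could look in Perl is sketched below, using only core features and the same `file --mime-type --brief` invocation. The function names `mime_type_via_file` and `excel_type_from_mime` are hypothetical. Apart from application/zip and application/vnd.ms-office, which appear in this thread, the MIME strings in the mapping are assumptions about what some versions of file may report for Excel files, so verify them against your own installation.

```perl
use strict;
use warnings;

# Map a MIME type string reported by file(1) to an Excel format.
# The vnd.openxmlformats and vnd.ms-excel values are assumptions;
# adjust the list to match what your version of file prints.
sub excel_type_from_mime {
    my ($mime) = @_;

    return 'xlsx'
        if $mime eq 'application/zip'
        || $mime eq 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';

    return 'xls'
        if $mime eq 'application/vnd.ms-office'
        || $mime eq 'application/vnd.ms-excel';

    return 'unknown';
}

# Run file(1) on a path and return the MIME type it prints.
sub mime_type_via_file {
    my ($path) = @_;

    # The list form of a pipe open does not pass the name through a shell.
    open my $pipe, '-|', 'file', '--mime-type', '--brief', $path
        or die "Cannot run file: $!";
    my $mime = <$pipe>;
    close $pipe;

    $mime = '' unless defined $mime;
    chomp $mime;
    return $mime;
}
```

A caller would combine the two as `excel_type_from_mime( mime_type_via_file($filename) )`. Keeping the mapping separate from the external command makes it easy to adjust the accepted MIME strings without touching the process handling.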

#7


-2  

I can't speak for Perl, but in the framework I use, .NET, there are a number of libraries for manipulating zip files that you could use.

Another thing that I've seen people use is the command-line version of WinZip. It gives a return value of 0 when a file is unzipped successfully and non-zero when there is an error.

This may not be the best way to do this, but it's a start.
