How to get the size (in bytes) of an Excel file being written with Apache POI?

Date: 2022-12-23 20:24:22

I have a bit of a problem here that I just can't solve.

The thing is that I'm using POI for a project in Java, and I have to produce the final output in .xls format (Apache's HSSF).

My business rules state that each file I generate must be at most 12 MB.

But I know .xls has some internal way of structuring the data (XML, I guess), so it takes more bytes than simply putting the result in a plain-text file. I can't get the size of the Excel workbook, because it seems to be generated in a temporary location (which I can't find), and I can't read it while it's being written.

Is there any way to get the size in bytes of the Excel output file while Java writes it using the HSSFWorkbook object?

2 answers

#1

Your best bet is probably to periodically write the file out, and see how big it is. The only way to know for sure how big the file will be is to write it out...

With HSSF, not all cells take up the same amount of space. String cells take a different amount of space than numeric cells, formula cells vary with the number of operators and values they contain, and string cells also vary depending on whether they reuse text that an earlier cell already used, and so on. You can make rough guesses based on the kinds of things you're adding (remembering to account for cell styles, named ranges, pictures, etc.), but the only way to be sure is to write the file out every so often and see how big it is.
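
As far as I know, an HSSFWorkbook is held in memory and only serialized to the binary .xls format when `write(OutputStream)` is called, so "writing it out" does not have to mean writing to disk. A minimal sketch, assuming you are happy to re-serialize the whole workbook at each check: `CountingOutputStream`, `Serializer` and `SizeCheck` are hypothetical names of my own (Commons IO ships a similar counting stream), and the 12 MB limit is assumed to mean 12 × 1024 × 1024 bytes.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Counts the bytes written to it and discards them, so nothing touches the disk. */
class CountingOutputStream extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || off + len > b.length) {
            throw new IndexOutOfBoundsException();
        }
        count += len;
    }

    long getCount() {
        return count;
    }
}

/** Hypothetical callback: anything that can serialize itself to a stream, e.g. workbook::write. */
interface Serializer {
    void writeTo(OutputStream out) throws IOException;
}

class SizeCheck {
    // Assumption: "12 MB" means 12 MiB; use 12_000_000L if the rule means decimal megabytes.
    static final long MAX_BYTES = 12L * 1024 * 1024;

    /** Serializes into a counting stream and returns the exact number of bytes produced. */
    static long measure(Serializer serializer) throws IOException {
        CountingOutputStream counter = new CountingOutputStream();
        serializer.writeTo(counter);
        return counter.getCount();
    }

    /** True if the serialized output stays within MAX_BYTES. */
    static boolean fits(Serializer serializer) throws IOException {
        return measure(serializer) <= MAX_BYTES;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[5000];
        // With POI this would be SizeCheck.measure(workbook::write).
        System.out.println(measure(out -> out.write(payload)) + " bytes, fits: "
                + fits(out -> out.write(payload)));
    }
}
```

Each check serializes the entire workbook, so it is probably best done every few thousand rows rather than after every cell, starting a new file once the measured size gets close to the limit.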

For XSSF, it's even more complicated. Not only do different cells take up different numbers of characters in the XML (much as with HSSF), but the .xlsx file format is also compressed. So the same snippet of XML can take up a different amount of space in the output file depending on how the compression algorithm handles it (for example, the first occurrence will take more space than subsequent ones). That leaves even less hope of being certain without saving and checking. Again, you can probably come up with rough guesses, but the only way to be sure is to save and see.
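
To illustrate the compression point: an .xlsx file is a ZIP package whose parts are normally DEFLATE-compressed. The sketch below uses `java.util.zip.Deflater` in raw (ZIP-style) mode on a made-up XML snippet that only loosely resembles a SpreadsheetML cell; it is not real POI output, and the exact numbers depend on the zlib build, so none are quoted here. What it shows is that each repeat of the snippet costs far fewer bytes than the first copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionDemo {
    /** Size of the text after raw DEFLATE (as used inside ZIP entries) at the default level. */
    static int deflatedSize(String text) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap = raw DEFLATE
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the count matters, the output is discarded
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical cell snippet, repeated to mimic many similar cells.
        String cell = "<c r=\"A1\" t=\"s\"><v>0</v></c>";
        for (int n : new int[] {1, 10, 100, 1000}) {
            String xml = cell.repeat(n);
            System.out.printf("%4d cells: %5d raw bytes -> %4d compressed bytes%n",
                    n, xml.getBytes(StandardCharsets.UTF_8).length, deflatedSize(xml));
        }
    }
}
```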

If you want a predictable file size, you'll have to use something purely text-based, e.g. a .csv file.
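
With a text format the size is simply the sum of the encoded lines, so you can split the output before writing anything. A minimal sketch, assuming UTF-8 without a BOM, CRLF line endings and RFC 4180-style quoting; `CsvSizeBudget`, `toCsvLine` and `splitBySize` are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CsvSizeBudget {
    /** Joins fields into one CSV line, quoting fields that contain a comma, quote or line break. */
    static String toCsvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            String f = fields.get(i);
            if (f.contains(",") || f.contains("\"") || f.contains("\n") || f.contains("\r")) {
                sb.append('"').append(f.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(f);
            }
        }
        return sb.append("\r\n").toString();
    }

    /** Groups rows into files whose UTF-8 size never exceeds maxBytes (an oversized row gets its own file). */
    static List<List<String>> splitBySize(List<List<String>> rows, long maxBytes) {
        List<List<String>> files = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (List<String> row : rows) {
            String line = toCsvLine(row);
            long lineBytes = line.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && currentBytes + lineBytes > maxBytes) {
                files.add(current); // this file is full, start the next one
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(line);
            currentBytes += lineBytes;
        }
        if (!current.isEmpty()) files.add(current);
        return files;
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(List.of("row" + i, "some value"));
        }
        // Tiny budget so the split is visible; in practice this would be the 12 MB limit.
        for (List<String> file : splitBySize(rows, 50)) {
            long bytes = 0;
            for (String line : file) bytes += line.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(file.size() + " rows, " + bytes + " bytes");
        }
    }
}
```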

#2

Well, after some research on the API, I found that `HSSFWorkbook`'s method `getBytes()` returns an array with the bytes of all the data in the workbook (sheets, rows, data, etc.), so its length gives a very close approximation of the size of the final workbook file the user receives. It is an approximation because it covers just the HSSF portions of the file, not the container that `write()` wraps around them.
