How do I get the size (in bytes) of an Excel file written with Apache POI?

Time: 2022-12-23 20:24:22

I have a bit of a problem around here, I just can't get it done.


The thing is that I'm using POI for a Java project, and the final output has to be in .xls format (HSSF in Apache POI).


My business rules state that each file I generate can be at most 12 MB in size.

But I know .xls has its own internal way of storing the data (XML, I guess), so it takes more bytes than putting the same result in a plain text file. I can't get the size of the Excel workbook, since it is generated in a temporary location (which I can't find) and I can't read it while it is being written.


Is there any way to get the size in bytes of the Excel output file while Java writes to it using the HSSF Workbook Object?


2 solutions



Your best bet is probably to periodically write the file out, and see how big it is. The only way to know for sure how big the file will be is to write it out...
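One way to "write it out and see" without touching the disk is to serialize the workbook into a stream that only counts bytes. The sketch below is a plain-Java helper; the class name ByteCounter is hypothetical, and the commented usage assumes POI's Workbook.write(OutputStream) and that the workbook can be serialized more than once while it is still being filled. Each measurement serializes the whole workbook, so it is probably worth doing only every few thousand rows.

```java
import java.io.OutputStream;

/** Hypothetical helper: discards everything written to it and only counts the bytes. */
final class ByteCounter extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++; // a single byte was written
    }

    @Override
    public void write(byte[] buffer, int offset, int length) {
        count += length; // bulk writes are counted without copying the data
    }

    /** Number of bytes written so far. */
    long getCount() {
        return count;
    }
}

// Assumed usage with POI (not compiled as part of this sketch):
//   ByteCounter counter = new ByteCounter();
//   workbook.write(counter);               // Workbook.write(OutputStream)
//   long sizeInBytes = counter.getCount(); // compare against 12L * 1024 * 1024
```

Since nothing is buffered, the check costs CPU time but no extra memory or temporary files.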


With HSSF, not all cells take up the same amount of space. String cells take a different amount of space from numeric cells, formula cells vary depending on the number of operators and values in them, string cells vary depending on whether they reuse the same text as a previous cell, and so on. You can make some rough guesses based on the kinds of things you're adding (remembering to account for cell styles, named ranges, pictures, etc.), but the only way to be sure is to write it out every so often and see how big it is.


For XSSF, it's even more complicated. Not only do different cells take up different numbers of characters in the XML (much as cell sizes vary for HSSF), but the .xlsx file format is also a compressed format. So writing the same snippet of XML can take a variable amount of space in the output file, depending on how the compression algorithm handles it (for example, the first occurrence will take more space than subsequent ones). There's even less hope of being certain without saving and testing. Again, you can probably come up with some rough guesses, but the only way to be sure is to save and see.
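The compression effect can be seen with the JDK alone. This is only an illustration, not how POI writes .xlsx files internally: it deflates a made-up cell snippet once and then a hundred times, using java.util.zip.DeflaterOutputStream as a stand-in for the ZIP compression that an .xlsx package uses. The class name CompressionDemo and the snippet are assumptions for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical demo: deflate-compressed size of a snippet repeated several times. */
final class CompressionDemo {
    static int compressedSize(String snippet, int copies) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] bytes = snippet.getBytes(StandardCharsets.UTF_8);
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(sink)) {
            for (int i = 0; i < copies; i++) {
                deflater.write(bytes); // the same XML fragment every time
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory sink
        }
        return sink.size(); // read after close, so all compressed data is flushed
    }
}

// Example comparison (made-up snippet):
//   String cell = "<c r=\"A1\" t=\"n\"><v>42</v></c>";
//   int one = CompressionDemo.compressedSize(cell, 1);
//   int hundred = CompressionDemo.compressedSize(cell, 100);
//   // hundred is far smaller than 100 * one, so bytes per cell are not constant
```

Because repeated content compresses so well, the number of cells written says little about the size of the finished file.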

If you want a predictable file size, you'll have to use something purely text-based, e.g. a .CSV file.
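To show why a text format is predictable, here is a small sketch that computes the exact number of bytes one CSV row will occupy before it is written. It assumes UTF-8 encoding, CRLF line endings and RFC 4180 style quoting; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: exact on-disk size of a CSV row (UTF-8, CRLF line ending). */
final class CsvSize {
    /** Quotes a field only when it contains a comma, a quote or a line break. */
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\r") || field.contains("\n");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Formats one row exactly as it would be written to the file. */
    static String formatRow(String... fields) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append(quote(fields[i]));
        }
        return row.append("\r\n").toString();
    }

    /** Number of bytes the row occupies once encoded as UTF-8. */
    static int rowBytes(String... fields) {
        return formatRow(fields).getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Summing rowBytes over the rows written so far gives the file size exactly, so a writer could start a new file just before the running total would pass the 12 MB limit.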




Well, after some research on the API, I found out that the getBytes() method returns a byte array containing all the data in the workbook (sheets, rows, data, etc.), so using its length gives a very close approximation of the number of bytes in the final workbook file.
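Since the getBytes() length is an approximation (as far as I know it covers the workbook data itself, and the saved .xls file adds some container overhead on top), it seems safer to compare it against the limit with some headroom. Below is a minimal sketch of such a check; the class name SizeBudget and the 5% headroom in the usage comment are arbitrary assumptions, not values taken from POI.

```java
/** Hypothetical helper: decides whether an estimated size still fits under a limit. */
final class SizeBudget {
    private final long limitBytes;
    private final double headroom; // fraction of the limit kept in reserve

    SizeBudget(long limitBytes, double headroom) {
        if (limitBytes <= 0 || headroom < 0.0 || headroom >= 1.0) {
            throw new IllegalArgumentException("limit must be > 0 and headroom in [0, 1)");
        }
        this.limitBytes = limitBytes;
        this.headroom = headroom;
    }

    /** Bytes that may actually be used once the reserve is subtracted. */
    long usableBytes() {
        return (long) (limitBytes * (1.0 - headroom));
    }

    /** True if the estimate is within the usable part of the limit. */
    boolean fits(long estimatedBytes) {
        return estimatedBytes <= usableBytes();
    }
}

// Assumed usage with POI (not compiled as part of this sketch):
//   SizeBudget budget = new SizeBudget(12L * 1024 * 1024, 0.05);
//   if (!budget.fits(workbook.getBytes().length)) {
//       // finish this file and continue in a new workbook
//   }
```

The exact size can still be confirmed by writing the finished workbook out and checking the result.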




