Java: the "write beyond end of stream" error when writing a zip file from multiple threads

Date: 2022-11-05 15:07:14

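I recently had to compress a large number of small files directly into a single zip. Every entry in a zip is independent and never needs to be appended to: one entry, one file, one chunk of content. So I went straight for multiple threads, and it blew up. The code failed with the error write beyond end of stream.

The code below reproduces the scenario: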

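// JUnit 4 is assumed here; with JUnit 5 import org.junit.jupiter.api.Test instead
import org.junit.Test;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MultiThreadWriteZipFile {

    private static ExecutorService executorService = Executors.newFixedThreadPool(50);

    private static CountDownLatch countDownLatch = new CountDownLatch(50);

    @Test
    public void multiThreadWriteZip() throws IOException, InterruptedException {
        File file = new File("D:\\Gis开发\\数据\\影像数据\\china_tms\\2\\6\\2.jpeg");
        // Create the zip
        ZipOutputStream zipOutputStream =
                new ZipOutputStream(new FileOutputStream(new File("E:\\java\\test\\test.zip")));

        for (int i = 0; i < 50; i++) {
            String entryName = i + File.separator + i + File.separator + i + ".jpeg";
            executorService.submit(() -> {
                try (InputStream in = new FileInputStream(file)) {
                    // All 50 tasks share the same ZipOutputStream: this is the problem
                    writeSource2ZipFile(in, entryName, zipOutputStream);
                } catch (IOException e) {
                    // Print the failure; "write beyond end of stream" shows up here
                    e.printStackTrace();
                } finally {
                    // Count down even on failure so the main thread is not blocked forever
                    countDownLatch.countDown();
                }
            });
        }
        // Block the main thread until every task has finished
        countDownLatch.await();
        // Close the stream
        zipOutputStream.close();
    }

    public void writeSource2ZipFile(InputStream inputStream,
                                    String zipEntryName,
                                    ZipOutputStream zipOutputStream) throws IOException {
        // Start a new entry
        zipOutputStream.putNextEntry(new ZipEntry(zipEntryName));
        byte[] buf = new byte[1024];
        int length;
        // Write the data into the entry
        while ((length = inputStream.read(buf)) != -1) {
            // Write only the bytes actually read, not the whole buffer
            zipOutputStream.write(buf, 0, length);
        }
        zipOutputStream.closeEntry();
        zipOutputStream.flush();
    }
}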

Running the code above as-is fails with: write beyond end of stream

Change

private static ExecutorService executorService = Executors.newFixedThreadPool(50);

to

private static ExecutorService executorService = Executors.newSingleThreadExecutor();

and the code runs fine!
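The single-thread executor helps because entries are then written strictly one after another. A minimal sketch of the same idea with the thread pool kept: guard the whole putNextEntry / write / closeEntry sequence with one lock, so that only one entry is ever in progress. The class and method names here are hypothetical, and in-memory byte arrays stand in for the image files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original code.
class SynchronizedZipWriter {

    /** Writes one complete entry while holding the stream's monitor. */
    static void writeEntry(ZipOutputStream zip, String name, byte[] content) throws IOException {
        synchronized (zip) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content, 0, content.length);
            zip.closeEntry();
        }
    }

    /** Writes entryCount entries from a pool of threads and returns how many can be read back. */
    static int writeConcurrently(int entryCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < entryCount; i++) {
                String name = i + "/" + i + ".txt";
                byte[] content = ("content " + i).getBytes(StandardCharsets.UTF_8);
                futures.add(pool.submit(() -> {
                    writeEntry(zip, name, content);
                    return null;
                }));
            }
            // Wait for every task before the stream is closed; rethrows task failures
            for (Future<?> future : futures) {
                future.get();
            }
        } catch (IOException | InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        int count = 0;
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            while (in.getNextEntry() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

Note that the lock serializes the compression itself, so the extra threads only pay off for work done outside the synchronized block, such as reading the source files.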

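As for the reason, tracing the code makes it clear. Let's start with where the exception is thrown:

It comes from DeflaterOutputStream in the java.util.zip package, at line 201 (JDK 1.8; other versions may differ). Here is the code: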

 public void write(byte[] b, int off, int len) throws IOException {
        if (def.finished()) {
            throw new IOException("write beyond end of stream");
        }
        if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return;
        }
        if (!def.finished()) {
            def.setInput(b, off, len);
            while (!def.needsInput()) {
                deflate();
            }
        }
    }

The key is the state reported by def.finished(). That state is defined in the Deflater class, a compression utility that Java implements on top of the ZLIB compression library.
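The check can be triggered deterministically in a single thread, which makes the message easier to recognize. The sketch below is a hypothetical demo class, not taken from the original project: it finishes a DeflaterOutputStream explicitly and then writes to it again.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

// Hypothetical demo class, not part of the original code.
class WriteAfterFinishDemo {

    /** Writes, finishes the stream, writes again, and returns the error message (null if none). */
    static String messageOfWriteAfterFinish() {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        try (DeflaterOutputStream out = new DeflaterOutputStream(new ByteArrayOutputStream())) {
            out.write(data, 0, data.length); // normal write: the deflater is not finished yet
            out.finish();                    // drains the deflater, so def.finished() is now true
            out.write(data, 0, data.length); // runs into the def.finished() check
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageOfWriteAfterFinish());
    }
}
```

In the multithreaded case nobody calls finish() on the stream directly; another thread puts the shared deflater into the finished state while a write is still in flight.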

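The state change is triggered by the method below. It sets the finish flag; once the remaining input has been compressed, the native deflate call sets finished, which is the value finished() returns: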

public void finish() {
        synchronized (zsRef) {
            finish = true;
        }
    }
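To watch the state move, here is a small sketch that uses Deflater directly. The class name is hypothetical. It records finished() at three points: right after finish(), after the output has been drained, and after reset().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of the original code.
class DeflaterStateDemo {

    /** Returns finished() as observed after finish(), after draining, and after reset(). */
    static boolean[] observeFinishedStates() {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput("some bytes to compress".getBytes(StandardCharsets.UTF_8));
            deflater.finish();                      // only requests the end of the stream
            boolean afterFinish = deflater.finished();

            byte[] out = new byte[256];
            while (!deflater.finished()) {          // compress until the end of the stream is reached
                deflater.deflate(out);
            }
            boolean afterDrain = deflater.finished();

            deflater.reset();                       // makes the deflater reusable for new input
            boolean afterReset = deflater.finished();

            return new boolean[] {afterFinish, afterDrain, afterReset};
        } finally {
            deflater.end();                         // release the native zlib resources
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observeFinishedStates())); // [false, true, false]
    }
}
```

The window between the drained state and reset() is exactly where a write from another thread gets rejected.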

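The ultimate caller of this method is the line zipOutputStream.putNextEntry(new ZipEntry(zipEntryName)); in our code above (the explicit closeEntry() call goes through the same path).

The idea is simple: every time a new entry is added, the previous entry has to be closed first, and closing it is what triggers this condition. This state is not private to a thread, as the following code shows: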
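The "a new entry closes the previous one" behavior can be observed in a single thread. In this sketch (hypothetical class name, in-memory streams instead of files) two entries are written without ever calling closeEntry(), and both still read back intact:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original code.
class ImplicitCloseEntryDemo {

    /** Writes two entries without closeEntry() and returns entry name -> content as read back. */
    static Map<String, String> writeTwoEntriesAndReadBack() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                zip.putNextEntry(new ZipEntry("a.txt"));
                zip.write("first".getBytes(StandardCharsets.UTF_8));
                // No closeEntry() here: the next putNextEntry() closes "a.txt" itself
                zip.putNextEntry(new ZipEntry("b.txt"));
                zip.write("second".getBytes(StandardCharsets.UTF_8));
            } // close() finishes the archive, which closes the last entry

            Map<String, String> result = new LinkedHashMap<>();
            try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                ZipEntry entry;
                byte[] buf = new byte[1024];
                while ((entry = in.getNextEntry()) != null) {
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        content.write(buf, 0, n);
                    }
                    result.put(entry.getName(), new String(content.toByteArray(), StandardCharsets.UTF_8));
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeTwoEntriesAndReadBack());
    }
}
```

With several threads, that implicit close hits whichever entry happens to be current, including one that another thread is still writing.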

public
class Deflater {

    private final ZStreamRef zsRef;
    private byte[] buf = new byte[0];
    private int off, len;
    private int level, strategy;
    private boolean setParams;
    private boolean finish, finished;
    private long bytesRead;
    private long bytesWritten;

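A ZipOutputStream owns a single Deflater, and these are plain instance fields shared by every thread that writes to the stream. With multiple threads, one thread can finish the deflater (by closing or starting an entry) while another is still writing, so this state is definitely not thread-safe!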
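One way to keep some parallelism is to let worker threads do only the reading and confine the ZipOutputStream to a single thread. The following is a minimal sketch under that assumption; the class, the method names, and the use of Callable loaders in place of real file reads are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original code.
class ParallelReadSequentialZip {

    /** Loads entry contents in parallel, then writes all entries from the calling thread. */
    static byte[] zipAll(Map<String, Callable<byte[]>> loaders, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Start every load; the pool runs up to `threads` of them at once
            Map<String, Future<byte[]>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<byte[]>> loader : loaders.entrySet()) {
                pending.put(loader.getKey(), pool.submit(loader.getValue()));
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                // Only this thread ever touches the ZipOutputStream
                for (Map.Entry<String, Future<byte[]>> item : pending.entrySet()) {
                    byte[] content = item.getValue().get();
                    zip.putNextEntry(new ZipEntry(item.getKey()));
                    zip.write(content, 0, content.length);
                    zip.closeEntry();
                }
            }
            return bytes.toByteArray();
        } catch (IOException | InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Lists the entry names of a zip held in memory. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

This version may hold many loaded contents in memory at once, which is acceptable for small files like the ones described here; a bounded queue would be the next step for larger inputs.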

That's all for this write-up on the error you get when writing a zip from multiple threads!