In terms of performance, when does it make sense to wrap a FileOutputStream in a BufferedOutputStream?

Time: 2022-09-12 20:28:47

I have a module that is responsible for reading, processing, and writing bytes to disk. The bytes come in over UDP and, after the individual datagrams are assembled, the final byte array that gets processed and written to disk is typically between 200 bytes and 500,000 bytes. Occasionally, an assembled byte array will exceed 500,000 bytes, but this is relatively rare.

I'm currently using the FileOutputStream's write(byte[]) method. I'm also experimenting with wrapping the FileOutputStream in a BufferedOutputStream, including using the constructor that accepts a buffer size as a parameter.

It appears that the BufferedOutputStream gives slightly better performance, but I've only just begun to experiment with different buffer sizes. I only have a limited set of sample data to work with (two data sets from sample runs that I can pipe through my application). Is there a general rule of thumb for choosing a buffer size that reduces disk writes and maximizes write performance, given what I know about the data I'm writing?

2 Solutions

#1


29  

BufferedOutputStream helps when the writes are smaller than the buffer size, e.g. 8 KB. For larger writes it doesn't help, nor does it make things much worse. If ALL your writes are larger than the buffer size, or you always flush() after every write, I would not use a buffer. However, if a good portion of your writes are smaller than the buffer size and you don't flush() every time, it's worth having.

You may find that increasing the buffer size to 32 KB or larger gives you a marginal improvement, or makes it worse. YMMV.

You might find the code for BufferedOutputStream.write useful:

/**
 * Writes <code>len</code> bytes from the specified byte array
 * starting at offset <code>off</code> to this buffered output stream.
 *
 * <p> Ordinarily this method stores bytes from the given array into this
 * stream's buffer, flushing the buffer to the underlying output stream as
 * needed.  If the requested length is at least as large as this stream's
 * buffer, however, then this method will flush the buffer and write the
 * bytes directly to the underlying output stream.  Thus redundant
 * <code>BufferedOutputStream</code>s will not copy data unnecessarily.
 *
 * @param      b     the data.
 * @param      off   the start offset in the data.
 * @param      len   the number of bytes to write.
 * @exception  IOException  if an I/O error occurs.
 */
public synchronized void write(byte b[], int off, int len) throws IOException {
    if (len >= buf.length) {
        /* If the request length exceeds the size of the output buffer,
           flush the output buffer and then write the data directly.
           In this way buffered streams will cascade harmlessly. */
        flushBuffer();
        out.write(b, off, len);
        return;
    }
    if (len > buf.length - count) {
        flushBuffer();
    }
    System.arraycopy(b, off, buf, count, len);
    count += len;
}
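
To make the effect of that logic visible, here is a minimal sketch that counts how many write calls actually reach the underlying stream. `BufferedWriteCountDemo`, `CountingOutputStream`, and `underlyingWrites` are hypothetical names introduced only for this illustration, and the counting stream discards its data instead of touching a disk. The 200-byte and 500,000-byte chunk sizes are borrowed from the question.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedWriteCountDemo {

    /** Hypothetical helper: discards all data and counts the write calls it receives. */
    static final class CountingOutputStream extends OutputStream {
        int writeCalls = 0;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
        }
    }

    /**
     * Writes `chunks` arrays of `chunkSize` bytes through a BufferedOutputStream
     * with the given buffer size and returns how many write calls reached the
     * underlying stream.
     */
    static int underlyingWrites(int chunkSize, int chunks, int bufferSize) throws IOException {
        CountingOutputStream counter = new CountingOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(counter, bufferSize)) {
            byte[] chunk = new byte[chunkSize];
            for (int i = 0; i < chunks; i++) {
                out.write(chunk);
            }
        } // close() flushes whatever is still sitting in the buffer
        return counter.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        // Small writes are coalesced: fewer underlying writes than chunks.
        System.out.println("200-byte chunks:     " + underlyingWrites(200, 100, 8 * 1024));
        // Writes at least as large as the buffer bypass it: one underlying write per chunk.
        System.out.println("500,000-byte chunks: " + underlyingWrites(500_000, 100, 8 * 1024));
    }
}
```

For the small chunks, the number of underlying writes should come out well below the 100 chunks written (the exact figure may depend on the JDK version), while each 500,000-byte chunk goes straight through, so the buffer saves nothing there.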

#2


1  

I have lately been exploring IO performance. From what I have observed, writing directly to a FileOutputStream has led to better results, which I attribute to FileOutputStream's native call for write(byte[], int, int). I have also observed that when BufferedOutputStream's latency begins to converge towards that of a direct FileOutputStream, it fluctuates a lot more, i.e. it can abruptly even double (I haven't yet been able to find out why).

P.S. I am using Java 8 and cannot say right now whether my observations hold for earlier Java versions.

Here's the code I tested, where my input was a ~10 KB file:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Assumption: Logger and LogManager come from Log4j 2; the original post does not show its imports.
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class WriteCombinationsOutputStreamComparison {
    private static final Logger LOG = LogManager.getLogger(WriteCombinationsOutputStreamComparison.class);

    public static void main(String[] args) throws IOException {

        // Read the whole input file into memory so that only the write is timed below.
        final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        try (BufferedInputStream input = new BufferedInputStream(new FileInputStream("src/main/resources/inputStream1.txt"), 4 * 1024)) {
            int data = input.read();
            while (data != -1) {
                byteArrayOutputStream.write(data); // everything comes in memory
                data = input.read();
            }
        }
        final byte[] bytesRead = byteArrayOutputStream.toByteArray();

        /*
         * 1. WRITE USING A STREAM DIRECTLY with entire byte array --> FileOutputStream directly uses a native call and writes
         */
        try (OutputStream outputStream = new FileOutputStream("src/main/resources/outputStream1.txt")) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, StandardCharsets.UTF_8));
            }
        }

        /*
         * 2. WRITE USING A BUFFERED STREAM, write entire array
         */

        // changed the buffer size to different combinations --> write latency fluctuates a lot for same buffer size over multiple runs
        try (BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream("src/main/resources/outputStream1.txt"), 16 * 1024)) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for buffered file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, StandardCharsets.UTF_8));
            }
        }
    }
}

OUTPUT:

2017-01-30 23:38:59.064 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for file write, writing entire array [nanos=100990], [bytesWritten=11059]

2017-01-30 23:38:59.086 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for buffered file write, writing entire array [nanos=142454], [bytesWritten=11059]
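
Each figure above comes from a single timed write, and single-shot timings on the JVM tend to be noisy (JIT warm-up and OS caching, among other things), which may account for some of the fluctuation described. A rough way to get steadier numbers is to repeat each variant many times, discard a warm-up batch, and compare medians. The sketch below assumes a temporary file and an 11,059-byte payload (the size reported in the log above). `RepeatedWriteTiming`, `timeWrites`, and `median` are hypothetical names introduced for illustration, and the run count of 1,000 is arbitrary. A dedicated harness such as JMH would be more rigorous.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class RepeatedWriteTiming {

    /** Median of the samples; averages the two middle values when the count is even. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /**
     * Writes `data` to `file` `runs` times and returns the nanoseconds taken by
     * each write+flush. A bufferSize of 0 or less means "write directly to the
     * FileOutputStream"; otherwise the stream is wrapped in a BufferedOutputStream.
     */
    static long[] timeWrites(Path file, byte[] data, int runs, int bufferSize) throws IOException {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            try (OutputStream raw = new FileOutputStream(file.toFile());
                 OutputStream out = bufferSize > 0 ? new BufferedOutputStream(raw, bufferSize) : raw) {
                long begin = System.nanoTime();
                out.write(data);
                out.flush();
                nanos[i] = System.nanoTime() - begin;
            }
        }
        return nanos;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[11_059]; // same size as the payload in the log above
        Path file = Files.createTempFile("write-timing", ".bin");
        try {
            int[] bufferSizes = {0, 8 * 1024, 16 * 1024, 32 * 1024};
            for (int size : bufferSizes) {
                timeWrites(file, data, 1_000, size); // warm-up batch, results discarded
                long[] samples = timeWrites(file, data, 1_000, size);
                String label = size == 0 ? "direct" : "buffered " + size;
                System.out.println(label + ": median nanos=" + median(samples));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The absolute numbers will differ between machines and file systems, so only the relative ordering of the variants on the same machine is meaningful.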
