How to profile file I/O?

Time: 2022-06-25 20:43:44

Our build is annoyingly slow. It's a Java system built with Ant, and I'm running mine on Windows XP. Depending on the hardware, it can take between 5 and 15 minutes to complete.

Watching overall performance metrics on the machine, as well as correlating hardware differences with build times, indicates that the process is I/O bound. It also shows that the process does a lot more reading than writing.

However, I haven't found a good way to determine which files are being read or written, and how many times. My suspicion is that with our many subprojects and subsequent invocations of the compiler, the build is re-reading the same commonly used libraries many times.

What profiling tools can tell me which files a given process is touching, and what it is doing with each of them? Free is nice, but not essential.

Using Process Monitor, as suggested by Jon Skeet, I was able to confirm my suspicion: almost all of the disk activity was reading and re-reading of libraries, with the JDK's copies of "rt.jar" and other libraries at the top of the list. I can't make a RAM disk large enough to hold all the libraries I use, but moving the "hottest" libraries onto a RAM disk cut build times by about 40%. Clearly, Windows file system caching isn't doing a good enough job, even though I've told Windows to optimize for it.

One interesting thing I noticed is that the typical 'read' operation on a JAR file is just a few dozen bytes; usually there are two or three of these, followed by a skip of several kilobytes further into the file. The access pattern appears ill-suited to bulk reads.
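
A JAR is a ZIP archive, and a ZIP reader finds entries through an index (the central directory) that is located via a small fixed-size record at the very end of the file; fetching one class then typically means consulting that index and reading the entry's 30-byte local file header before the compressed bytes. That structure is consistent with, though not proof of, the pattern above. As a minimal sketch, the Java class below reads only the 22-byte End of Central Directory record. It assumes the archive has no trailing comment and is not Zip64, and the class and record names are hypothetical; it illustrates the file format and does not reproduce what the JDK's own zip code does.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class JarIndexPeek {
    static final int EOCD_SIGNATURE = 0x06054b50; // "PK\5\6" read as a little-endian int
    static final int EOCD_SIZE = 22;              // fixed part of the record, no comment

    /** Where the central directory lives, as recorded in the EOCD. */
    record CentralDirectory(int entries, long size, long offset) {}

    /** Performs one small read at the end of the file, as a ZIP reader's first step would. */
    static CentralDirectory readEocd(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            byte[] b = new byte[EOCD_SIZE];
            f.seek(f.length() - EOCD_SIZE); // assumption: no archive comment follows the EOCD
            f.readFully(b);
            if (le32(b, 0) != EOCD_SIGNATURE) {
                throw new IOException("EOCD not at end of file (archive comment or Zip64?)");
            }
            // Offsets within the EOCD: 10 = total entries, 12 = directory size, 16 = directory offset
            return new CentralDirectory(le16(b, 10),
                    le32(b, 12) & 0xFFFFFFFFL,
                    le32(b, 16) & 0xFFFFFFFFL);
        }
    }

    // ZIP stores multi-byte integers little-endian.
    static int le16(byte[] b, int i) { return (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8; }
    static int le32(byte[] b, int i) { return le16(b, i) | le16(b, i + 2) << 16; }
}
```

Records need Java 16 or later. The returned offset and size point at the central directory, which a reader would fetch next with a second read at a different position in the file.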

I'm going to do more testing with all of my third-party libraries on a flash drive, and see what effect that has.

5 solutions

#1


If you only need it for Windows, Sysinternals Process Monitor should show you everything you need to know. You can select the process, then watch each operation as it happens and get a summary of file operations as well.
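
For raw counts rather than the on-screen summary, Process Monitor can also save captured events as a CSV file. The sketch below tallies `ReadFile` events per path from such an export. It assumes the header row names the columns `Operation` and `Path` and that fields are double-quoted; the exact columns depend on how the capture was configured, so check the header of your own file. The class name `ProcMonSummary` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ProcMonSummary {
    /** Splits one CSV line; double-quoted fields may contain commas and "" escapes. */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString()); // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** Counts ReadFile events per path. The first line must be the CSV header row. */
    static Map<String, Integer> countReads(List<String> csvLines) {
        List<String> header = splitCsv(csvLines.get(0));
        int opCol = header.indexOf("Operation");
        int pathCol = header.indexOf("Path");
        if (opCol < 0 || pathCol < 0) {
            throw new IllegalArgumentException("header lacks Operation/Path columns");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.isBlank()) continue;
            List<String> f = splitCsv(line);
            if (f.size() <= Math.max(opCol, pathCol)) continue; // skip malformed rows
            if (f.get(opCol).equals("ReadFile")) {
                counts.merge(f.get(pathCol), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Prints the 20 most frequently read paths from the CSV file named in args[0]. */
    public static void main(String[] args) throws IOException {
        countReads(Files.readAllLines(Path.of(args[0]))).entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .limit(20)
            .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Run it as `java ProcMonSummary.java Logfile.CSV` (the file name is a placeholder; single-file launch needs Java 11 or later). If the export is not UTF-8, pass a `Charset` to `Files.readAllLines`.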

#2


An oldie but a goodie: create a RAM disk and compile your files from there.

#3


Back when I still used Windows, I got good results speeding up my build by writing all build output to a separate partition of maybe 3 GB and reformatting that partition once a week, at night, via a scheduled task. It's just build output, so it doesn't matter if it gets wiped occasionally.

But honestly, since moving to Linux, disk fragmentation is something I never worry about any more.

Another reason to try your build on Linux, at least once, is that you can run strace (grepping for calls to open) to see which files your build is touching.
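
A minimal sketch of that approach, assuming the trace was captured with something like `strace -f -e trace=open,openat -o build.trace ant` (the output file name and the Ant invocation are placeholders; `openat` is included because modern C libraries typically open files through it). The hypothetical Java class below counts successful opens per path instead of grepping by hand. Lines for failed opens (`= -1`) are skipped; with `-f`, an `<unfinished ...>` line is counted before its result is known.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StraceOpens {
    // Matches open("path", ...) and openat(DIRFD, "path", ...), with or without a leading PID.
    static final Pattern OPEN = Pattern.compile("\\bopen(?:at)?\\((?:[^,\"]*, )?\"([^\"]*)\"");

    /** Counts successful open/openat calls per path in strace output lines. */
    static Map<String, Integer> countOpens(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.contains("= -1 ")) continue; // failed open, e.g. ENOENT while searching a classpath
            Matcher m = OPEN.matcher(line);
            if (m.find()) counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    /** Prints per-JAR open counts from the trace file named in args[0], most frequent first. */
    public static void main(String[] args) throws IOException {
        countOpens(Files.readAllLines(Path.of(args[0]))).entrySet().stream()
            .filter(e -> e.getKey().endsWith(".jar"))
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Counting opens only shows how often each file is reopened; how many bytes are read after each open would need `read`/`pread64` in the trace as well.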

#4


I used to build a massive Java webapp (JSP frontend) using Ant on Windows, and it would take upwards of 3 minutes. I wiped my computer and installed Linux, and suddenly the builds took 18 seconds. Those are real numbers, albeit about 3 years old. I can only assume that Java prefers the Linux memory management and threading models to their Windows equivalents, as in my experience all Java programs appear to run better under Linux (especially Eclipse). Linux seems much better at avoiding extra disk reads when you're repeatedly reading files that haven't changed (e.g. executables and libraries). This may be a property of the disk cache or of the filesystem; I'm not sure which.

One of the great things about Java is that it's cross-platform, so setting up a Linux-based build server is actually an option for you. Being something of a Linux evangelist, I'd of course prefer to see you switch your dev environment to Linux, but I know that a lot of people don't want to do that (or can't for practical reasons).

If you're not willing to even set up a Linux build server to see if it runs faster, you could at least try defragmenting your Windows machine's hard drive. That makes a huge difference for C++ builds on my work computer. Try JkDefrag, which seems a lot better than the defragmenter that comes with Windows.

EDIT: I assume I got a downvote because my answer doesn't address the exact question asked. It is, however, in the Stack Overflow tradition to help people fix their real problem, not just treat the symptoms. I'm not one of those people for whom the answer to every question is "use linux". In this instance, however, I have very real, measured performance gains in exactly the situation the OP is asking about, so I thought it worth sharing my experience.

#5


Actually, FileMon is a more direct tool than ProcMon. In general, when analyzing disk I/O performance, consider the following two metrics:

  • Throughput (bytes read or written per second)

  • Latency (how long requests wait in the queue before they are read or written)

Once you evaluate the performance of your system in terms of the above, it is easy to identify the bottleneck and take corrective action: get faster disks or change your code (whichever works out cheaper).
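
The two metrics reduce to simple formulas: throughput is total bytes divided by the elapsed window, Σ bytes / (t_last_completion − t_first_issue), and mean latency is the average of (t_completed − t_issued) over all requests. The sketch below assumes you already have per-request timestamps (the `IoSample` record and `IoStats` class are hypothetical). Note that issue-to-completion time includes both queue wait and service time, so it is an upper bound on the queueing delay described above.

```java
import java.util.List;

class IoStats {
    /** One completed request: bytes moved, plus issue and completion times in nanoseconds. */
    record IoSample(long bytes, long issuedNanos, long completedNanos) {}

    /** Throughput in bytes/second over the window from first issue to last completion. */
    static double throughputBytesPerSec(List<IoSample> samples) {
        long start = samples.stream().mapToLong(IoSample::issuedNanos).min().orElseThrow();
        long end = samples.stream().mapToLong(IoSample::completedNanos).max().orElseThrow();
        long totalBytes = samples.stream().mapToLong(IoSample::bytes).sum();
        return totalBytes / ((end - start) / 1e9); // a zero-length window yields Infinity
    }

    /** Mean issue-to-completion time in milliseconds (queue wait plus service time). */
    static double meanLatencyMillis(List<IoSample> samples) {
        return samples.stream()
                .mapToLong(s -> s.completedNanos() - s.issuedNanos())
                .average().orElseThrow() / 1e6;
    }
}
```

With a workload dominated by reads a few dozen bytes long, as in the question, throughput can look low even while the disk is busy, which is why both numbers are worth checking before deciding between faster disks and code changes.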
