JVM performance tuning for large applications

Date: 2022-04-07 17:20:47

The default JVM parameters are not optimal for running large applications. Any insights from people who have tuned the JVM for a real application would be helpful. We are running the application on a 32-bit Windows machine, where the client JVM is used by default. We have added -server and changed the NewRatio to 1:3 (a larger young generation).

Have you tried any other parameters or tuning and found them useful?

[Update] The specific type of application I'm talking about is a server application that is rarely shut down and takes at least -Xmx1024m. Also assume that the application has already been profiled. I'm looking for general guidelines in terms of JVM performance only.

7 answers

#1


17  

There is a great deal of information about this around.

First, profile the code before tuning the JVM.

Second, read the JVM documentation carefully; there are a lot of "urban legends" around. For example, the -server flag only helps if the JVM stays resident and runs for some time; -server "turns up" the JIT/HotSpot, and the JIT needs many passes through the same code path before it kicks in. On the other hand, -server slows the initial execution of the JVM, as there's more setup time.
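
To check which VM you are actually running and how much JIT work it has done so far, a minimal sketch using the standard java.lang.management API could look like this. The class name VmInfo is made up for illustration, and the reported VM name varies by vendor and version.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports the VM flavor and accumulated JIT compilation time.
final class VmInfo {
    /** VM name as reported by the running JVM (typically mentions "Server VM" or "Client VM"). */
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    /** Accumulated JIT compilation time in milliseconds, or -1 if the VM does not report it. */
    static long jitCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1L;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        System.out.println("VM: " + vmName());
        System.out.println("JIT compilation time (ms): " + jitCompilationMillis());
    }
}
```

Sampling the compilation time at startup and again after the application has been under load gives a rough idea of how long the warm-up phase lasts.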

There are several good books and websites around. See, for example, http://www.javaperformancetuning.com/

#2


19  

Foreword

Background

I've worked at a Java shop and spent entire months running performance tests on distributed systems, the main apps being in Java. Some of them involved products developed and sold by Sun themselves (now Oracle).

I will go over the lessons I learned, some history of the JVM, some discussion of the internals, a couple of parameters explained, and finally some tuning. I'm trying to keep it to the point so you can apply it in practice.

Things change fast in the Java world, so part of it might already be outdated, since I did all that last year. (Is Java 10 out already?)

Good Practices

What you SHOULD do: benchmark, Benchmark, BENCHMARK!

When you really need to know about performance, you need to run real benchmarks specific to your workload. There is no alternative.

Also, you should monitor the JVM. Enable monitoring. Good applications usually provide a monitoring web page and/or an API. Otherwise there is the common Java tooling (JVisualVM, JMX, hprof, and some JVM flags).
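
As a rough illustration of the JMX route, the sketch below reads heap usage from the platform MemoryMXBean. HeapSnapshot and its format method are hypothetical helper names, not part of any tool mentioned here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints how much heap is used versus committed versus allowed.
final class HeapSnapshot {
    /** Current heap usage as seen by the JVM itself. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Formats a usage record in megabytes; a negative max means "undefined". */
    static String format(MemoryUsage u) {
        long mb = 1024L * 1024L;
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() / mb) + " MB";
        return "used=" + (u.getUsed() / mb) + " MB, committed=" + (u.getCommitted() / mb)
                + " MB, max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(format(heap()));
    }
}
```

The "used" figure is what the application really occupies; "committed" is closer to what the operating system shows for the process.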

Be aware that there is usually no performance to gain by tuning the JVM. It's more a matter of "to crash or not to crash", finding the transition point. It's about knowing that when you give a certain amount of resources to your application, you can consistently expect a certain level of performance in return. Knowledge is power.

Performance is mostly dictated by your application. If you want it faster, you have to write better code.

What you WILL do most of the time: Live with reliable sensible defaults

We don't get time to optimize and tune every single application out there. Most of the time we'll simply live with sensible defaults.

The first thing to do when configuring a new application is to read the documentation. Most serious applications come with a performance tuning guide, including advice on JVM settings.

Then you can configure the application: JAVA_OPTS: -server -Xms???g -Xmx???g

  • -server: enable full optimizations (this flag is automatic on most JVMs nowadays)

  • -Xms -Xmx: set the minimum and maximum heap (always use the same value for both; that's about the only optimization to do).
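
One way to verify that a running JVM really got matching -Xms and -Xmx values is to inspect its input arguments. This is only a sketch: HeapFlagCheck is a hypothetical helper, and it compares the values as plain strings, so 1g and 1024m are treated as different.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: checks whether -Xms and -Xmx were given the same value.
final class HeapFlagCheck {
    /** Value of the last argument starting with the prefix (the last one wins), or null. */
    static String lastValue(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** True only if both flags are present and textually equal (assumption: same unit suffix). */
    static boolean fixedHeap(List<String> jvmArgs) {
        String xms = lastValue(jvmArgs, "-Xms");
        String xmx = lastValue(jvmArgs, "-Xmx");
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Fixed heap size: " + fixedHeap(jvmArgs));
    }
}
```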

Well done, you now know all the optimization parameters there are to know about the JVM, congratulations! That was simple :D

What you SHALL NOT do, EVER:

Please do NOT copy random strings you find on the internet, especially when they span multiple lines like this:

-server  -Xms1g -Xmx1g  -XX:PermSize=1g -XX:MaxPermSize=256m  -Xmn256m -Xss64k  -XX:SurvivorRatio=30  -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled  -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10  -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark  -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Dsun.net.inetaddr.ttl=5  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=`date`.hprof   -Dcom.sun.management.jmxremote.port=5616 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC

For instance, this thing found on the first page of Google is plain terrible. There are arguments specified multiple times with conflicting values. Some just force the JVM defaults (possibly the defaults from two JVM versions ago). A few are obsolete and simply ignored. And finally, at least one parameter is so invalid that it will consistently crash the JVM at startup by its mere existence.
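
Repeated flags like these can be caught mechanically before they reach production. The following is a sketch, assuming a whitespace-separated option string; FlagConflicts and its parsing rules are illustrative assumptions. It only recognizes a few option shapes and does not know which values the JVM would actually reject.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: finds options that appear more than once with different values.
final class FlagConflicts {
    private static final String[] SIZE_PREFIXES = {"-Xms", "-Xmx", "-Xmn", "-Xss"};

    /** Splits one option into {key, value}; unknown shapes become {option, ""}. */
    static String[] split(String opt) {
        if (opt.startsWith("-XX:+") || opt.startsWith("-XX:-")) {
            // Boolean flag: the key is the name, the value is the sign.
            return new String[] {"-XX:" + opt.substring(5), opt.substring(4, 5)};
        }
        int eq = opt.indexOf('=');
        if (eq > 0 && (opt.startsWith("-XX:") || opt.startsWith("-D"))) {
            return new String[] {opt.substring(0, eq), opt.substring(eq + 1)};
        }
        for (String prefix : SIZE_PREFIXES) {
            if (opt.startsWith(prefix)) {
                return new String[] {prefix, opt.substring(prefix.length())};
            }
        }
        return new String[] {opt, ""};
    }

    /** Returns the keys that were given more than one distinct value, in first-seen order. */
    static List<String> conflicts(String options) {
        Map<String, Set<String>> seen = new LinkedHashMap<>();
        for (String opt : options.trim().split("\\s+")) {
            if (opt.isEmpty()) {
                continue;
            }
            String[] kv = split(opt);
            seen.computeIfAbsent(kv[0], k -> new LinkedHashSet<>()).add(kv[1]);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String opts = "-server -Xms1g -Xmx1g -XX:MaxPermSize=256m -server -Xms2g -Xmx2g";
        System.out.println("Conflicting options: " + conflicts(opts));
    }
}
```

A check like this only flags textual conflicts; whether a given value is valid for a given JVM version still has to be confirmed against that version's documentation.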

Actual tuning

How do you choose the memory size:

Read your application's guide; it should give some indication. Monitor production and adjust afterwards. Run some benchmarks if you need accuracy.

Important note: the java process will take up to the max heap PLUS 10%. That extra overhead is the heap management, which is not included in the heap itself.
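
The rule of thumb above works out as follows. The 10% figure is the estimate given in this answer, not a guarantee, and FootprintEstimate is a hypothetical helper.

```java
// Hypothetical helper: estimates the process footprint from the max heap size.
final class FootprintEstimate {
    /** Max heap plus a percentage of overhead, in the same unit as the input. */
    static long estimate(long maxHeap, int overheadPercent) {
        return maxHeap + maxHeap * overheadPercent / 100;
    }

    public static void main(String[] args) {
        long maxHeapMb = 1024; // corresponds to -Xmx1024m
        System.out.println("Expected footprint (MB): up to " + estimate(maxHeapMb, 10));
    }
}
```

So when sizing a machine, leave room for more than the -Xmx value alone.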

All the memory is usually preallocated by the process on startup. You may see the process using the max heap ALL THE TIME. That does not mean the application really uses it all. You need to use Java monitoring tools to see what is really being used.

Finding the right size:

  • If it crashes with OutOfMemoryError, it ain't enough memory

  • If it doesn't crash with OutOfMemoryError, it's too much memory

  • If it's too much memory BUT the hardware has it and/or it is already paid for, it's the perfect number, job done!

JVM6 is bronze, JVM7 is gold, JVM8 is platinum...

The JVM is forever improving. Garbage collection is a very complex thing and there are a lot of very smart people working on it. It has improved tremendously in the past decade and it will continue to do so.

For informational purposes: there are at least 4 garbage collectors available in Oracle Java 7-8 (HotSpot) and OpenJDK 7-8. (Other JVMs may be entirely different, e.g. Android, IBM, embedded):

  • SerialGC
  • ParallelGC
  • ConcurrentMarkSweepGC
  • G1GC
  • (plus variants and settings)

[Starting from Java 7 and onward, the Oracle and OpenJDK code is partially shared. The GC should be (mostly) the same on both platforms.]

JVMs >= 7 have many optimizations and pick decent defaults. The defaults change a bit by platform and balance multiple things, for instance deciding whether to enable multicore optimizations depending on whether the CPU has multiple cores. You should let the JVM do that. Do not change or force GC settings.

It's okay to let the computer make decisions for you (that's what computers are for). It's better to have JVM settings that are 95% optimal all the time than to force an "always 8-core aggressive collection for lower pause times" on all the boxes, half of which turn out to be t2.small in the end.

Exception: when the application comes with a performance guide and specific tuning in place, it's perfectly okay to leave the provided settings as is.

Tip: Moving to a newer JVM to benefit from the latest improvements can sometimes provide a good boost without much effort.

Special Case: -XX:+UseCompressedOops

The JVM has a special setting that forces the use of 32-bit indexes internally (read: pointer-like values). That allows addressing 4 294 967 296 (2^32) object slots * 8-byte alignment => 32 GB of memory. (NOT to be confused with the 4 GB address space for REAL pointers.)
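
The arithmetic behind the 32 GB limit can be sketched like this. The decode method is a simplified model of how a compressed reference could be turned back into an address (base plus index shifted by the alignment), assuming 8-byte object alignment; the names are illustrative, not HotSpot internals.

```java
// Hypothetical model of compressed references; not actual JVM code.
final class CompressedOopsMath {
    /** Assumed object alignment of 8 bytes, i.e. a shift of 3 bits. */
    static final int ALIGNMENT_SHIFT = 3;

    /** Bytes reachable with an index of the given width when every slot is alignmentBytes wide. */
    static long addressableBytes(int indexBits, int alignmentBytes) {
        return (1L << indexBits) * alignmentBytes;
    }

    /** Turns a 32-bit index into a 64-bit address relative to an assumed heap base. */
    static long decode(long heapBase, int compressed) {
        long unsignedIndex = compressed & 0xFFFFFFFFL; // treat the int as unsigned
        return heapBase + (unsignedIndex << ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        System.out.println("32-bit index, 8-byte alignment: " + addressableBytes(32, 8) / gb + " GB");
        System.out.println("32-bit raw pointer: " + addressableBytes(32, 1) / gb + " GB");
    }
}
```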

It reduces overall memory consumption, with a potential positive impact on all caching levels.

Real-life example: the ElasticSearch documentation states that a 32 GB node running with 32-bit (compressed) pointers may be equivalent to a 40 GB node running with 64-bit pointers in terms of actual data kept in memory.

A note on history: the flag was known to be unstable in the pre-Java-7 era (maybe even pre-Java-6). It has been working perfectly in newer JVMs for a while.

Java HotSpot™ Virtual Machine Performance Enhancements

[...] In Java SE 7, use of compressed oops is the default for 64-bit JVM processes when -Xmx isn't specified and for values of -Xmx less than 32 gigabytes. For JDK 6 before the 6u23 release, use the -XX:+UseCompressedOops flag with the java command to enable the feature.

See: once again the JVM is light years ahead of manual tuning. Still, it's interesting to know about it =)

Special Case: -XX:+UseNUMA

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Source: Wikipedia

Modern systems have extremely complex memory architectures with multiple layers of memory and caches, both private and shared, across cores and CPUs.

Quite obviously, accessing data in the L2 cache of the current processor is A LOT faster than having to go all the way to a memory stick attached to another socket.

I believe that all multi-socket systems sold today are NUMA by design, while all consumer systems are NOT. Check whether your server supports NUMA with the command numactl --show on Linux.

The NUMA-aware flag tells the JVM to optimize memory allocations for the underlying hardware topology.

The performance boost can be substantial (i.e. two digits: +XX%). In fact, someone switching from a "NOT-NUMA 10CPU 100GB" to a "NUMA 40CPU 400GB" might experience a [dramatic] loss in performance if they don't know about the flag.

Note: There are discussions about detecting NUMA and setting the flag automatically in the JVM: http://openjdk.java.net/jeps/163

Bonus: All applications intended to run on big fat hardware (i.e. NUMA) need to be optimized for it. This is not specific to Java applications.

Toward the future: -XX:+UseG1GC

The latest improvement in garbage collection is the G1 collector (read: Garbage First).

It is intended for systems with many cores and a lot of memory: at the absolute minimum, 4 cores + 6 GB of memory. It is targeted at databases and memory-intensive applications using 10 times that and beyond.

Short version: at these sizes the traditional GCs face too much data to process at once and pauses get out of hand. G1 splits the heap into many small sections that can be managed independently and in parallel while the application is running.

The first version was available in 2013. It is mature enough for production now, but it will not become the default anytime soon. It is worth a try for large applications.

Do not touch: Generation Sizes (NewGen, PermGen...)

The GC splits the memory into multiple sections. (Not getting into details; you can google "Java GC Generations".)

The last time, I spent a week trying 20 different combinations of generation flags on an app taking 10000 hits/s. I got a magnificent boost ranging from -1% to +1%.

Java GC generations are an interesting topic to read papers on or to write one about. They are not a thing to tune unless you're part of the 1% who can devote substantial time to negligible gains, among the 1% of people who really need optimizations.

Conclusion

Hope this can help you. Have fun with the JVM.

Java is the best language and the best platform in the world! Go spread the love :D

#3


7  

Look here (or do a Google search for hotspot tuning): http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

You definitely want to profile your app before you try to tune the VM. NetBeans has a nice profiler built into it that will let you see all sorts of things.

I once had someone tell me that the GC was broken for their app. I looked at the code and found that they never closed any of their database query results, so they were retaining massive amounts of byte arrays. Once we closed the results, the run went from over 20 minutes and a GB of memory to about 2 minutes and a very small amount of memory. They were able to remove the JVM tuning parameters and things were happy.
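
The fix in that story amounts to closing each result as soon as it has been read. Below is a minimal sketch with try-with-resources (Java 7+), using a hypothetical FakeResult class as a stand-in for a real database result object; the actual code in that application is not shown in this thread.

```java
// Hypothetical stand-in for a database query result that holds a buffer until closed.
final class CloseResults {
    static final class FakeResult implements AutoCloseable {
        private final byte[] buffer = new byte[1024];
        private boolean closed;

        int size() {
            if (closed) {
                throw new IllegalStateException("result already closed");
            }
            return buffer.length;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true; // a real result would release its buffers here
        }
    }

    /** Reads from the result and always closes it, even if reading throws. */
    static int process(FakeResult result) {
        try (FakeResult r = result) {
            return r.size();
        }
    }

    public static void main(String[] args) {
        FakeResult result = new FakeResult();
        System.out.println("bytes read: " + process(result));
        System.out.println("closed afterwards: " + result.isClosed());
    }
}
```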

#4


1  

I suggest you profile your application with CPU sampling and object allocation monitoring turned on at the same time. You will find you get very different results, which can be helpful in tuning your code. Also try using the built-in hprof profiler; it can give very different results as well.

In general, profiling your application makes much more difference than JVM args.

#5


1  

The absolute best way to answer this is to perform controlled testing on the application in as close to a 'production' environment as you can create. It's quite possible that the use of -server, a reasonable starting heap size and the relatively smart behavior of recent JVMs will perform as well as or better than the vast majority of settings one would normally try.

There is one specific exception to this broad generalization: if you are running in a web container, there is a really high chance that you will want to increase the permanent generation settings.

#6


1  

With Java on a 32-bit Windows machine, your choices are limited. In my experience, the following parameter settings will impact application performance:

  1. memory sizes
  2. choice of GC collectors
  3. parameters related to GC collectors

#7


0  

This will be highly dependent on your application and the vendor and version of the JVM. You need to be clear about what you consider to be a performance problem. Are you concerned with certain critical sections of code? Have you profiled the app yet? Is the JVM spending too much time garbage collecting?

I would probably start with the -verbose:gc JVM option to watch how garbage collecting is working. Many times, the simplest fix is to just increase the max heap size with -Xmx. If you learn to interpret the -verbose:gc output, it will tell you nearly all you need to know about tuning the JVM as a whole. But doing this alone will not magically make badly tuned code go faster. Most of the JVM tuning options are designed to improve the performance of the garbage collector and/or adjust memory sizes.
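
Collection counts and times can also be sampled from inside the process, which complements the -verbose:gc log. Here is a minimal sketch with the standard GarbageCollectorMXBean API; GcStats is a hypothetical helper, and the overhead figure is only a coarse average since JVM start.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: answers "is the JVM spending too much time garbage collecting?"
final class GcStats {
    /** Total collections over all collectors; undefined (-1) values count as 0. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds over all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** Share of uptime spent in GC, as a percentage. */
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        return uptimeMillis > 0 ? 100.0 * gcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead since start: %.2f%%%n",
                overheadPercent(totalCollectionMillis(), uptime));
    }
}
```

If the overhead stays low under production load, the heap size and collector are probably not the bottleneck and the time is better spent on the application code.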

For profiling, I like yourkit.com
