Java 8 VM GC Tuning Guide Chapter 7-8-b

Date: 2023-03-09 04:07:11

Chapter 7: Concurrent GC

Java 8 provides two concurrent collectors, CMS and G1.

Concurrent Mark Sweep (CMS) Collector

This collector is for applications that prefer shorter garbage collection pauses and can afford to share processor resources with the garbage collection.

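CMS suits applications that want GC pauses to be as short as possible (a collection pauses the program) and that can afford the overhead of sharing processor resources with the collector.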

Garbage First GC (G1)

This server-style collector is for multiprocessor machines with large memories. It meets garbage collection pause time goals with high probability while achieving high throughput.

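G1 is a server-style collector for multiprocessor machines with large memories. It suits applications that want to meet pause time goals with high probability while still achieving high throughput.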

Overhead of Concurrency

The mostly concurrent collector trades processor resources (which would otherwise be available to the application) for shorter major collection pause times. The most visible overhead is the use of one or more processors during the concurrent parts of the collection. On an N processor system, the concurrent part of the collection will use K/N of the available processors, where 1<=K<=ceiling{N/4}. (Note that the precise choice of and bounds on K are subject to change.) In addition to the use of processors during concurrent phases, additional overhead is incurred to enable concurrency. Thus while garbage collection pauses are typically much shorter with the concurrent collector, application throughput also tends to be slightly lower than with the other collectors.

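The concurrent collector gives up processor resources in exchange for shorter application pauses. On an N-core system, the concurrent part of a collection uses K cores, where 1 <= K <= ceiling(N/4). Pauses are shorter with a concurrent collector, but overall application throughput drops somewhat.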
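To make the bound on K concrete, here is a minimal sketch that computes the upper limit ceiling(N/4) for a given core count. The class and method names are made up for illustration. The JVM picks the actual K internally, and the guide notes that the choice may change between releases.

```java
// Sketch only: upper bound on the processors used by the concurrent phases,
// following the guide's 1 <= K <= ceiling(N/4). Not the JVM's real heuristic.
final class ConcurrentGcBound {

    // Returns ceiling(n / 4), never less than 1.
    static int maxConcurrentProcessors(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return Math.max(1, (n + 3) / 4); // integer ceiling of n / 4
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("N = " + n + ", K is at most " + maxConcurrentProcessors(n));
    }
}
```

On an 8-core machine the bound is 2, so at most a quarter of the processors go to concurrent GC work. On a 2-core machine the minimum of one processor is already half the machine.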

On a machine with more than one processing core, processors are available for application threads during the concurrent part of the collection, so the concurrent garbage collector thread does not "pause" the application. This usually results in shorter pauses, but again fewer processor resources are available to the application and some slowdown should be expected, especially if the application uses all of the processing cores maximally. As N increases, the reduction in processor resources due to concurrent garbage collection becomes smaller, and the benefit from concurrent collection increases. The section Concurrent Mode Failure in Concurrent Mark Sweep (CMS) Collector discusses potential limits to such scaling.

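CMS benefits from having as many cores as possible (the spare cores run the concurrent GC work), because the cost of the resources it takes is then easier to absorb. On a machine with few cores, the processors taken by the collector may instead reduce the application's processing capacity.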

Because at least one processor is used for garbage collection during the concurrent phases, the concurrent collectors do not normally provide any benefit on a uniprocessor (single-core) machine. However, there is a separate mode available for CMS (not G1) that can achieve low pauses on systems with only one or two processors; see Incremental Mode in Concurrent Mark Sweep (CMS) Collector for details. This feature is being deprecated in Java SE 8 and may be removed in a later major release.

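On a single-core machine the concurrent collectors normally provide no benefit. CMS does have a separate mode, incremental mode, that may help there. Unfortunately this mode is being deprecated in Java 8 and may disappear in a later release.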

Chapter 8: Concurrent GC - CMS

The Concurrent Mark Sweep (CMS) collector is designed for applications that prefer shorter garbage collection pauses and that can afford to share processor resources with the garbage collector while the application is running. Typically applications that have a relatively large set of long-lived data (a large tenured generation) and run on machines with two or more processors tend to benefit from the use of this collector. However, this collector should be considered for any application with a low pause time requirement. The CMS collector is enabled with the command-line option -XX:+UseConcMarkSweepGC.

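CMS is designed for applications that need pauses to be as short as possible and can accept sharing processor resources with the collector. Such applications typically have a fairly large set of long-lived objects and run on machines with at least two cores, although any application with a low pause time requirement can consider CMS. CMS is enabled with the command-line option -XX:+UseConcMarkSweepGC.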
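One way to check which collectors a running JVM actually uses is to list its garbage collector MXBeans. The sketch below uses only the standard java.lang.management API, and the class name ActiveCollectors is made up for illustration. On a Java 8 HotSpot VM started with -XX:+UseConcMarkSweepGC, the reported names are typically ParNew and ConcurrentMarkSweep, but treat those exact strings as an assumption and check them on your own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch only: prints the names of the collectors the current JVM exposes.
final class ActiveCollectors {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with: java -XX:+UseConcMarkSweepGC ActiveCollectors (Java 8)
        System.out.println(collectorNames());
    }
}
```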

Similar to the other available collectors, the CMS collector is generational; thus both minor and major collections occur. The CMS collector attempts to reduce pause times due to major collections by using separate garbage collector threads to trace the reachable objects concurrently with the execution of the application threads. During each major collection cycle, the CMS collector pauses all the application threads for a brief period at the beginning of the collection and again toward the middle of the collection. The second pause tends to be the longer of the two pauses. Multiple threads are used to do the collection work during both pauses. The remainder of the collection (including most of the tracing of live objects and sweeping of unreachable objects) is done with one or more garbage collector threads that run concurrently with the application. Minor collections can interleave with an ongoing major cycle, and are done in a manner similar to the parallel collector (in particular, the application threads are stopped during minor collections).

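CMS is also generational, so it has both minor and major collections. It reduces major collection pauses mainly by using separate GC threads to trace reachable objects while the application is running (unlike the parallel collector, which keeps the application threads stopped for the whole collection). In each major collection cycle CMS pauses the application briefly twice, once at the start of the collection and once toward the middle, and the second pause tends to be the longer one. Multiple threads do the collection work during both pauses. Minor collections can interleave with an ongoing major cycle.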

Concurrent Mode Failure

The CMS collector uses one or more garbage collector threads that run simultaneously with the application threads with the goal of completing the collection of the tenured generation before it becomes full. As described previously, in normal operation, the CMS collector does most of its tracing and sweeping work with the application threads still running, so only brief pauses are seen by the application threads. However, if the CMS collector is unable to finish reclaiming the unreachable objects before the tenured generation fills up, or if an allocation cannot be satisfied with the available free space blocks in the tenured generation, then the application is paused and the collection is completed with all the application threads stopped. The inability to complete a collection concurrently is referred to as concurrent mode failure and indicates the need to adjust the CMS collector parameters. If a concurrent collection is interrupted by an explicit garbage collection (System.gc()) or for a garbage collection needed to provide information for diagnostic tools, then a concurrent mode interruption is reported.

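If the collector cannot finish reclaiming unreachable objects before the tenured generation fills up, or if an allocation cannot be satisfied from the free space in the tenured generation, the VM stops all application threads and completes the collection with them stopped. This is called a concurrent mode failure, and it means that some CMS parameters need adjusting. If an explicit collection request (such as System.gc()) or a collection triggered for a diagnostic tool interrupts the concurrent collection, the VM reports a concurrent mode interruption instead.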

Excessive GC Time and OutOfMemoryError

The CMS collector throws an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, then an OutOfMemoryError is thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.

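CMS throws an OutOfMemoryError when more than 98% of total time is spent in garbage collection and less than 2% of the heap is recovered. The check exists so that an application whose heap is too small does not keep running for a long time while making little or no progress. It can be disabled with the command-line option -XX:-UseGCOverheadLimit.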

The policy is the same as that in the parallel collector, except that time spent performing concurrent collections is not counted toward the 98% time limit. In other words, only collections performed while the application is stopped count toward excessive GC time. Such collections are typically due to a concurrent mode failure or an explicit collection request (for example, a call to System.gc).

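The policy is the same as in the parallel collector, except that CMS counts only the time during which the application is actually stopped toward the 98% limit. Time spent in concurrent collection is not counted. Such stop-the-world collections are typically caused by a concurrent mode failure or by an explicit request such as System.gc().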
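The two thresholds can be written as a simple predicate. This is a sketch of the rule as the guide states it, not the JVM's internal accounting, and the class and parameter names are hypothetical. For CMS, the GC fraction here should include only stop-the-world time.

```java
// Sketch only: the guide's "98% of time in GC, less than 2% of heap recovered" rule.
final class GcOverheadCheck {

    // pausedGcFraction: share of total time spent in stop-the-world GC (0.0 to 1.0).
    // heapRecoveredFraction: share of the heap reclaimed by those collections (0.0 to 1.0).
    static boolean exceedsOverheadLimit(double pausedGcFraction, double heapRecoveredFraction) {
        return pausedGcFraction > 0.98 && heapRecoveredFraction < 0.02;
    }

    public static void main(String[] args) {
        System.out.println(exceedsOverheadLimit(0.99, 0.01)); // both conditions hold
        System.out.println(exceedsOverheadLimit(0.99, 0.10)); // enough heap is recovered
    }
}
```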

Floating Garbage

The CMS collector, like all the other collectors in Java HotSpot VM, is a tracing collector that identifies at least all the reachable objects in the heap. In the parlance of Richard Jones and Rafael D. Lins in their publication Garbage Collection: Algorithms for Automated Dynamic Memory, it is an incremental update collector. Because application threads and the garbage collector thread run concurrently during a major collection, objects that are traced by the garbage collector thread may subsequently become unreachable by the time collection process ends. Such unreachable objects that have not yet been reclaimed are referred to as floating garbage. The amount of floating garbage depends on the duration of the concurrent collection cycle and on the frequency of reference updates, also known as mutations, by the application. Furthermore, because the young generation and the tenured generation are collected independently, each acts as a source of roots to the other. As a rough guideline, try increasing the size of the tenured generation by 20% to account for the floating garbage. Floating garbage in the heap at the end of one concurrent collection cycle is collected during the next collection cycle.

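Because CMS marks objects while the application threads keep running, objects that the collector has already traced as live can become unreachable before the collection ends. These objects are garbage, but they are not reclaimed in the current cycle because of that timing gap, and they are called floating garbage. How much space they occupy depends on the length of the concurrent collection cycle and on how often the application updates references. As a rough guideline, increasing the tenured generation by 20% leaves room for floating garbage. Floating garbage is reclaimed in the next collection cycle.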
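The 20% guideline is plain arithmetic. The sketch below applies it to a tenured generation size, using a hypothetical helper name. The 20% figure is the guide's rough starting point, not a measured value for any particular application.

```java
// Sketch only: pad a tenured generation size by 20% to leave room for floating garbage.
final class TenuredSizing {

    // Returns the base size plus one fifth, using integer arithmetic.
    static long paddedTenuredSize(long baseSize) {
        if (baseSize < 0) {
            throw new IllegalArgumentException("baseSize must be >= 0");
        }
        return baseSize + baseSize / 5;
    }

    public static void main(String[] args) {
        long baseMb = 1000; // example value, in megabytes
        System.out.println(baseMb + " MB -> " + paddedTenuredSize(baseMb) + " MB");
    }
}
```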

Pauses

The CMS collector pauses an application twice during a concurrent collection cycle. The first pause is to mark as live the objects directly reachable from the roots (for example, object references from application thread stacks and registers, static objects and so on) and from elsewhere in the heap (for example, the young generation). This first pause is referred to as the initial mark pause. The second pause comes at the end of the concurrent tracing phase and finds objects that were missed by the concurrent tracing due to updates by the application threads of references in an object after the CMS collector had finished tracing that object. This second pause is referred to as the remark pause.

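CMS pauses the application twice per cycle. The first pause, the initial mark pause, marks as live the objects directly reachable from the roots and from elsewhere in the heap, such as the young generation. The second pause, the remark pause, comes at the end of concurrent tracing and finds the objects that concurrent tracing missed because application threads updated references in objects the collector had already traced.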

Concurrent Phases

The concurrent tracing of the reachable object graph occurs between the initial mark pause and the remark pause. During this concurrent tracing phase one or more concurrent garbage collector threads may be using processor resources that would otherwise have been available to the application. As a result, compute-bound applications may see a commensurate fall in application throughput during this and other concurrent phases even though the application threads are not paused. After the remark pause, a concurrent sweeping phase collects the objects identified as unreachable. Once a collection cycle completes, the CMS collector waits, consuming almost no computational resources, until the start of the next major collection cycle.

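Between the two pauses, one or more GC threads trace the reachable object graph concurrently. These threads use processor resources that would otherwise be available to the application, so application throughput can drop even though the application is not paused. After the remark pause a concurrent sweep phase begins, in which the collector reclaims the objects identified as unreachable. When the cycle completes, CMS waits, consuming almost no computational resources, until the next major collection cycle starts.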

Starting a Concurrent Collection Cycle

With the serial collector a major collection occurs whenever the tenured generation becomes full and all application threads are stopped while the collection is done. In contrast, the start of a concurrent collection must be timed such that the collection can finish before the tenured generation becomes full; otherwise, the application would observe longer pauses due to concurrent mode failure. There are several ways to start a concurrent collection.

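A concurrent collection must start early enough to finish before the tenured generation fills up. There are several ways it can start: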

Based on recent history, the CMS collector maintains estimates of the time remaining before the tenured generation will be exhausted and of the time needed for a concurrent collection cycle. Using these dynamic estimates, a concurrent collection cycle is started with the aim of completing the collection cycle before the tenured generation is exhausted. These estimates are padded for safety, because concurrent mode failure can be very costly.

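Based on recent history: CMS maintains estimates of the time remaining before the tenured generation is exhausted and of the time a concurrent collection cycle needs, and it uses these dynamic estimates to decide when to start a cycle.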

A concurrent collection also starts if the occupancy of the tenured generation exceeds an initiating occupancy (a percentage of the tenured generation). The default value for this initiating occupancy threshold is approximately 92%, but the value is subject to change from release to release. This value can be manually adjusted using the command-line option -XX:CMSInitiatingOccupancyFraction=<N>, where <N> is an integral percentage (0 to 100) of the tenured generation size.

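CMS also starts a cycle when the occupancy of the tenured generation exceeds a given percentage. The default initiating threshold is approximately 92%, and it can be set with the command-line option -XX:CMSInitiatingOccupancyFraction=<N>.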
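The occupancy trigger is a percentage comparison. Here is a minimal sketch of that comparison, with hypothetical names. It models only the threshold rule described above, not the history-based estimate, and it is not the JVM's actual code.

```java
// Sketch only: does tenured occupancy exceed the initiating occupancy percentage?
final class InitiatingOccupancy {

    // usedK and capacityK are tenured generation sizes in kilobytes.
    // percent plays the role of <N> in -XX:CMSInitiatingOccupancyFraction=<N>.
    static boolean exceedsThreshold(long usedK, long capacityK, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return usedK * 100 > capacityK * percent; // avoids floating-point rounding
    }

    public static void main(String[] args) {
        // 13991K of 20288K is about 69%, so it is below a 92% threshold.
        System.out.println(exceedsThreshold(13991, 20288, 92));
        // The same occupancy is above a 60% threshold.
        System.out.println(exceedsThreshold(13991, 20288, 60));
    }
}
```

Lowering the percentage starts cycles earlier, which leaves more headroom against concurrent mode failure at the cost of more frequent concurrent work.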

Scheduling Pauses

The pauses for the young generation collection and the tenured generation collection occur independently. They do not overlap, but may occur in quick succession such that the pause from one collection, immediately followed by one from the other collection, can appear to be a single, longer pause. To avoid this, the CMS collector attempts to schedule the remark pause roughly midway between the previous and next young generation pauses. This scheduling is currently not done for the initial mark pause, which is usually much shorter than the remark pause.

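Young generation pauses and tenured generation (CMS) pauses occur independently and do not overlap. They can, however, occur back to back, which the application sees as one longer pause. To avoid this, CMS tries to schedule the remark pause roughly midway between two young generation pauses.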

Incremental Mode

Note that the incremental mode is being deprecated in Java SE 8 and may be removed in a future major release.

Incremental mode is on its way out, so it is not discussed here.

Detailed GC log output

[GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]

# Initial mark finished

[GC [DefNew: 2112K->64K(2112K), 0.0837052 secs] 16103K->15476K(22400K), 0.0838519 secs]

...

[GC [DefNew: 2077K->63K(2112K), 0.0126205 secs] 17552K->15855K(22400K), 0.0127482 secs]

[CMS-concurrent-mark: 0.267/0.374 secs]

# Concurrent marking (tracing of live objects) finished

[GC [DefNew: 2111K->64K(2112K), 0.0190851 secs] 17903K->16154K(22400K), 0.0191903 secs]

[CMS-concurrent-preclean: 0.044/0.064 secs]

# Precleaning finished

[GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]

# Remark finished

[GC [DefNew: 2112K->63K(2112K), 0.0716116 secs] 18177K->17382K(22400K), 0.0718204 secs]

[GC [DefNew: 2111K->63K(2112K), 0.0830392 secs] 19363K->18757K(22400K), 0.0832943 secs]

...

[GC [DefNew: 2111K->0K(2112K), 0.0035190 secs] 17527K->15479K(22400K), 0.0036052 secs]

[CMS-concurrent-sweep: 0.291/0.662 secs]

# Sweep finished, memory released

[GC [DefNew: 2048K->0K(2112K), 0.0013347 secs] 17527K->15479K(27912K), 0.0014231 secs]

[CMS-concurrent-reset: 0.016/0.016 secs]

# This collection cycle is complete; CMS waits for the next cycle to start

[GC [DefNew: 2048K->1K(2112K), 0.0013936 secs] 17527K->15479K(27912K), 0.0014814 secs]
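The concurrent phase lines in a log like the one above follow a regular pattern, so they are easy to pull out with a regular expression. The sketch below uses a hypothetical class name, and the pattern is derived only from the sample lines shown here. The two numbers are commonly read as CPU time and wall-clock time for the phase, but the code makes no such claim and just exposes them as first and second.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: parses lines such as "[CMS-concurrent-mark: 0.267/0.374 secs]".
final class CmsPhaseLine {

    private static final Pattern PHASE =
        Pattern.compile("\\[CMS-concurrent-([a-z-]+): (\\d+\\.\\d+)/(\\d+\\.\\d+) secs\\]");

    final String phase;   // for example "mark", "preclean", "sweep", "reset"
    final double first;   // number before the slash, in seconds
    final double second;  // number after the slash, in seconds

    private CmsPhaseLine(String phase, double first, double second) {
        this.phase = phase;
        this.first = first;
        this.second = second;
    }

    // Returns null when the line is not a concurrent phase line.
    static CmsPhaseLine parse(String line) {
        Matcher m = PHASE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new CmsPhaseLine(
            m.group(1), Double.parseDouble(m.group(2)), Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        CmsPhaseLine p = parse("[CMS-concurrent-sweep: 0.291/0.662 secs]");
        System.out.println(p.phase + " " + p.first + " " + p.second);
    }
}
```

Young generation lines such as the DefNew entries do not match the pattern, so parse returns null for them and a caller can skip them.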